Nothing's fixed yet. I'm trying to revive the discussion around this to see if it
leads to anything tangible, or if this needs to be resolved as 'Won't Fix'.



> On Mar 2, 2014, at 5:18 PM, peng <[email protected]> wrote:
> 
> Wow, I have been waiting for this for a long time; finally fixed.
> 
>> On Sun 02 Mar 2014 05:01:26 PM EST, Suneel Marthi (JIRA) wrote:
>> 
>>      [ 
>> https://issues.apache.org/jira/browse/MAHOUT-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>>  ]
>> 
>> Suneel Marthi updated MAHOUT-1178:
>> ----------------------------------
>> 
>>     Fix Version/s:     (was: Backlog)
>>                    1.0
>> 
>>> GSOC 2013: Improve Lucene support in Mahout
>>> -------------------------------------------
>>> 
>>>                 Key: MAHOUT-1178
>>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1178
>>>             Project: Mahout
>>>          Issue Type: New Feature
>>>            Reporter: Dan Filimon
>>>            Assignee: Gokhan Capan
>>>              Labels: gsoc2013, mentor
>>>             Fix For: 1.0
>>> 
>>>         Attachments: MAHOUT-1178-TEST.patch, MAHOUT-1178.patch
>>> 
>>> 
>>> [via Ted Dunning]
>>> It should be possible to view a Lucene index as a matrix.  This would
>>> require that we standardize on a way to convert documents to rows.  There
>>> are many choices, the discussion of which should be deferred to the actual
>>> work on the project, but there are a few obvious constraints:
>>> a) it should be possible to get the same result as dumping the term vectors
>>> for each document, one per line, and converting that result using standard
>>> Mahout methods.
>>> b) numeric fields ought to work somehow.
>>> c) if there are multiple text fields, that ought to work sensibly as well.
>>>  Two options are to dump multiple matrices or to convert the fields
>>> into a single row of a single matrix.
>>> d) it should be possible to refer back from a row of the matrix to find the
>>> correct document.  This might be because we remember the Lucene doc number
>>> or because a field is named as holding a unique id.
>>> e) named vectors and matrices should be used if plausible.
>> 
>> 
>> 
>> --
>> This message was sent by Atlassian JIRA
>> (v6.2#6252)
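
For anyone picking the discussion back up, constraints (b) through (d) in the quoted issue description can be illustrated with a rough sketch. This is not the attached patch and uses neither the Lucene nor the Mahout API: plain Python dicts stand in for indexed documents, whitespace splitting stands in for an analyzer, and the function name `index_to_matrix` and its parameters are hypothetical. It takes the "single row of a single matrix" option from (c) by keying each column on a (field, term) pair, and keeps a row-to-id list for (d).

```python
from collections import Counter


def index_to_matrix(docs, text_fields, numeric_fields=(), id_field="id"):
    """Turn dict-shaped documents into sparse rows (illustrative only).

    docs is a list of dicts standing in for Lucene documents; this is an
    assumption of the sketch, not how Lucene exposes an index.
    """
    columns = {}  # (field, term) -> column index; numeric fields use (field, None)
    rows = []     # one sparse row per document: {column index: value}
    row_ids = []  # row index -> document id, so a row can be traced back

    for doc in docs:
        row = {}
        for field in text_fields:
            # Whitespace tokenization is a placeholder for a real analyzer.
            counts = Counter(str(doc.get(field, "")).lower().split())
            for term, tf in counts.items():
                col = columns.setdefault((field, term), len(columns))
                row[col] = float(tf)
        for field in numeric_fields:
            if field in doc:
                col = columns.setdefault((field, None), len(columns))
                row[col] = float(doc[field])
        rows.append(row)
        row_ids.append(doc[id_field])

    return rows, columns, row_ids
```

Keying columns on (field, term) keeps the same term in two fields apart; the "multiple matrices" option from (c) would instead build one column dictionary per field.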
