Wow, I have been waiting for this for a long time. Finally fixed.

On Sun 02 Mar 2014 05:01:26 PM EST, Suneel Marthi (JIRA) wrote:

      [ https://issues.apache.org/jira/browse/MAHOUT-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1178:
----------------------------------

     Fix Version/s: 1.0
                    (was: Backlog)

GSOC 2013: Improve Lucene support in Mahout
-------------------------------------------

                 Key: MAHOUT-1178
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1178
             Project: Mahout
          Issue Type: New Feature
            Reporter: Dan Filimon
            Assignee: Gokhan Capan
              Labels: gsoc2013, mentor
             Fix For: 1.0

         Attachments: MAHOUT-1178-TEST.patch, MAHOUT-1178.patch


[via Ted Dunning]
It should be possible to view a Lucene index as a matrix.  This would
require that we standardize on a way to convert documents to rows.  There
are many choices, the discussion of which should be deferred to the actual
work on the project, but there are a few obvious constraints:
a) it should be possible to get the same result as dumping the term vectors
for each document, one per line, and converting that result using standard
Mahout methods.
b) numeric fields ought to work somehow.
c) if there are multiple text fields, that ought to work sensibly as well.
  Two options are to dump multiple matrices or to convert the fields
into a single row of a single matrix.
d) it should be possible to refer back from a row of the matrix to find the
correct document.  This might be because we remember the Lucene doc number
or because a field is named as holding a unique id.
e) named vectors and matrices should be used if plausible.
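
To make the constraints above concrete, here is a minimal sketch of the
"single row of a single matrix" option. It is only an illustration under
stated assumptions: plain Python dicts stand in for Lucene documents, and
index_to_matrix, its parameters, and the "field:term" column naming are
hypothetical, not part of Lucene or Mahout. Tokenization is a naive
lowercase whitespace split rather than a real analyzer.

```python
from collections import Counter


def index_to_matrix(docs, id_field="id", text_fields=("body",), numeric_fields=()):
    """Turn dict-based stand-ins for Lucene documents into sparse matrix rows.

    Hypothetical helper for illustration only; it does not use Lucene or Mahout.
    Returns (dictionary, rows, row_ids).
    """
    dictionary = {}  # column name -> column index
    rows = []        # one sparse row per document: {column index: weight}
    row_ids = []     # row number -> unique id, so a row can be traced back (d)

    for doc in docs:
        counts = Counter()
        for field in text_fields:
            for token in doc.get(field, "").lower().split():
                # Prefix each term with its field so several text fields
                # can share a single row without colliding (c).
                counts[f"{field}:{token}"] += 1

        row = {}
        for term, tf in counts.items():
            col = dictionary.setdefault(term, len(dictionary))
            row[col] = float(tf)

        for field in numeric_fields:
            if field in doc:
                # A numeric field gets one column holding its value (b).
                col = dictionary.setdefault(field, len(dictionary))
                row[col] = float(doc[field])

        rows.append(row)
        row_ids.append(doc[id_field])

    return dictionary, rows, row_ids


docs = [
    {"id": "doc-a", "title": "lucene index", "body": "index as matrix", "year": 2013},
    {"id": "doc-b", "title": "mahout", "body": "matrix rows matrix", "year": 2014},
]
dictionary, rows, row_ids = index_to_matrix(
    docs, text_fields=("title", "body"), numeric_fields=("year",)
)
```

Here row_ids plays the role of the "field named as holding a unique id" in
(d), and the dictionary gives the column names that a named vector in (e)
would carry. Raw term frequencies are used as weights; any other weighting
would be a separate design choice.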



--
This message was sent by Atlassian JIRA
(v6.2#6252)
