I added Andy's first suggestion and Ted's suggestion as ideas.

Andy, could you flesh out your second suggestion into a project and make an
issue please?
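
To make Ted's constraints (c) and (d), quoted below, a bit more concrete for
the write-up, here is a minimal sketch in plain Python. It makes no Lucene or
Mahout calls; docs_to_rows, the field names, and the "field:term" column
naming are all hypothetical, and the whitespace tokenizer is only a stand-in
for a real analyzer.

```python
from collections import Counter


def docs_to_rows(docs, id_field="id", text_fields=("title", "body")):
    """Convert documents (dicts of field -> text) into labeled sparse rows.

    Hypothetical helper for illustration only; not a Lucene or Mahout API.
    """
    dictionary = {}  # "field:term" -> column index, shared by all rows
    rows = {}        # unique doc id -> {column index: term count}
    for doc in docs:
        row = {}
        for field in text_fields:
            # Naive whitespace tokenization, assumed here for brevity.
            counts = Counter(doc.get(field, "").lower().split())
            for term, count in counts.items():
                # Prefix the term with its field name so several text
                # fields fit into one row without colliding (constraint c).
                col = dictionary.setdefault(f"{field}:{term}", len(dictionary))
                row[col] = count
        # Key each row by the unique id field so that a row can be traced
        # back to its document (constraint d).
        rows[doc[id_field]] = row
    return dictionary, rows


docs = [
    {"id": "doc-1", "title": "lucene index", "body": "index as matrix"},
    {"id": "doc-2", "title": "mahout", "body": "matrix rows matrix"},
]
dictionary, rows = docs_to_rows(docs)
```

Keying rows by a unique id field is one of the two options Ted mentions for
(d); remembering the Lucene doc number would be the other.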


On Fri, Mar 29, 2013 at 3:53 AM, Ted Dunning <[email protected]> wrote:

> It should be possible to view a Lucene index as a matrix.  This would
> require that we standardize on a way to convert documents to rows.  There
> are many choices, the discussion of which should be deferred to the actual
> work on the project, but there are a few obvious constraints:
>
> a) it should be possible to get the same result as dumping the term vectors
> for each document, one per line, and converting that result using standard
> Mahout methods.
>
> b) numeric fields ought to work somehow.
>
> c) if there are multiple text fields, that ought to work sensibly as well.
> Two options are dumping multiple matrices or converting the fields into a
> single row of a single matrix.
>
> d) it should be possible to refer back from a row of the matrix to find the
> correct document.  This might be because we remember the Lucene doc number
> or because a field is named as holding a unique id.
>
> e) named vectors and matrices should be used if plausible.
>
> On Thu, Mar 28, 2013 at 4:58 PM, Dan Filimon <[email protected]> wrote:
>
> > ...
> > Ted, could you explain a bit more what you mean by "simplify the
> > connection to Lucene for clustering and classification"? It's too vague
> > for an idea proposal.
> >
>
