Ok, updated!

On Fri, Mar 29, 2013 at 7:36 PM, Andy Twigg <[email protected]> wrote:

> Dan,
>
> I think what you've written is fine (I wanted to edit to remove the
> '?' around random forests but couldn't).
>
> ok?
>
>
>
> On 29 March 2013 11:14, Dan Filimon <[email protected]> wrote:
> > I added Andy's first suggestion and Ted's suggestion as ideas.
> >
> > Andy, could you flesh out your second suggestion into a project and make an
> > issue please?
> >
> >
> > On Fri, Mar 29, 2013 at 3:53 AM, Ted Dunning <[email protected]> wrote:
> >
> >> It should be possible to view a Lucene index as a matrix.  This would
> >> require that we standardize on a way to convert documents to rows.  There
> >> are many choices, the discussion of which should be deferred to the actual
> >> work on the project, but there are a few obvious constraints:
> >>
> >> a) it should be possible to get the same result as dumping the term vectors
> >> for each document, one document per line, and converting that result using
> >> standard Mahout methods.
> >>
> >> b) numeric fields ought to work somehow.
> >>
> >> c) if there are multiple text fields, that ought to work sensibly as well.
> >> Two options are to dump multiple matrices or to convert the fields
> >> into a single row of a single matrix.
> >>
> >> d) it should be possible to refer back from a row of the matrix to find the
> >> correct document.  This might be because we remember the Lucene doc number
> >> or because a field is named as holding a unique id.
> >>
> >> e) named vectors and matrices should be used if plausible.
> >>
> >> On Thu, Mar 28, 2013 at 4:58 PM, Dan Filimon <[email protected]> wrote:
> >>
> >> > ...
> >> > Ted, could you explain a bit more what you mean by "simplify the connection
> >> > to Lucene for clustering and classification"? It's too vague for an idea
> >> > proposal.
> >> >
> >>
>
>
>
> --
> Dr Andy Twigg
> Junior Research Fellow, St Johns College, Oxford
> Room 351, Department of Computer Science
> http://www.cs.ox.ac.uk/people/andy.twigg/
> [email protected] | +447799647538
>
