Any news on this front? Did we get approved/assigned a slot/anything?

On Fri, Mar 29, 2013 at 7:44 PM, Dan Filimon <[email protected]> wrote:

> Ok, updated!
>
>
> On Fri, Mar 29, 2013 at 7:36 PM, Andy Twigg <[email protected]> wrote:
>
>> Dan,
>>
>> I think what you've written is fine (I wanted to edit to remove the
>> '?' around random forests but couldn't).
>>
>> ok?
>>
>>
>>
>> On 29 March 2013 11:14, Dan Filimon <[email protected]> wrote:
>> > I added Andy's first suggestion and Ted's suggestion as ideas.
>> >
>> > Andy, could you flesh out your second suggestion into a project and
>> > make an issue please?
>> >
>> >
>> > On Fri, Mar 29, 2013 at 3:53 AM, Ted Dunning <[email protected]> wrote:
>> >
>> >> It should be possible to view a Lucene index as a matrix. This would
>> >> require that we standardize on a way to convert documents to rows.
>> >> There are many choices, the discussion of which should be deferred to
>> >> the actual work on the project, but there are a few obvious constraints:
>> >>
>> >> a) it should be possible to get the same result as dumping the term
>> >> vectors for each document each to a line and converting that result
>> >> using standard Mahout methods.
>> >>
>> >> b) numeric fields ought to work somehow.
>> >>
>> >> c) if there are multiple text fields that ought to work sensibly as
>> >> well. Two options include dumping multiple matrices or to convert the
>> >> fields into a single row of a single matrix.
>> >>
>> >> d) it should be possible to refer back from a row of the matrix to
>> >> find the correct document. This might be because we remember the
>> >> Lucene doc number or because a field is named as holding a unique id.
>> >>
>> >> e) named vectors and matrices should be used if plausible.
>> >>
>> >> On Thu, Mar 28, 2013 at 4:58 PM, Dan Filimon <[email protected]> wrote:
>> >>
>> >> > ...
>> >> > Ted, could you explain a bit more what you mean by "simplify the
>> >> > connection to Lucene for clustering and classification"?
>> >> > It's too vague for an idea proposal.
>> >> >
>> >>
>>
>>
>> --
>> Dr Andy Twigg
>> Junior Research Fellow, St Johns College, Oxford
>> Room 351, Department of Computer Science
>> http://www.cs.ox.ac.uk/people/andy.twigg/
>> [email protected] | +447799647538
>>
>
>
