On Wed, Feb 17, 2010 at 7:10 PM, Jason Surratt <[email protected]> wrote:
> I've spent a bit of time looking over Drew's Avro stuff as well as
> http://issues.apache.org/jira/browse/MAHOUT-262, SVM and the SGD
> implementations. Is it the intent for classifiers to use
> SingleLabelVectorWritable as the input value during the map step at some
> point in the future? If so, I'm happy to write up some code around Naive
> Bayes and an input format to do just that -- maybe it'll be useful to
> someone else.
>

We definitely want to have a common input format for all algorithms (where it makes sense). The two candidates are honest-to-goodness sparse or dense vectors versus something like a document. Because integrating the document-to-vector conversion directly into the algorithm saves a huge amount of effort, it looks like all algorithms will need to support both. Doing that without lots of effort in each algorithm is the trick that Robin and Drew are working on right now.

Your contributions would be invaluable (you are a real live user!).

> There is a lot of code and JIRAs to take in so I apologize if I'm missing
> something.
>

No problem. It is an exciting project that way.

--
Ted Dunning, CTO
DeepDyve
