Thanks a lot! I was reading about Mahout today. I'll try that out.

Thanks again,
Maria

Sent from my iPhone

On Oct 27, 2010, at 20:59, Lance Norskog <goks...@gmail.com> wrote:

> There are tools for this in the Mahout project. These are oriented
> toward large-scale work.
>
> http://mahout.apache.org
>
> There is a big learning curve and you have to learn Hadoop somewhat.
>
> The book 'Collective Intelligence' includes a suite of Python tools
> for small-scale experiments.
>
> On Wed, Oct 27, 2010 at 1:12 PM, Maria Vazquez <mvazq...@ova.st> wrote:
>> I need to auto-categorize a large number of documents. They are basically
>> news articles from major news sources (nytimes, npr, abcnews, etc).
>> I'd like to categorize them automatically. Any suggestions?
>> Lucene in Action suggests using a set of documents to build category vectors
>> and then comparing each document to each of those vectors and get the
>> closest one.
>> The approach seems pretty simple (from other papers I read on text
>> categorization) but maybe you guys know of something out there that already
>> does this using Lucene/Solr.
>> Thanks!
>> Maria
>
> --
> Lance Norskog
> goks...@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
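
For anyone following the thread, the category-vector idea from the original question (build one vector per category from example documents, then assign each new document to the closest vector) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the Lucene in Action code and not anything from Mahout: the function names, the tiny training set, and the choice of raw term counts with cosine similarity are all made up for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def to_unit_vector(counts):
    """Scale term counts to unit length so cosine similarity is a dot product."""
    norm = math.sqrt(sum(v * v for v in counts.values()))
    if norm == 0:
        return {}
    return {term: v / norm for term, v in counts.items()}


def build_category_vectors(labeled_docs):
    """Sum the term counts of each category's example documents into one vector."""
    totals = defaultdict(Counter)
    for category, text in labeled_docs:
        totals[category].update(tokenize(text))
    return {cat: to_unit_vector(counts) for cat, counts in totals.items()}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors (dicts of term -> weight)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(weight * b.get(term, 0.0) for term, weight in a.items())


def categorize(text, category_vectors):
    """Return (category, score) for the category vector closest to the document."""
    doc_vector = to_unit_vector(Counter(tokenize(text)))
    scored = ((cat, cosine(doc_vector, vec)) for cat, vec in category_vectors.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical training data, invented for this sketch.
training = [
    ("sports", "the team won the game in the final quarter"),
    ("sports", "the coach praised the players after the match"),
    ("politics", "the senate passed the budget bill after a long debate"),
    ("politics", "the president signed the election reform law"),
]
vectors = build_category_vectors(training)

for article in ("players and coach celebrate the game",
                "senate debate on the budget"):
    print(article, "->", categorize(article, vectors))
```

Raw counts let common words such as "the" dominate the vectors, so a real setup would probably want stop-word removal or TF-IDF weights, which is roughly what you get by pulling term vectors out of a Lucene index instead of counting by hand.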