The good news is that this is a very small volume. Lucene and Mahout operate, broadly, in the realm of tens of millions of things or more. At this scale I think performance will not be an issue no matter what you choose, so choose based on your other requirements.

On Aug 22, 2009 9:18 PM, "Tim Hughes" <[email protected]> wrote:

We are looking to do a query of documents & abstracts from a legacy system, then retrieve the docs for clustering & classification via Mahout. Expected volume is something on the order of 2,000 - 3,000 documents.

Ted Dunning wrote:
> Can you say more about your application?
>
> Mahout is a very young proj...
