On May 1, 2006, at 10:21 AM, Grant Ingersoll wrote:

You might be interested in the Carrot project, which has some Lucene support. I don't know if it solves your second problem, but it already implements clustering and may allow you to get to an answer for the second problem quicker. I have, just recently, started using it for a clustering task I am working on related to search results.

I tracked down this demo...

http://www.cs.put.poznan.pl/dweiss/tmp/carrot2-lucene.zip

From what I can tell, it doesn't use Lucene's term vectors. I think it should be possible to exploit those term vectors, perhaps yielding a good result without having to build a summary for each document. Dunno if the benefits justify the development effort. :) I have to implement host-deduping in a KinoSearch-based app anyway, though, so I think I'll try the term-vector approach there and see how well things work if I extend it for use with non-keyword fields.
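
To make the term-vector idea concrete, here is a rough sketch of what I have in mind. It is plain Python over term-to-frequency maps, not Lucene's or KinoSearch's actual API; the names term_vector, cosine, and group_similar, the whitespace tokenizer, and the 0.8 threshold are all made up for illustration. In a real index the vectors would come from whatever the engine stored at index time.

```python
import math
from collections import Counter

def term_vector(text):
    """Hypothetical stand-in for a stored term vector: term -> frequency."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term -> frequency maps."""
    dot = sum(freq * b[term] for term, freq in a.items() if term in b)
    norm_a = math.sqrt(sum(f * f for f in a.values()))
    norm_b = math.sqrt(sum(f * f for f in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)

def group_similar(vectors, threshold=0.8):
    """Greedy single pass: put each document in the first group whose
    first member is at least `threshold` similar, else start a new group."""
    groups = []  # each group is a list of document indices
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vectors[group[0]], vec) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups

docs = [
    "lucene term vectors store term frequencies per document",
    "lucene term vectors store term frequencies per field",
    "carrot2 clusters search results using snippets",
]
vectors = [term_vector(d) for d in docs]
groups = group_similar(vectors)
```

The point is only that similarity can be computed from the term frequencies directly, with no per-document summary in between. The greedy grouping is the simplest thing that could work and is order-dependent; a proper clustering algorithm may well do better.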

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

