On May 1, 2006, at 10:21 AM, Grant Ingersoll wrote:
> You might be interested in the Carrot project, which has some
> Lucene support. I don't know if it solves your second problem, but
> it already implements clustering and may allow you to get to an
> answer for the second problem quicker. I have, just recently,
> started using it for a clustering task I am working on related to
> search results.
I tracked down this demo...
http://www.cs.put.poznan.pl/dweiss/tmp/carrot2-lucene.zip
From what I can tell, it doesn't use Lucene's term vectors. I think
it should be possible to exploit those term vectors, perhaps yielding
a good result without having to build a summary for each document.
Dunno if the benefits justify the development effort. :) I have to
implement host-deduping in a KinoSearch-based app anyway, though, so
I think I'll try this technique and see how well things work if I
extend it for use with non-keyword fields.
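
To make the term vector idea concrete, here is a rough sketch in plain
Python. It is not Lucene, KinoSearch, or Carrot code: term_vector(),
cosine_similarity(), group_similar(), and the 0.5 threshold are all
made up for illustration, and the Counter built from raw text is only a
stand-in for a term vector that a real index would already have stored.
The grouping is a naive greedy pass, not the clustering Carrot
implements.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a term -> frequency map (stand-in for a stored term vector)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(freq * b[term] for term, freq in a.items() if term in b)
    norm_a = math.sqrt(sum(f * f for f in a.values()))
    norm_b = math.sqrt(sum(f * f for f in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def group_similar(vectors, threshold=0.5):
    """Greedy grouping: a doc joins the first group whose seed it resembles.

    `vectors` maps doc id -> term vector. The threshold is an arbitrary
    illustrative value, not a tuned one.
    """
    groups = []  # list of (seed_vector, [doc_ids])
    for doc_id, vec in vectors.items():
        for seed, members in groups:
            if cosine_similarity(seed, vec) >= threshold:
                members.append(doc_id)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append((vec, [doc_id]))
    return [members for _, members in groups]


docs = {
    "a": "lucene search engine library",
    "b": "lucene search engine toolkit",
    "c": "apple pie recipe",
}
vectors = {doc_id: term_vector(text) for doc_id, text in docs.items()}
groups = group_similar(vectors)
```

The point is that the comparison works directly on per-document term
frequencies, so no separate summary has to be generated for each
document before grouping.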
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/