On 29 Jan 2007 at 16:46, Christoph Pächter wrote:
Is there any work/project to include LSI in Lucene?
There are some questions in the mailing lists, but they are older
than a year.
Has anything happened since then?
As far as I know, no. But there are many other projects that do it
for you. Carrot Search uses a number of algorithms (some of the source
is open, some is proprietary) that cluster search results on the fly,
based on tf-idf calculated from the results. Weka also has some
algorithms that can be used. They are, however, not very well
optimized for text mining. I have seen some references to sparse
matrix implementations, but I don't think they are an official part
of the distribution.
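To make the tf-idf part concrete, here is a minimal sketch in plain Python. It is not how Carrot Search or Weka do it; the function names, the whitespace tokenizer, and the sample result strings are all made up for illustration. It only shows the general idea of weighting terms over a small result set and comparing documents by cosine similarity, which is the kind of input a clustering step would work from.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weight = tf * idf.

    Hypothetical helper: tokenization is a plain lowercase whitespace split.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up search result snippets, standing in for real hits.
results = [
    "lucene index search engine",
    "lucene search query parser",
    "weka machine learning toolkit",
]
vecs = tfidf_vectors(results)
```

With this toy data the first two snippets share terms and get a positive similarity, while the third shares nothing with the first and scores zero. A real setup would take the terms from the Lucene analyzer instead of splitting on whitespace.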
I know nothing about how you plan to use it, but from the perspective
of the applications I use Lucene for, a corpus seldom contains all
the data people are searching for. And when the data is created by
users who don't know the correct terms to describe the information,
there are associations between documents that cannot be detected by
analyzing terms. Therefore I think that finding associations in the
content is not as interesting as finding associations by analyzing
session behaviour. I would use LSI as something secondary, on top of
the analyzed behaviour. Know what I mean?
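A rough sketch of what I mean by associations from session behaviour, again in plain Python. Everything here is an assumption for illustration: the session format (a list of document ids viewed in one session), the function names, and the sample ids. The idea is simply to count how often two documents are viewed in the same session and treat a high count as an association, regardless of what terms the documents contain.

```python
from collections import Counter
from itertools import combinations


def co_view_counts(sessions):
    """Count, for each pair of documents, the sessions in which both were viewed."""
    counts = Counter()
    for viewed in sessions:
        # Deduplicate and sort so each pair has one canonical (a, b) key.
        for a, b in combinations(sorted(set(viewed)), 2):
            counts[(a, b)] += 1
    return counts


def related(doc_id, counts, top_n=3):
    """Return the documents most often viewed together with doc_id."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == doc_id:
            scores[b] += n
        elif b == doc_id:
            scores[a] += n
    return [doc for doc, _ in scores.most_common(top_n)]


# Made-up sessions: each list holds the document ids viewed in one session.
sessions = [
    ["doc1", "doc7", "doc3"],
    ["doc1", "doc7"],
    ["doc3", "doc9"],
]
counts = co_view_counts(sessions)
```

Here related("doc1", counts) ranks doc7 ahead of doc3, because doc1 and doc7 were viewed together in two sessions. Something like LSI could then be used as a secondary signal to rank or fill in where there is too little behaviour data.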