On 29 Jan 2007, at 16:46, Christoph Pächter wrote:

Is there any work/project to include LSI in Lucene?
There are some questions in the mailing lists, but they are older than a year.
Has anything happened since then?

As far as I know, no. But there are many other projects that do it for you. Carrot Search uses a number of algorithms (some of the source is open, some is proprietary) that cluster results live, based on tf-idf calculated from the results. Weka also has some algorithms that can be used. They are, however, not very well optimized for text mining. I have seen some references to sparse matrix implementations, but I don't think they are an official part of the distribution.
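To make the tf-idf part concrete, here is a minimal sketch in plain Python of weighting a handful of result snippets and comparing them by cosine similarity, which is the kind of input a clustering step works from. This is not Carrot or Weka code: the function names, the whitespace tokenizer and the tf * log(N / df) weighting are assumptions picked for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical result snippets, made up for this example.
results = [
    "lucene index search",
    "lucene search query",
    "weka clustering algorithms",
]
vecs = tfidf_vectors(results)
```

With these snippets the first two results share weighted terms and come out more similar to each other than to the third, so a clustering algorithm would tend to group them together.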

I know nothing about how you plan to use it, but from the perspective of the applications I use Lucene for, it is not that often that a corpus contains all the data people are searching for. Or, because the data is created by users who don't know the correct terms to describe the information, there are associations between documents that cannot be detected by analyzing terms. Therefore I think that finding associations in the content is not as interesting as finding associations by analyzing session behaviour. I would use LSI as something secondary, on top of the analyzed behaviour. Know what I mean?
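As a rough idea of what I mean by analyzing session behaviour, the sketch below counts how often two documents are viewed within the same session and uses those counts as association scores. It is only an illustration in plain Python: the session log format, the document ids and the function names are all made up, and a real system would need weighting and noise filtering on top.

```python
from collections import Counter
from itertools import combinations

def co_occurrence(sessions):
    """Count how often each pair of document ids was viewed in the same session."""
    counts = Counter()
    for viewed in sessions:
        # Sort the distinct ids so (a, b) and (b, a) land on the same key.
        for pair in combinations(sorted(set(viewed)), 2):
            counts[pair] += 1
    return counts

def related(doc_id, counts):
    """Documents associated with doc_id, most frequently co-viewed first."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == doc_id:
            scores[b] += n
        elif b == doc_id:
            scores[a] += n
    return [doc for doc, _ in scores.most_common()]

# Hypothetical sessions: each list holds the documents one user opened.
sessions = [
    ["doc1", "doc7", "doc3"],
    ["doc1", "doc7"],
    ["doc3", "doc9"],
]
counts = co_occurrence(sessions)
```

Note that nothing here looks at the text of the documents, so two documents can be associated even if they share no terms at all.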


