Re: LSI, Latent Semantic Indexing

karl wettin Mon, 29 Jan 2007 08:30:22 -0800


On 29 Jan 2007, at 16:46, Christoph Pächter wrote:

Is there any work/project to include LSI in Lucene?
There are some questions in the mailing lists, but they are older than a year.
Has anything happened since then?

As far as I know, no. But there are many other projects that do it for you. Carrot Search uses a number of algorithms (some of the source is open, some is proprietary) that cluster results live, based on tf-idf calculated from the results. Weka also has some algorithms that can be used; they are, however, not very optimized for text mining. I have seen some references to sparse matrix implementations, but I don't think they are an official part of the distribution.
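To make the "tf-idf calculated from the results" idea concrete, here is a minimal sketch in plain Python. It is not LSI and not the algorithm any of the projects above actually uses; it only computes tf-idf weights over a small result set and compares documents by cosine similarity, which is the kind of input a clustering step would work from. The function names (`tfidf_vectors`, `cosine`) and the sample documents are made up for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Weight = relative term frequency * log inverse document frequency.
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical search results, already tokenized.
results = [
    "lucene index search".split(),
    "lucene search query".split(),
    "weka cluster classifier".split(),
]
vecs = tfidf_vectors(results)
```

With these sample documents, the first two results share terms and get a positive similarity, while the third shares none and scores zero against them. A real clustering step would group documents on top of similarities like these.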

I know nothing about how you plan to use it, but from the perspective of the applications I use Lucene for: a corpus does not often contain all the data people are searching for, or, because the data is created by users who don't know the correct terms to describe the information, there are associations between documents that cannot be detected by analyzing terms. Therefore I think that finding associations in the content is not as interesting as finding associations by analyzing session behaviour. I would use LSI as something secondary on top of analyzed behaviour. Know what I mean?




