Very interesting. Actually, reading Meet Lucene Part 2 by Otis Gospondnetic and Eric Hatcher there is mention of Egothor and MG4J. Egothor. What is not mentioned is Carrot" which, I think, originally has been used with Egothor but has also been submitted to Lucene CVS. There is a helpful discussion with links that can be found in lucene-user Re: Term Weights and Clustering from Feb 2005. MG4J also looks very interesting. There are good resources about it on sebastiano Vigna's web site, vigna.dsi.unimi.it <http://vigna.dsi.unimi.it>. The mention the use of Bloom filters, an aticle about which has been written by Maciej Ceglowski who is (largely) responsible for the LSI/ContextGraph implementation at NITLE. Article Using Bloom Filters on Perl.com<http://Perl.com>. Bloom filters look a bit like the random index from Sahlgren I have mentioned. Much to do! Adam
On 10/7/05, Lorenzo Viscanti <[EMAIL PROTECTED]> wrote: > > I use my own LSI implementation based on Lucene for text clustering. > I've done some tests, but I do believe that integrating LSI onto the > lucene > search subsystem (i.e. creating something like LSISimilarity) is not an > easy > task > > I start analyzing the documents using Lucene, and then extract tfidf > values > (with lucene again), in order to build a documents/terms matrix. Then I > use > an implementation of LSI/SVD to analyze it. > At this point I think that reassigning the scores back to Lucene documents > is very difficult; but I'm trying to grab the modified scores from the > matrix on my LSISImilarity. > Instead clustering search results this way is not too difficult, I just > apply the algorithm (mostly HAC-like) to the modified matrix. > To search using LSI you must choose a small subset of the collection and > then apply LSI/SVD to it, then extend the matrix by 'folding in' new > documents. But how to choose the initial subset? Maybe just searching the > index and then using the first n documents retrieved. > Any idea? > Lorenzo > > On 10/7/05, Paul Libbrecht <[EMAIL PROTECTED]> wrote: > > > > > > I've met other persons with such needs and we would also be interested. > > > > Unfortunately, this seems not to be available. > > A clear issue might be that LSI, in its original form at least, is > > covered by an US patent. But maybe someone finds another form which is > > not. > > > > paul > > > > > > Le 5 oct. 05, à 14:59, <[EMAIL PROTECTED]> a écrit : > > > I am looking for LSI implementation i lucene. Is it available. I > > > couldnt find it in the website. I searched in the archives but no > > > help. could some one tell me if it is available or not. > > > > > > Could you tell me where can i see to find if there are any Language > > > processing tools for Indexing and retrieval stuff available > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > >