Shashi, I think I am planning or intended to do the same thing as implemented in LSI methodology. It seems from your meesage, you or somebody might have used the LSI approach in lucene. So can you share some of your work. I am more interested to know any library or package or paper used for analyzing terms semantically and constrcuting vector space.
- RB ----- Original Message ---- From: Shashi Kant <shashi....@gmail.com> To: java-user@lucene.apache.org Sent: Tuesday, June 23, 2009 3:20:16 PM Subject: Re: Similarity I suspect what you are looking for is "Latent Semantics" - it can algorithmically infer that "iPod~iPhone" or "Apple~Steve Jobs". Google for "Latent Semantic Indexing" or "Latent Semantic Analysis" - you can apply some of those approaches using the TermVectors in Lucene index. Ontologies such as WordNet are very generic, hence if you have a domain specific corpus, you would need to generate some kind of Latent Semantic Index to extract the relations therein. On Tue, Jun 23, 2009 at 5:27 AM, Cool The Breezer <techcool.ku...@yahoo.com>wrote: > > Of the late I started using Lucene as main search library for all documents > in our intranet. It works extremely well. I am trying to use similarity > kinda functionality to find similarity between two sentences/documents and > trying to use Wordnet in our searching solution. I have used wordnet contrib > package and it really works well to expand queries with synonyms and get > results. But I can get handicap when searching for documents with query like > "Steve Jobs" and documents containing "apple" should be returned. In the > same way "pirated" and "willfull downloading copyrighted material". This > comes finding meaning of a word wrt its context. Has anybody done any kind > of such context based indexing that means while tokenization based on > context of each word/token and searching the same after expanding the query > using synonyms. I have come across some sf projects like > http://wn-similarity.sourceforge.net/ to semantically relating words > using wordnet but I am > still kinda confused on how to move ahead with such kind of context based > search. Appreciate your help. I understand that this might not be directly > related to Lucene but somehow this falls in the same domain search solution. > > - RB > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org