Hi Fredrik,

I asked this question before, and Erik Hatcher gave me the link below:
http://www.lucenebook.com/blog/errata/scoring_formula_omission.html

It shows a formula which was not completely implemented.

Regards
Madhu

-----Original Message-----
From: Fredrik Andersson [mailto:[EMAIL PROTECTED]
Sent: Monday, September 05, 2005 1:35 PM
To: general@lucene.apache.org
Subject: Re: VSM in Lucene, again

Hi Otis,

Yes, I have looked through that class thoroughly, but all I see is an
IDF-map lookup with boost functionality. The only thing that allows a query
to return a document that does not contain the terms in the query is the
sloppyFreq function. It's more of a semantic trick based on edit distance,
so it has nothing to do with the vector angles in a regular vector space
model. The document terms still have to be semantically similar to the ones
in the query, which is not the case when matching by vector angles in a VSM
(though you often boost documents containing words from the query,
naturally).

Fredrik

On 9/5/05, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
>
> Hi Fredrik,
>
> Are you looking for org.apache.lucene.search.DefaultSimilarity ?
>
> Otis
>
> --- Fredrik Andersson <[EMAIL PROTECTED]> wrote:
>
> > Hi folks.
> >
> > I read a transcript from last month's digest of this list, in a post by
> > Rajesh Munavalli, that Lucene uses a VSM retrieval method. In my previous
> > work with VSM, it has included matching a query vector against the
> > documents in the term-document space. I have dissected and customized a
> > lot of classes in the Lucene indexing and searching classes, but I have
> > yet to discover where the actual dot product of the query vector and the
> > document vectors is performed, if Lucene uses this method for information
> > retrieval. Using this method involves a certain angle which you consider
> > as "close", which is a parameter that Lucene would benefit from exposing
> > in its API. This I have not seen any traces of, either. To keep a long
> > story short, a lot of the stuff that I usually associate with VSM and LSI
> > information retrieval is missing or cleverly hidden.
> >
> > If someone could shed some light on this issue, I would be very thankful.
> > It's probably just that we have different notions of the VSM model, but
> > I'd like to get this straightened out.
> >
> > Greetings,
> > Fredrik
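
For reference, the angle-based matching described in the original message (a dot product between the query vector and each document vector, with an angle that counts as "close") can be sketched roughly as below. This is only an illustration under stated assumptions, not Lucene code: the class VsmAngleSketch, its methods, and the maxAngleRadians parameter are all hypothetical names, and the term weights are made-up numbers rather than real TF-IDF values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of textbook VSM matching; not part of Lucene's API.
class VsmAngleSketch {

    // Dot product of two sparse term-weight vectors (term -> weight).
    static double dot(Map<String, Double> a, Map<String, Double> b) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                sum += e.getValue() * w;
            }
        }
        return sum;
    }

    // Cosine of the angle between the two vectors; 0 if either vector is all zeros.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double norm = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
        return norm == 0.0 ? 0.0 : dot(a, b) / norm;
    }

    // A document is "close" when its angle to the query is at most maxAngleRadians.
    // Assumes non-negative weights and a threshold between 0 and 90 degrees.
    static boolean isClose(Map<String, Double> query, Map<String, Double> doc,
                           double maxAngleRadians) {
        return cosine(query, doc) >= Math.cos(maxAngleRadians);
    }

    public static void main(String[] args) {
        // Made-up weights for a two-term query and a three-term document.
        Map<String, Double> query = new HashMap<>();
        query.put("vector", 1.0);
        query.put("space", 1.0);

        Map<String, Double> doc = new HashMap<>();
        doc.put("vector", 2.0);
        doc.put("space", 1.0);
        doc.put("model", 1.0);

        // The cosine here is sqrt(3)/2, i.e. an angle of 30 degrees.
        System.out.println(cosine(query, doc));
        System.out.println(isClose(query, doc, Math.toRadians(45)));
    }
}
```

Note that in the raw term space a document sharing no terms with the query always has a cosine of 0, so returning such documents would need a reduced space such as the one LSI builds, which this sketch does not attempt.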