Go for a HitCollector, or use one of the search methods that returns TopDocs rather than Hits. Either way you get the raw, un-normalized scores.
Erick

On 4/11/07, Michael Barbarelli <[EMAIL PROTECTED]> wrote:
Hi Grant.

Yes, I'm getting the score from the Hits collection. And yes, the scores
get normalized to 1, which is what I don't want. I can leave the Hits
objects as they are, but I know Lucene must also calculate a raw
difference as part of the overall score calculation. How can I get at
that value?

Thanks!
Mike

On 4/11/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>
> Have you looked at the explains to see what is coming out of the
> FuzzyQuery? Also, are you using Hits to get that score? Scores get
> normalized to 1 by that process.
>
> -Grant
>
> On Apr 11, 2007, at 2:06 AM, Michael Barbarelli wrote:
>
> > Hello.
> >
> > I am using Lucene to submit fuzzy queries against an index. I have
> > noticed that relevant matches are often retrieved, but the scoring
> > is not at all what I expected.
> >
> > For example, if my query is "rightches~", a reference to a text file
> > with the single word "righteous" is returned with a score of 100
> > percent. However, I think the actual score should be somewhere in
> > the neighborhood of .66, not 1. Anyone follow me? Degree of
> > similarity is what I want in this case.
> >
> > But Lucene score does not take into account how well a term matches
> > a FuzzyQuery. That just seems to be the way Lucene is built
> > currently. The score is based on term frequency of the actual
> > matching term. FuzzyQuery gets rewritten as a BooleanQuery with all
> > matching terms OR'd.
> >
> > Degree of similarity is what I want in this case. When "rightches~"
> > matches "righteous", I should get a similarity score of about .66.
> >
> > What I want is to get at the raw difference that Lucene uses: the
> > Levenshtein distance algorithm. I think I'll need to use the code in
> > FuzzyTermEnum.java (or .cs) as a starting point. I figure I can
> > probably use that code directly somehow, or at least borrow the
> > similarity computation.
> >
> > Frankly, though, I'm not sure I'm treading down the right path on
> > this. Can anyone help with specifics, past experience, or examples?
> >
> > Cheers,
> > Mike
>
> --------------------------
> Grant Ingersoll
> Center for Natural Language Processing
> http://www.cnlp.org
>
> Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
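
On the "degree of similarity" part of the original question: here is a rough, standalone sketch of computing that number yourself, outside of Lucene's scoring. It is not Lucene code. The class name FuzzySimilarity and both method names are made up for illustration, and dividing the edit distance by the length of the shorter string is an assumption about how FuzzyTermEnum normalizes, so check it against the FuzzyTermEnum source before relying on it. Clamping negative values to 0 is also just a choice made here.

```java
public class FuzzySimilarity {

    /** Plain Levenshtein distance: insert, delete and substitute each cost 1. */
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Similarity in [0, 1]: 1 minus the edit distance divided by the length
     * of the shorter string (assumed normalization), clamped at 0.
     */
    public static double similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) {
            return a.length() == b.length() ? 1.0 : 0.0;
        }
        double raw = 1.0 - (double) editDistance(a, b) / shorter;
        return Math.max(0.0, raw);
    }

    public static void main(String[] args) {
        // "rightches" vs "righteous": 3 substitutions over 9 characters.
        System.out.println(similarity("rightches", "righteous"));
    }
}
```

For "rightches" against "righteous" the edit distance is 3 and both words have 9 characters, so this gives 1 - 3/9, which is roughly the .66 expected in the original message. One way to use it would be to take the matching term from each hit and run it through similarity() next to the raw Lucene score, rather than trying to fold it into the score itself.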