Hi Grant. Yes, I'm getting the score from the Hits collection. And yes, they get normalized to 1; which is what I don't want.
Or, I can leave the Hits objects as is, but I know Lucene also must calculate a raw difference as part of the overall score calculation. How can I get at that value? Thanks! Mike On 4/11/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
Have you looked at the explains to see what is coming out of the FuzzyQuery? Also, are you using Hits to get that score? Scores get normalized to 1 by that process. -Grant On Apr 11, 2007, at 2:06 AM, Michael Barbarelli wrote: > Hello. > > I am using Lucene to submit fuzzy queries against an index. I have > noticed > that relevant matches are often retreived, but the scoring is not > at all > what I expected. > > For example, if my query is "rightches~", a reference to a text > file with > the single word "righteous" is returned with a score of 100 percent. > However, I think the actual score should be somewhere in the > neighborhood of > .66, not 1. Anyone follow me? Degree of similarity is what I want > in this > case. > > But Lucene score does not take into account how well a term matches a > FuzzyQuery. That just seems to be the way Lucene is built > currently. The > score is based on term frequency of the actual matching term. > FuzzyQuery > gets rewritten as a BooleanQuery with all matching terms OR'd. > > Degree of similarity is what I want in this case. When > "rightches~" matches > "rightheous", I should get a similarity score of about .66. > > What I want is to get at the raw difference that Lucene uses: the > Levenstein distance algorithm. I think I'll need to use the code in > FuzzyTermEnum.java (or .cs) as a starting point. I figure I can can > probably > use that code directly somehow, or at least borrow the similarity > computation. > > Frankly, though, I'm not sure I'm treading down the right path on > this. Can > anyone help with specifics, past experience, or examples? > > Cheers, > Mike -------------------------- Grant Ingersoll Center for Natural Language Processing http://www.cnlp.org Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/ LuceneFAQ --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]