Hi Grant.

Yes, I'm getting the score from the Hits collection.  And yes, the
scores get normalized to 1, which is what I don't want.

Or, I can leave the Hits objects as is, but I know Lucene also must
calculate a raw difference as part of the overall score calculation.
How can I get at that value?
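
For reference, here is a rough, standalone sketch of the value I'm
after. It is not Lucene code: the class and method names are made up,
and dividing the edit distance by the length of the shorter string is
only my assumption about how FuzzyTermEnum normalizes it (I haven't
confirmed that, and the sketch ignores any prefix handling).

```java
// Standalone sketch, not Lucene's implementation. FuzzySimilarity,
// distance() and similarity() are hypothetical names for illustration.
final class FuzzySimilarity {

    // Classic Levenshtein edit distance: insert, delete and substitute
    // each cost 1. Uses two rolling rows instead of a full matrix.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // ASSUMPTION: similarity = 1 - distance / length of the shorter string.
    // The result can drop below 0 when the strings differ a lot in length.
    static float similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) {
            return a.length() == b.length() ? 1.0f : 0.0f;
        }
        return 1.0f - (float) distance(a, b) / shorter;
    }

    public static void main(String[] args) {
        // Edit distance is 3 over 9 characters, so roughly 0.67.
        System.out.println(similarity("rightches", "righteous"));
    }
}
```

With "rightches" and "righteous" the edit distance is 3 (c/e, h/o, e/u)
over 9 characters, which gives the roughly .66 I expected instead of 1.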

Thanks!

Mike


On 4/11/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

Have you looked at the explains to see what is coming out of the
FuzzyQuery?  Also, are you using Hits to get that score?  Scores get
normalized to 1 by that process.

-Grant
On Apr 11, 2007, at 2:06 AM, Michael Barbarelli wrote:

> Hello.
>
> I am using Lucene to submit fuzzy queries against an index. I have
> noticed that relevant matches are often retrieved, but the scoring is
> not at all what I expected.
>
> For example, if my query is "rightches~", a reference to a text file
> with the single word "righteous" is returned with a score of 100
> percent. However, I think the actual score should be somewhere in the
> neighborhood of .66, not 1. Anyone follow me?  Degree of similarity is
> what I want in this case.
>
> But the Lucene score does not take into account how well a term
> matches a FuzzyQuery. That just seems to be the way Lucene is built
> currently. The score is based on the term frequency of the actual
> matching term. FuzzyQuery gets rewritten as a BooleanQuery with all
> matching terms OR'd.
>
> Degree of similarity is what I want in this case.  When "rightches~"
> matches "righteous", I should get a similarity score of about .66.
>
> What I want is to get at the raw difference that Lucene computes with
> the Levenshtein distance algorithm.  I think I'll need to use the code
> in FuzzyTermEnum.java (or .cs) as a starting point. I figure I can
> probably use that code directly somehow, or at least borrow the
> similarity computation.
>
> Frankly, though, I'm not sure I'm treading down the right path on
> this.  Can anyone help with specifics, past experience, or examples?
>
> Cheers,
> Mike

--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ
