Go for a HitCollector. In particular, TopDocs will give you the raw
scores.

Erick

On 4/11/07, Michael Barbarelli <[EMAIL PROTECTED]> wrote:

Hi Grant.

Yes, I'm getting the score from the Hits collection.  And yes, they get
normalized to 1; which is what I don't want.

Or, I can leave the Hits objects as is, but I know Lucene also must
calculate a raw difference as part of the overall score calculation.
How can I get at that value?

Thanks!

Mike


On 4/11/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>
> Have you looked at the explains to see what is coming out of the
> FuzzyQuery?  Also, are you using Hits to get that score?  Scores get
> normalized to 1 by that process.
>
> -Grant
> On Apr 11, 2007, at 2:06 AM, Michael Barbarelli wrote:
>
> > Hello.
> >
> > I am using Lucene to submit fuzzy queries against an index. I have
> > noticed
> > that relevant matches are often retreived, but the scoring is not
> > at all
> > what I expected.
> >
> > For example, if my query is "rightches~", a reference to a text
> > file with
> > the single word "righteous" is returned with a score of 100 percent.
> > However, I think the actual score should be somewhere in the
> > neighborhood of
> > .66, not 1. Anyone follow me?  Degree of similarity is what I want
> > in this
> > case.
> >
> > But Lucene score does not take into account how well a term matches a
> > FuzzyQuery. That just seems to be the way Lucene is built
> > currently. The
> > score is based on term frequency of the actual matching term.
> > FuzzyQuery
> > gets rewritten as a BooleanQuery with all matching terms OR'd.
> >
> > Degree of similarity is what I want in this case.  When
> > "rightches~" matches
> > "rightheous", I should get a similarity score of about .66.
> >
> > What I want is to get at the raw difference that Lucene uses:  the
> > Levenstein distance algorithm.  I think I'll need to use the code in
> > FuzzyTermEnum.java (or .cs) as a starting point. I figure I can can
> > probably
> > use that code directly somehow, or at least borrow the similarity
> > computation.
> >
> > Frankly, though, I'm not sure I'm treading down the right path on
> > this.  Can
> > anyone help with specifics, past experience, or examples?
> >
> > Cheers,
> > Mike
>
> --------------------------
> Grant Ingersoll
> Center for Natural Language Processing
> http://www.cnlp.org
>
> Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/
> LuceneFAQ
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>

Reply via email to