Hi all,

I posted a similar question to the user list, but have since come up
with a modification to the code that appears to fix my problem. I'd
like to run it past you first, though :).
My question concerns a change to PhraseScorer.score().
I'm working with some code that uses explain() to determine whether a
phrase is present in a document.

Effectively, it does the following:

// myQuery is a BooleanQuery containing a PhraseQuery and a TermQuery.
Explanation exp = searcher.explain(myQuery, docId); 
if (exp.getValue() > 0)
{
    // Assume that the doc represented by docId satisfies myQuery.
}

With the existing code base for 1.3, exp.getValue() returns a value
greater than 0 for _some_ documents (well, only one so far) that do
not satisfy myQuery (i.e. if you ran searcher.search(myQuery), docId
would not be in the Hits result).

Obviously this produced misleading results.
I followed the code through and found the problem to be where
PhraseScorer calls score() from its explain() method.

I found that this could be fixed by adding the line prefixed by >> in
the code below.
The problem was that score() could return while "freq" still held the
value computed for an earlier document, _not_ for the resulting
"first.doc" document. explain() assumed the value belonged to
first.doc, and so reported the wrong frequency for that document.
Unfortunately I don't have a simple test case to show this. The change
I've made simply resets "freq" so that it is fresh for the next
document that is examined.
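
To make the stale value concrete, here is a toy model of the loop. It
is only a sketch and not Lucene code: StaleFreqDemo, Cursor,
explainFreq, phraseFreqs and resetFreq are made-up names, the phrase
has just two terms, and phraseFreq() is replaced by a lookup table. The
final check mirrors my reading of what explain() does after calling
score(), which is an assumption on my part.

```java
import java.util.Collections;
import java.util.Map;

// Toy model only: hypothetical names, not the Lucene classes.
class StaleFreqDemo {

    /** Walks one term's sorted doc ids; doc is MAX_VALUE once exhausted. */
    static final class Cursor {
        final int[] docs;
        int pos = 0;
        int doc;

        Cursor(int[] docs) {
            this.docs = docs;
            this.doc = docs.length > 0 ? docs[0] : Integer.MAX_VALUE;
        }

        void next() {
            pos++;
            doc = pos < docs.length ? docs[pos] : Integer.MAX_VALUE;
        }
    }

    /**
     * Mimics score(collector, target + 1) followed by the check
     * "first.doc == target ? freq : 0". phraseFreqs stands in for
     * phraseFreq(); resetFreq toggles the proposed ">>" line.
     */
    static float explainFreq(int[] termA, int[] termB,
                             Map<Integer, Float> phraseFreqs,
                             int target, boolean resetFreq) {
        Cursor first = new Cursor(termA);
        Cursor last = new Cursor(termB);
        if (first.doc > last.doc) {          // keep first at or behind last
            Cursor t = first; first = last; last = t;
        }
        int end = target + 1;
        float freq = 0.0f;

        scan:
        while (last.doc < end) {
            if (resetFreq) {
                freq = 0.0f;                 // the proposed fix
            }
            while (first.doc < last.doc) {   // scan forward in first
                do {
                    first.next();
                } while (first.doc < last.doc);
                Cursor t = first; first = last; last = t; // firstToLast()
                if (last.doc >= end) {
                    break scan;              // early return: freq not recomputed
                }
            }
            freq = phraseFreqs.getOrDefault(first.doc, 0.0f);
            last.next();                     // resume scanning
        }
        return first.doc == target ? freq : 0.0f;
    }

    public static void main(String[] args) {
        int[] termA = {1, 3};                // doc 3 has term A only
        int[] termB = {1, 2, 5};
        Map<Integer, Float> phraseFreqs = Collections.singletonMap(1, 2.0f);
        System.out.println(explainFreq(termA, termB, phraseFreqs, 3, false));
        System.out.println(explainFreq(termA, termB, phraseFreqs, 3, true));
    }
}
```

In this toy, doc 3 contains only one of the two terms. Without the
reset, the value left over from doc 1 is reported for doc 3; with the
reset, doc 3 gets 0, while doc 1 still gets its own frequency.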

I know it's a big ask, but is anyone familiar enough with this area of
the code to determine whether I've broken anything by making this
change?


    public final void score(HitCollector results, int end) throws IOException {
        Similarity similarity = getSimilarity();
        while (last.doc < end) {                          // find doc w/ all the terms
>>            freq = 0.0f;
            while (first.doc < last.doc) {                // scan forward in first
                do {
                    first.next();
                } while (first.doc < last.doc);
                firstToLast();
                if (last.doc >= end)
                    return;
            }

            // found doc with all terms
            freq = phraseFreq();                        // check for phrase

            if (freq > 0.0) {
                float score = similarity.tf(freq) * value;  // compute score
                score *= Similarity.decodeNorm(norms[first.doc]); // normalize
                results.collect(first.doc, score);        // add to results
            }
            last.next();                                  // resume scanning
        }
    }


Thanks in advance, 

-- 
Cheers,

Minh Kama Yie
