Hi all,

I have posted a similar question to the user list, but I have since come up with a modification to the code that appears to fix my problem. I'd like to run it past you guys first though :).

My question regards a modification to PhraseScorer.score(). Basically, I'm working with some code that uses explain() to determine the presence of a phrase within a document.
Effectively, it does the following:

    // myQuery is a BooleanQuery containing a PhraseQuery and a TermQuery.
    Explanation exp = searcher.explain(myQuery, docId);
    if (exp.getValue() > 0) {
        // Assume that the doc represented by docId satisfies myQuery.
    }

Now, with the existing code base for 1.3, exp.getValue() returns a value greater than 0 for _some_ (well, only one so far) documents which do not satisfy myQuery (e.g. if you did a searcher.search(myQuery), docId would not be in the Hits result). Obviously this produces misleading results.

I followed the code through and found the problem to be where PhraseScorer calls score() from its explain() method. I found that this can be fixed by adding the line prefixed by >> in the code below. The problem was that score() was returning while "freq" was still set for a document that was _not_ the resulting "first.doc" document. explain() thought it was, and so returned the wrong frequency for that document. Unfortunately I don't have a simple test case to show this.

The change I've made simply resets "freq" to ensure that it is fresh for the next document that is examined. I know it's a big ask, but is anyone familiar enough with this area of the code to determine whether I've broken anything by making this change?
    public final void score(HitCollector results, int end) throws IOException {
      Similarity similarity = getSimilarity();
      while (last.doc < end) {                        // find doc w/ all the terms
    >>  freq = 0.0f;
        while (first.doc < last.doc) {                // scan forward in first
          do {
            first.next();
          } while (first.doc < last.doc);
          firstToLast();
          if (last.doc >= end)
            return;
        }

        // found doc with all terms
        freq = phraseFreq();                          // check for phrase

        if (freq > 0.0) {
          float score = similarity.tf(freq) * value;  // compute score
          score *= Similarity.decodeNorm(norms[first.doc]); // normalize
          results.collect(first.doc, score);          // add to results
        }
        last.next();                                  // resume scanning
      }
    }

Thanks in advance,

--
Cheers,
Minh Kama Yie
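
P.S. This is not a real test case, but the stale-field pattern described above can be sketched in isolation. The following is a toy model and not Lucene code: StaleFreqDemo, DOCS, resetFreq and explainFreq() are all made-up names, and the "skip" branch only stands in for the path where score() returns before phraseFreq() is called. It assumes that explain() reads "freq" straight after scanning up to the requested document.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only, not Lucene code: every name below is hypothetical.
class StaleFreqDemo {
    // Phrase frequency per document id. A negative entry marks a document that
    // is missing one of the phrase terms, so its frequency is never computed.
    static final float[] DOCS = {2.0f, -1.0f};

    private final boolean resetFreq; // true = apply the proposed reset
    private float freq;              // shared field, like "freq" in the scorer

    StaleFreqDemo(boolean resetFreq) {
        this.resetFreq = resetFreq;
    }

    /** Scans documents [0, end) and returns the ids that matched the phrase. */
    List<Integer> score(int end) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < end; doc++) {
            if (resetFreq) {
                freq = 0.0f;      // the proposed one-line change
            }
            if (DOCS[doc] < 0) {
                continue;         // term missing: freq is left untouched
            }
            freq = DOCS[doc];     // stands in for phraseFreq()
            if (freq > 0.0f) {
                hits.add(doc);
            }
        }
        return hits;
    }

    /** Mimics the explain() usage: scan up to the doc, then read the field. */
    float explainFreq(int doc) {
        score(doc + 1);
        return freq;
    }

    public static void main(String[] args) {
        // Document 1 does not match, yet without the reset it reports doc 0's value.
        System.out.println(new StaleFreqDemo(false).explainFreq(1)); // prints 2.0
        System.out.println(new StaleFreqDemo(true).explainFreq(1));  // prints 0.0
    }
}
```

In this toy the set of hits returned by score() is the same with and without the reset; only the value left behind in the field differs. Whether the same holds for the real PhraseScorer is exactly what I'd like someone to confirm.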