[ https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835758#action_12835758 ]
Robert Muir commented on LUCENE-2271: ------------------------------------- bq. OK, so it was a design bug too. A design bug that function queries score docs with an invalid score (NaN) instead of throwing an exception? > Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect > results with TopScoreDocCollector > -------------------------------------------------------------------------------------------------------------- > > Key: LUCENE-2271 > URL: https://issues.apache.org/jira/browse/LUCENE-2271 > Project: Lucene - Java > Issue Type: Improvement > Components: Search > Affects Versions: 2.9 > Reporter: Uwe Schindler > Priority: Minor > Fix For: 2.9.2, 3.0.1, 3.1 > > Attachments: LUCENE-2271.patch, LUCENE-2271.patch, TSDC.patch > > > This is a foolowup to LUCENE-2270, where a part of this problem was fixed > (boost = 0 leading to NaN scores, which is also un-intuitive), but in > general, function queries in Solr can create these invalid scores easily. In > previous version of Lucene these scores ordered correct (except NaN, which > mixes up results), but never invalid document ids are returned (like > Integer.MAX_VALUE). > The problem is: TopScoreDocCollector pre-fills the HitQueue with sentinel > ScoreDocs with a score of -inf and a doc id of Integer.MAX_VALUE. For the HQ > to work, this sentinel must be smaller than all posible values, which is not > the case: > - -inf is equal and the document is not inserted into the HQ, as not > competitive, but the HQ is not yet full, so the sentinel values keep in the > HQ and result is the Integer.MAX_VALUE docs. This problem is solveable (and > only affects the Ordered collector) by chaning the exit condition to: > {code} > if (score <= pqTop.score && pqTop.doc != Integer.MAX_VALUE) { > // Since docs are returned in-order (i.e., increasing doc Id), a document > // with equal score to pqTop.score cannot compete since HitQueue favors > // documents with lower doc Ids. Therefore reject those docs too. > return; > } > {code} > - The NaN case can be fixed in the same way, but then has another problem: > all comparisons with NaN result in false (none of these is true): x < NaN, x > > NaN, NaN == NaN. This leads to the fact that HQ's lessThan always returns > false, leading to unexspected ordering in the PQ and sometimes the sentinel > values do not stay at the top of the queue. A later hit then overrides the > top of the queue but leaves the incorrect sentinels unchanged -> invalid > results. This can be fixed in two ways in HQ: > Force all sentinels to the top: > {code} > protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) { > if (hitA.doc == Integer.MAX_VALUE) > return true; > if (hitB.doc == Integer.MAX_VALUE) > return false; > if (hitA.score == hitB.score) > return hitA.doc > hitB.doc; > else > return hitA.score < hitB.score; > } > {code} > or alternatively have a defined order for NaN (Float.compare sorts them after > +inf): > {code} > protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) { > if (hitA.score == hitB.score) > return hitA.doc > hitB.doc; > else > return Float.compare(hitA.score, hitB.score) < 0; > } > {code} > The problem with both solutions is, that we have now more comparisons per hit > and the use of sentinels is questionable. I would like to remove the > sentinels and use the old pre 2.9 code for comparing and using PQ.add() when > a competitive hit arrives. The order of NaN would be unspecified. > To fix the order of NaN, it would be better to replace all score comparisons > by Float.compare() [also in FieldComparator]. > I would like to delay 2.9.2 and 3.0.1 until this problem is discussed and > solved. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org