TestParser.testSpanTermXML fails with some sims
-----------------------------------------------
Key: LUCENE-3430
URL: https://issues.apache.org/jira/browse/LUCENE-3430
Project: Lucene - Java
Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir
Fix For: 4.0
Here is why this test sometimes fails (my explanation, from the test I wrote):
{noformat}
/** make sure all sims work with spanOR(termX, termY) where termY does not exist */
public void testCrazySpans() throws Exception {
  // The problem: "normal" lucene queries create scorers, returning null if
  // terms don't exist.
  // This means they never score a term that does not exist.
  // However, with spans there is only one scorer for the whole hierarchy:
  // inner queries are not real queries, their boosts are ignored, etc.
{noformat}
Basically, SpanQueries aren't really queries: you just get one scorer. It calls
extractTerms on the whole hierarchy and computes weights (e.g. IDF) on
the whole bag of terms, even if they don't exist.
This is fine as far as it goes: we already have tests that sims won't bug out
in computeStats() here. However, they don't expect to actually score documents
based on terms that don't exist, and that is exactly what happens in Spans
because it doesn't use sub-scorers.
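To make the difference concrete, here is a toy model in plain Java (no Lucene classes). Everything in it is hypothetical and only for illustration: the class SpanWeightSketch, its methods, and the deliberately unsmoothed idf() stand in for "some sim" and are not Lucene APIs.

```java
import java.util.List;
import java.util.Map;

final class SpanWeightSketch {
    // Hypothetical sim with no smoothing: docFreq == 0 gives log(n / 0) = Infinity.
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / docFreq);
    }

    // "Normal" query style: one scorer per term. A term that does not exist
    // gets no scorer (modeled here by skipping it), so it is never scored.
    static double perTermWeight(List<String> terms, Map<String, Integer> docFreqs, int numDocs) {
        double sum = 0.0;
        for (String term : terms) {
            int df = docFreqs.getOrDefault(term, 0);
            if (df == 0) {
                continue; // analogous to the scorer being null
            }
            sum += idf(df, numDocs);
        }
        return sum;
    }

    // Span style: a single weight computed over the whole bag of extracted
    // terms, including terms that do not exist in the index.
    static double bagOfTermsWeight(List<String> terms, Map<String, Integer> docFreqs, int numDocs) {
        double sum = 0.0;
        for (String term : terms) {
            sum += idf(docFreqs.getOrDefault(term, 0), numDocs);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("termX", "termY"); // termY does not exist
        Map<String, Integer> docFreqs = Map.of("termX", 2);
        System.out.println("per-term scorers: " + perTermWeight(terms, docFreqs, 10));
        System.out.println("bag of terms:     " + bagOfTermsWeight(terms, docFreqs, 10));
    }
}
```

In this sketch the per-term path stays finite because the missing term is skipped, while the bag-of-terms path feeds docFreq = 0 into the sim and the weight is no longer finite.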
Lucene's sim avoids this with the (docFreq + 1)
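For reference, a small self-contained sketch of that smoothing. smoothedIdf() follows the classic formula used by Lucene's DefaultSimilarity, 1 + ln(numDocs / (docFreq + 1)); unsmoothedIdf() is a hypothetical variant, written here only to show what a sim without the + 1 would see for a term that does not exist. The class and method names are made up for this example.

```java
final class IdfSketch {
    // Classic Lucene-style idf: the "+ 1" in the denominator keeps the
    // value finite even when docFreq is 0.
    static double smoothedIdf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
    }

    // Hypothetical sim without smoothing: docFreq == 0 divides by zero
    // in floating point, so the result is Infinity.
    static double unsmoothedIdf(long docFreq, long numDocs) {
        return Math.log((double) numDocs / (double) docFreq);
    }

    public static void main(String[] args) {
        long numDocs = 100;
        long docFreq = 0; // a term that does not exist in the index
        System.out.println("smoothed:   " + smoothedIdf(docFreq, numDocs));
        System.out.println("unsmoothed: " + unsmoothedIdf(docFreq, numDocs));
    }
}
```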