[ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029150#comment-13029150 ]
Doron Cohen commented on LUCENE-3068:
-------------------------------------

This is more complex than I originally thought.

# QueryParser creates a MultiPhraseQuery (MPQ) when one of the (phrase) query positions is a multi-term.
# MPQ has an implicit OR behavior - it is used e.g. for wildcarding a phrase query.
# The PhraseQuery (PQ) sloppy scorer assumes each query position has a single term.
# A PQ with several terms in the same position cannot be created by parsing with a QP, only manually. Manually created, it would have AND semantics: only docs with ALL the terms in pos N should match.

In other words, assume doc D terms and positions are:
  a:0 b:1 c:1 d:2

MPQ for (a,b):0 d:1 should match D, finding the phrase b:1 d:2 (OR semantics).
PQ for (a,b):0 d:1 should not match D, because D does not contain 'a' and 'b' in the same position (AND semantics).

Therefore, rewriting PQ into MPQ is not a valid fix, because it would replace the AND logic assumed when the PQ was created this way with the OR logic assumed in MPQ.

{code:title=TestPositionIncrement.testSetPosition has a test for exactly this case}
// phrase query should fail for a non-existing searched term
// even if other searched terms exist in the same searched position.
q = new PhraseQuery();
q.add(new Term("field", "3"),0);
q.add(new Term("field", "9"),0);
hits = searcher.search(q, null, 1000).scoreDocs;
assertEquals(0, hits.length);
{code}

Although QP by default will not create this PQ, I think we need to support it, for applications that need to be strict about search results, with slop.

So fixing this would need to take place inside SloppyPhraseScorer, digging further...
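To make the OR vs. AND distinction concrete, here is a minimal standalone sketch in plain Java. It does not use Lucene and is not how PhraseQuery or MultiPhraseQuery are implemented; the class and method names (PhraseSemanticsSketch, matchesAnyOf, matchesAllOf) are made up for illustration, and it only models exact phrases (slop 0) over doc D and the query (a,b):0 d:1 from above.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative model only; hypothetical names, no Lucene classes involved. */
public class PhraseSemanticsSketch {

    /** Doc D from the comment: a:0 b:1 c:1 d:2 (position -> terms at that position). */
    public static final Map<Integer, Set<String>> DOC_D = Map.of(
            0, Set.of("a"),
            1, Set.of("b", "c"),
            2, Set.of("d"));

    /** Query (a,b):0 d:1, one set of terms per query position. */
    public static final List<Set<String>> QUERY =
            List.of(Set.of("a", "b"), Set.of("d"));

    /** OR semantics (MPQ-like): any one term of a query position may match. */
    public static boolean matchesAnyOf(Map<Integer, Set<String>> doc, List<Set<String>> query) {
        return matches(doc, query, false);
    }

    /** AND semantics (manually built PQ): all terms of a query position must co-occur. */
    public static boolean matchesAllOf(Map<Integer, Set<String>> doc, List<Set<String>> query) {
        return matches(doc, query, true);
    }

    private static boolean matches(Map<Integer, Set<String>> doc,
                                   List<Set<String>> query,
                                   boolean requireAll) {
        // Try every doc position as the start of the phrase.
        for (int start : doc.keySet()) {
            boolean phraseFound = true;
            for (int i = 0; i < query.size() && phraseFound; i++) {
                Set<String> docTerms = doc.getOrDefault(start + i, Set.of());
                Set<String> wanted = query.get(i);
                phraseFound = requireAll
                        ? docTerms.containsAll(wanted)
                        : wanted.stream().anyMatch(docTerms::contains);
            }
            if (phraseFound) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("OR  (MPQ-like): " + matchesAnyOf(DOC_D, QUERY));
        System.out.println("AND (PQ-like):  " + matchesAllOf(DOC_D, QUERY));
    }
}
```

With OR semantics the phrase is found starting at position 1 (b:1 d:2), so D matches; with AND semantics no position holds both 'a' and 'b', so D does not match. That is why swapping one query type for the other changes the results.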
> The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position
> ------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3068
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3068
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 3.0.3, 3.1, 4.0
>            Reporter: Michael McCandless
>            Assignee: Doron Cohen
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3068.patch, LUCENE-3068.patch
>
>
> In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
> matching docs that it shouldn't; but I think those changes caused it
> to fail to match docs that it should, specifically when the doc itself
> has tokens at the same position.