Adrien Grand created LUCENE-6255:
------------------------------------
Summary: PhraseQuery inconsistencies
Key: LUCENE-6255
URL: https://issues.apache.org/jira/browse/LUCENE-6255
Project: Lucene - Core
Issue Type: Bug
Reporter: Adrien Grand
Assignee: Adrien Grand
PhraseQuery behaves quite inconsistently when the position of the first term is
greater than 0. Here is an example:
{noformat}
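// Runs inside a LuceneTestCase subclass, which provides newDirectory(), random() and newSearcher()
Directory dir = newDirectory();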
RandomIndexWriter iw = new RandomIndexWriter(random(), dir);
FieldType customType = new FieldType(TextField.TYPE_NOT_STORED);
customType.setOmitNorms(true);
Field f = new Field("body", "", customType);
Document doc = new Document();
doc.add(f);
f.setStringValue("one quick fox");
iw.addDocument(doc);
IndexReader ir = iw.getReader();
iw.close();
IndexSearcher is = newSearcher(ir);
// 1st query: "quick" at position 0, "fox" at position 1
PhraseQuery pq = new PhraseQuery();
pq.add(new Term("body", "quick"), 0);
pq.add(new Term("body", "fox"), 1);
System.out.println(is.search(pq, 1).totalHits); // 1
// 2nd query: same phrase, shifted to positions 10 and 11
pq = new PhraseQuery();
pq.add(new Term("body", "quick"), 10);
pq.add(new Term("body", "fox"), 11);
System.out.println(is.search(pq, 1).totalHits); // 0
// 3rd query: the single term "quick" at position 10
pq = new PhraseQuery();
pq.add(new Term("body", "quick"), 10);
System.out.println(is.search(pq, 1).totalHits); // 1
// 4th query: same as the 2nd query, but with a slop of 1
pq = new PhraseQuery();
pq.add(new Term("body", "quick"), 10);
pq.add(new Term("body", "fox"), 11);
pq.setSlop(1);
System.out.println(is.search(pq, 1).totalHits); // 1
ir.close();
dir.close();
{noformat}
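The reason is that when you add a term at position P to a PhraseQuery,
ExactPhraseScorer ignores all positions of this term in the document that are
less than P. This is why the 2nd query finds nothing: "quick" and "fox" occur at
positions 1 and 2 in the document, below the requested positions 10 and 11.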
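The rule described above can be illustrated with a small, self-contained Java model. This is a simplified sketch, not Lucene's ExactPhraseScorer: the class `ExactPhraseModel` and its `matches` method are hypothetical, and a document is assumed to be a map from each term to its positions (Java 9+ for `Map.of`). A term added at query position P yields a candidate phrase start of `docPos - P`, and negative starts are dropped, which is the behaviour the report attributes to ExactPhraseScorer.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of exact phrase matching as described in the
// issue. It is not Lucene's ExactPhraseScorer.
final class ExactPhraseModel {

  /**
   * @param doc      term -> positions of that term in the document
   * @param terms    phrase terms
   * @param queryPos position of each term, as passed to PhraseQuery.add(Term, int)
   * @return true if all terms agree on a common, non-negative phrase start
   */
  static boolean matches(Map<String, int[]> doc, String[] terms, int[] queryPos) {
    Set<Integer> common = null;
    for (int i = 0; i < terms.length; i++) {
      Set<Integer> starts = new HashSet<>();
      for (int p : doc.getOrDefault(terms[i], new int[0])) {
        int start = p - queryPos[i];
        // Document positions lower than the query position give a negative
        // start and are ignored: the behaviour described in the issue.
        if (start >= 0) {
          starts.add(start);
        }
      }
      if (common == null) {
        common = starts;
      } else {
        common.retainAll(starts);
      }
    }
    return common != null && !common.isEmpty();
  }

  public static void main(String[] args) {
    // "one quick fox" -> one=0, quick=1, fox=2
    Map<String, int[]> doc =
        Map.of("one", new int[] {0}, "quick", new int[] {1}, "fox", new int[] {2});
    String[] phrase = {"quick", "fox"};
    System.out.println(matches(doc, phrase, new int[] {0, 1}));   // true  (1st query)
    System.out.println(matches(doc, phrase, new int[] {10, 11})); // false (2nd query)
  }
}
```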
But this is inconsistent:
- if the query has a single term, the restriction no longer applies, since the
query is rewritten to a term query regardless of the term's position (3rd query)
- if you increase the slop, SloppyPhraseScorer is used instead, and it does not
have this behaviour (4th query)
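The three code paths listed above can be modelled in a few lines to reproduce the hit counts from the example. This is a hypothetical sketch, not Lucene code: `PhraseDispatchModel` and its methods are illustrative names, a single-term phrase is modelled as a plain presence check (the term query rewrite), the exact path drops negative adjusted positions, and the sloppy path is simplified to "the adjusted positions `docPos - queryPos` span at most `slop`", ignoring repeated terms and scoring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of the code paths discussed in the issue; not Lucene code.
final class PhraseDispatchModel {

  // Adjusted positions (docPos - queryPos) of one term, optionally dropping negatives.
  static List<Integer> adjusted(Map<String, int[]> doc, String term, int queryPos,
                                boolean dropNegative) {
    List<Integer> out = new ArrayList<>();
    for (int p : doc.getOrDefault(term, new int[0])) {
      int a = p - queryPos;
      if (!dropNegative || a >= 0) {
        out.add(a);
      }
    }
    return out;
  }

  // True if one adjusted position per term can be picked with (max - min) <= slop.
  static boolean within(List<List<Integer>> perTerm, int idx, int min, int max, int slop) {
    if (idx == perTerm.size()) {
      return max - min <= slop;
    }
    for (int a : perTerm.get(idx)) {
      if (within(perTerm, idx + 1, Math.min(min, a), Math.max(max, a), slop)) {
        return true;
      }
    }
    return false;
  }

  /** Number of hits (0 or 1) for a single document. */
  static int hits(Map<String, int[]> doc, String[] terms, int[] queryPos, int slop) {
    if (terms.length == 1) {
      // Single-term phrase: rewritten to a term query, position never checked.
      return doc.containsKey(terms[0]) ? 1 : 0;
    }
    // slop == 0 models the exact path (negative adjusted positions dropped);
    // slop > 0 models the sloppy path (they are kept).
    boolean exact = slop == 0;
    List<List<Integer>> perTerm = new ArrayList<>();
    for (int i = 0; i < terms.length; i++) {
      perTerm.add(adjusted(doc, terms[i], queryPos[i], exact));
    }
    return within(perTerm, 0, Integer.MAX_VALUE, Integer.MIN_VALUE, slop) ? 1 : 0;
  }

  public static void main(String[] args) {
    Map<String, int[]> doc =
        Map.of("one", new int[] {0}, "quick", new int[] {1}, "fox", new int[] {2});
    String[] phrase = {"quick", "fox"};
    System.out.println(hits(doc, phrase, new int[] {0, 1}, 0));               // 1
    System.out.println(hits(doc, phrase, new int[] {10, 11}, 0));             // 0
    System.out.println(hits(doc, new String[] {"quick"}, new int[] {10}, 0)); // 1
    System.out.println(hits(doc, phrase, new int[] {10, 11}, 1));             // 1
  }
}
```

The four calls mirror the four queries of the report and print the same 1, 0, 1, 1 sequence, which makes it visible that only the exact multi-term path applies the position restriction.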
So I think we have two options:
- either remove this behaviour and treat the positions passed to PhraseQuery
as purely relative (i.e. fix ExactPhraseScorer)
- or apply the absolute-position behaviour everywhere (i.e. do not rewrite to
a term query when the term's position is not 0, and fix SloppyPhraseScorer)
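Both options can be sketched on the same kind of toy single-document model, to check that each one yields consistent results across the four queries of the example. This is a hypothetical illustration, not the change that was made in Lucene: `PhraseOptionsModel`, its `Option` enum and `hits` method are invented names. Option 1 is assumed here to be implemented by shifting all query positions so the smallest becomes 0; option 2 applies the "ignore document positions below the query position" rule on every path. Sloppy matching is simplified to "adjusted positions span at most `slop`".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the two options on a toy phrase-matching model; not Lucene code.
final class PhraseOptionsModel {

  enum Option { RELATIVE, ABSOLUTE }

  // True if one adjusted position per term can be picked with (max - min) <= slop.
  static boolean within(List<List<Integer>> perTerm, int idx, int min, int max, int slop) {
    if (idx == perTerm.size()) {
      return max - min <= slop;
    }
    for (int a : perTerm.get(idx)) {
      if (within(perTerm, idx + 1, Math.min(min, a), Math.max(max, a), slop)) {
        return true;
      }
    }
    return false;
  }

  static int hits(Map<String, int[]> doc, String[] terms, int[] queryPos, int slop,
                  Option option) {
    int[] pos = queryPos.clone();
    if (option == Option.RELATIVE) {
      // Option 1 (assumed approach): only differences between positions matter,
      // so shift them to start at 0 before matching.
      int min = Arrays.stream(pos).min().orElse(0);
      for (int i = 0; i < pos.length; i++) {
        pos[i] -= min;
      }
    }
    // Option 2: positions are absolute, so document positions below the query
    // position are ignored for single-term, exact and sloppy queries alike.
    boolean dropNegative = option == Option.ABSOLUTE;
    List<List<Integer>> perTerm = new ArrayList<>();
    for (int i = 0; i < terms.length; i++) {
      List<Integer> adjusted = new ArrayList<>();
      for (int p : doc.getOrDefault(terms[i], new int[0])) {
        int a = p - pos[i];
        if (!dropNegative || a >= 0) {
          adjusted.add(a);
        }
      }
      perTerm.add(adjusted);
    }
    // One rule for every query shape, so the code paths cannot disagree.
    return within(perTerm, 0, Integer.MAX_VALUE, Integer.MIN_VALUE, slop) ? 1 : 0;
  }

  public static void main(String[] args) {
    Map<String, int[]> doc =
        Map.of("one", new int[] {0}, "quick", new int[] {1}, "fox", new int[] {2});
    String[] phrase = {"quick", "fox"};
    for (Option option : Option.values()) {
      System.out.println(option + ": "
          + hits(doc, phrase, new int[] {0, 1}, 0, option) + " "
          + hits(doc, phrase, new int[] {10, 11}, 0, option) + " "
          + hits(doc, new String[] {"quick"}, new int[] {10}, 0, option) + " "
          + hits(doc, phrase, new int[] {10, 11}, 1, option));
    }
    // RELATIVE: 1 1 1 1
    // ABSOLUTE: 1 0 0 0
  }
}
```

Under option 1 every query matches, because the shifted phrase always lines up with "quick fox"; under option 2 only the 1st query matches, because "quick" never occurs at position 10 or later.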
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)