[
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314361#comment-14314361
]
Terry Smith commented on LUCENE-6229:
-------------------------------------
h2. freq() vs score()
I think the lazy positioning in MinShouldMatchSumScorer is misbehaving.
Drop these three methods (two tests and a helper) into
TestBooleanMinShouldMatch.java to reproduce it.
{code:java}
public void testMinNrShouldMatchFreq() throws Exception {
  BooleanQuery q = new BooleanQuery();
  q.add(new TermQuery(new Term("data", "1")), Occur.SHOULD);
  q.add(new TermQuery(new Term("data", "2")), Occur.SHOULD);
  q.add(new TermQuery(new Term("data", "3")), Occur.SHOULD);
  q.add(new TermQuery(new Term("id", "0")), Occur.MUST);
  q.setMinimumNumberShouldMatch(2);
  verifyNrHits(q, 1);
  s.search(q, new SimpleCollector() {
    private Scorer scorer;
    private Collection<Scorer> leafScorers;

    @Override
    public void setScorer(Scorer scorer) throws IOException {
      this.scorer = scorer;
      this.leafScorers = leafScorers(new ArrayList<Scorer>(), scorer);
      assertEquals(4, leafScorers.size());
    }

    @Override
    public void collect(int doc) throws IOException {
      assertEquals(0, doc);
      scorer.freq(); // position leaf scorers
      for (Scorer leafScorer : leafScorers) {
        assertEquals(0, leafScorer.docID());
      }
    }
  });
}

public void testMinNrShouldMatchScore() throws Exception {
  BooleanQuery q = new BooleanQuery();
  q.add(new TermQuery(new Term("data", "1")), Occur.SHOULD);
  q.add(new TermQuery(new Term("data", "2")), Occur.SHOULD);
  q.add(new TermQuery(new Term("data", "3")), Occur.SHOULD);
  q.add(new TermQuery(new Term("id", "0")), Occur.MUST);
  q.setMinimumNumberShouldMatch(2);
  verifyNrHits(q, 1);
  s.search(q, new SimpleCollector() {
    private Scorer scorer;
    private Collection<Scorer> leafScorers;

    @Override
    public void setScorer(Scorer scorer) throws IOException {
      this.scorer = scorer;
      this.leafScorers = leafScorers(new ArrayList<Scorer>(), scorer);
      assertEquals(4, leafScorers.size());
    }

    @Override
    public void collect(int doc) throws IOException {
      assertEquals(0, doc);
      scorer.score(); // position leaf scorers
      for (Scorer leafScorer : leafScorers) {
        assertEquals(0, leafScorer.docID());
      }
    }
  });
}

private static Collection<Scorer> leafScorers(Collection<Scorer> target,
    Scorer scorer) {
  Collection<ChildScorer> childScorers = scorer.getChildren();
  if (childScorers.isEmpty()) {
    target.add(scorer);
  } else {
    for (ChildScorer childScorer : childScorers) {
      leafScorers(target, childScorer.child);
    }
  }
  return target;
}
{code}
Here the test that uses freq() to position the sub-scorers fails, but the one
that uses score() succeeds.
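To make the symptom concrete, here is a toy model of lazy positioning. It is
not taken from MinShouldMatchSumScorer and the real cause may well differ; the
Leaf and LazyParent classes and the lazyFreq() method are hypothetical
stand-ins written only for this illustration. The parent stops advancing
leaves once enough of them confirm the match, so an accessor that does not
catch up the remaining leaves will leave them unpositioned.

```java
import java.util.Arrays;
import java.util.List;

public class LazyPositioningSketch {

    /** Hypothetical stand-in for a leaf scorer: a cursor over sorted doc ids. */
    static final class Leaf {
        private final int[] docs;
        private int idx = -1;

        Leaf(int... docs) { this.docs = docs; }

        /** -1 before the first advance, Integer.MAX_VALUE once exhausted. */
        int docID() {
            if (idx < 0) return -1;
            return idx < docs.length ? docs[idx] : Integer.MAX_VALUE;
        }

        /** Moves to the first doc >= target and returns it. */
        int advance(int target) {
            if (idx < 0) idx = 0;
            while (idx < docs.length && docs[idx] < target) idx++;
            return docID();
        }
    }

    /** Hypothetical parent that needs minMatch leaves on the same doc. */
    static final class LazyParent {
        private final List<Leaf> leaves;
        private final int minMatch;
        private int doc = -1;

        LazyParent(int minMatch, Leaf... leaves) {
            this.minMatch = minMatch;
            this.leaves = Arrays.asList(leaves);
        }

        /** Advances leaves only until minMatch of them confirm the doc. */
        boolean matches(int target) {
            doc = target;
            int matched = 0;
            for (Leaf leaf : leaves) {
                if (matched == minMatch) break; // enough evidence, stay lazy
                if (leaf.advance(target) == target) matched++;
            }
            return matched >= minMatch;
        }

        /** Catches up every lagging leaf, then counts the matching ones. */
        float score() {
            int freq = 0;
            for (Leaf leaf : leaves) {
                if (leaf.docID() < doc) leaf.advance(doc);
                if (leaf.docID() == doc) freq++;
            }
            return freq;
        }

        /** Counts without catching up: lagging leaves stay where they were. */
        int lazyFreq() {
            int freq = 0;
            for (Leaf leaf : leaves) {
                if (leaf.docID() == doc) freq++;
            }
            return freq;
        }
    }

    public static void main(String[] args) {
        Leaf a = new Leaf(0), b = new Leaf(0), c = new Leaf(0);
        LazyParent parent = new LazyParent(2, a, b, c);
        parent.matches(0);
        parent.lazyFreq();
        System.out.println("after lazyFreq: c.docID() = " + c.docID());
        parent.score();
        System.out.println("after score:    c.docID() = " + c.docID());
    }
}
```

In this model the third leaf is still unpositioned after lazyFreq() and only
lands on doc 0 after score(), which is the same shape of discrepancy the two
tests above expose.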
h2. middle ground
I have Scorer constructors, Weight.scorer(), Weight.explain() and Collectors
all calling Scorer.getChildren(). When using my custom Collectors, though, I'm
careful to wrap the Query in a custom NonBulkScoringQuery that prevents bulk
scoring, to work around the trap. The NonBulkScoringQuery I mention is a simple
delegating Query that lets Weight.bulkScorer() use its default
implementation instead of allowing the wrapped Query to override it.
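The delegation idea can be sketched without the Lucene classes. Everything
below is a simplified stand-in, not the real API: DocScorer, FakeScorer,
Collector, Weight, BulkOptimizedWeight and NonBulkScoringWeight are
hypothetical names, and the real NonBulkScoringQuery would wrap a Query and
its Weight rather than a Weight alone. The point is only that the wrapper
forwards scorer() and leaves the default bulk path in place, so the collector
receives a real scorer.

```java
import java.util.ArrayList;
import java.util.List;

public class NonBulkScoringSketch {

    /** Stand-in for Scorer: iterates a fixed list of doc ids. */
    static class DocScorer {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        private final int[] docs;
        private int idx = -1;

        DocScorer(int... docs) { this.docs = docs; }

        int nextDoc() { return ++idx < docs.length ? docs[idx] : NO_MORE_DOCS; }
    }

    /** Stand-in for the placeholder scorer a bulk path may hand to collectors. */
    static final class FakeScorer extends DocScorer {
        FakeScorer() { super(); }
    }

    /** Stand-in for a collector. */
    interface Collector {
        void setScorer(DocScorer scorer);
        void collect(int doc);
    }

    /** Stand-in for Weight: bulkScore() defaults to a doc-at-a-time loop. */
    static abstract class Weight {
        abstract DocScorer scorer();

        void bulkScore(Collector collector) {
            DocScorer scorer = scorer();
            collector.setScorer(scorer); // collector sees the real scorer
            for (int doc = scorer.nextDoc(); doc != DocScorer.NO_MORE_DOCS; doc = scorer.nextDoc()) {
                collector.collect(doc);
            }
        }
    }

    /** A weight whose specialised bulk path hides the real scorer. */
    static final class BulkOptimizedWeight extends Weight {
        @Override DocScorer scorer() { return new DocScorer(0, 3); }

        @Override void bulkScore(Collector collector) {
            collector.setScorer(new FakeScorer()); // the trap: no children to inspect
            DocScorer scorer = scorer();
            for (int doc = scorer.nextDoc(); doc != DocScorer.NO_MORE_DOCS; doc = scorer.nextDoc()) {
                collector.collect(doc);
            }
        }
    }

    /** Delegates scorer() but deliberately does not override bulkScore(). */
    static final class NonBulkScoringWeight extends Weight {
        private final Weight in;

        NonBulkScoringWeight(Weight in) { this.in = in; }

        @Override DocScorer scorer() { return in.scorer(); }
    }

    /** Returns the simple class name of the scorer handed to the collector. */
    static String scorerSeenBy(Weight weight) {
        final List<String> seen = new ArrayList<>();
        weight.bulkScore(new Collector() {
            @Override public void setScorer(DocScorer scorer) {
                seen.add(scorer.getClass().getSimpleName());
            }
            @Override public void collect(int doc) { }
        });
        return seen.get(0);
    }

    public static void main(String[] args) {
        Weight raw = new BulkOptimizedWeight();
        System.out.println(scorerSeenBy(raw));
        System.out.println(scorerSeenBy(new NonBulkScoringWeight(raw)));
    }
}
```

Because the wrapper never forwards bulkScore(), the specialised path of the
wrapped weight is simply unreachable, which is why the workaround needs no
knowledge of how the wrapped query scores in bulk.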
I like removing the trap for bulk scoring queries; it's really subtle, and it
took me a while to diagnose the first time I hit it.
Having a separate entry point into IndexSearcher to achieve doc-at-a-time
scoring that supports getChildren() would be awesome. I'm not so hot on having
to cast the collector, though. Do you think there could be a way to preserve
type safety here?
> Remove Scorer.getChildren?
> --------------------------
>
> Key: LUCENE-6229
> URL: https://issues.apache.org/jira/browse/LUCENE-6229
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
>
> This API is used in a single place in our code base:
> ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that
> using this API from a collector only works if setScorer is called with an
> actual Scorer (and not e.g. FakeScorer or BooleanScorer like you would get in
> disjunctions) so it needs a custom IndexSearcher that does not use the
> BulkScorer API.