[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314361#comment-14314361
 ] 

Terry Smith commented on LUCENE-6229:
-------------------------------------

h2. freq() vs score()

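I think the lazy positioning of sub scorers in MinShouldMatchSumScorer is 
misbehaving: calling freq() and calling score() leave the leaf scorers in 
different states.

Drop these three methods into TestBooleanMinShouldMatch.java to reproduce.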
{code:java}
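    // May need imports if not already present: java.io.IOException,
    // java.util.ArrayList, java.util.Collection,
    // org.apache.lucene.search.Scorer.ChildScorer
    public void testMinNrShouldMatchFreq() throws Exception {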
      BooleanQuery q = new BooleanQuery();
      q.add(new TermQuery(new Term("data", "1")), Occur.SHOULD);
      q.add(new TermQuery(new Term("data", "2")), Occur.SHOULD);
      q.add(new TermQuery(new Term("data", "3")), Occur.SHOULD);
      q.add(new TermQuery(new Term("id", "0")), Occur.MUST);
      q.setMinimumNumberShouldMatch(2);
      verifyNrHits(q, 1);
      s.search(q, new SimpleCollector() {
        private Scorer scorer;
        private Collection<Scorer> leafScorers;
        @Override
        public void setScorer(Scorer scorer) throws IOException {
          this.scorer = scorer;
          this.leafScorers = leafScorers(new ArrayList<Scorer>(), scorer);
          assertEquals(4, leafScorers.size());
        }

        @Override
        public void collect(int doc) throws IOException {
          assertEquals(0, doc);
          scorer.freq(); // position leaf scorers
          for (Scorer leafScorer : leafScorers) {
            assertEquals(0, leafScorer.docID());
          }
        }
      });
    }

    public void testMinNrShouldMatchScore() throws Exception {
      BooleanQuery q = new BooleanQuery();
      q.add(new TermQuery(new Term("data", "1")), Occur.SHOULD);
      q.add(new TermQuery(new Term("data", "2")), Occur.SHOULD);
      q.add(new TermQuery(new Term("data", "3")), Occur.SHOULD);
      q.add(new TermQuery(new Term("id", "0")), Occur.MUST);
      q.setMinimumNumberShouldMatch(2);
      verifyNrHits(q, 1);
      s.search(q, new SimpleCollector() {
        private Scorer scorer;
        private Collection<Scorer> leafScorers;
        @Override
        public void setScorer(Scorer scorer) throws IOException {
          this.scorer = scorer;
          this.leafScorers = leafScorers(new ArrayList<Scorer>(), scorer);
          assertEquals(4, leafScorers.size());
        }

        @Override
        public void collect(int doc) throws IOException {
          assertEquals(0, doc);
          scorer.score(); // position leaf scorers
          for (Scorer leafScorer : leafScorers) {
            assertEquals(0, leafScorer.docID());
          }
        }
      });
    }

    private static Collection<Scorer> leafScorers(Collection<Scorer> target, Scorer scorer) {
      Collection<ChildScorer> childScorers = scorer.getChildren();
      if (childScorers.isEmpty()) {
        target.add(scorer);
      } else {
        for (ChildScorer childScorer : childScorers) {
          leafScorers(target, childScorer.child);
        }
      }
      return target;
    }
{code}

The test that calls freq() to position the sub scorers fails, but the one 
that calls score() succeeds.

h2. middle ground

I have Scorer constructors, Weight.scorer(), Weight.explain() and Collectors 
all calling Scorer.getChildren(). With my custom Collectors, I'm careful to 
wrap the Query in a custom NonBulkScoringQuery that prevents bulk scoring, 
which works around the trap. NonBulkScoringQuery is a simple delegating Query 
that lets Weight.bulkScorer() use its default implementation instead of 
allowing the wrapped Query to override it.
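
To make the delegation idea concrete, here is a minimal, self-contained sketch. 
It deliberately does not use the Lucene classes: ToyWeight, ToyBooleanWeight, 
ToyNonBulkScoringWeight and NonBulkScoringDemo are hypothetical stand-ins, and 
the strings they return are only labels. It only illustrates the pattern of 
delegating scorer() while inheriting the default bulkScorer(); my real wrapper 
also has to delegate the rest of the Query/Weight API.

```java
// Toy stand-ins, NOT Lucene classes: they only model the delegation pattern.
abstract class ToyWeight {
  /** Doc-at-a-time path: returns a label describing the scorer. */
  abstract String scorer();

  /** Default bulk path: simply drives the doc-at-a-time scorer. */
  String bulkScorer() {
    return "default-bulk(" + scorer() + ")";
  }
}

/** A weight that installs its own specialized bulk scorer (the trap). */
class ToyBooleanWeight extends ToyWeight {
  @Override
  String scorer() {
    return "boolean-scorer";
  }

  @Override
  String bulkScorer() {
    return "specialized-bulk";
  }
}

/** Delegates scorer() but intentionally does not override bulkScorer(). */
class ToyNonBulkScoringWeight extends ToyWeight {
  private final ToyWeight in;

  ToyNonBulkScoringWeight(ToyWeight in) {
    this.in = in;
  }

  @Override
  String scorer() {
    return in.scorer();
  }
  // bulkScorer() is inherited, so the wrapped weight's override is bypassed.
}

class NonBulkScoringDemo {
  public static void main(String[] args) {
    ToyWeight raw = new ToyBooleanWeight();
    ToyWeight wrapped = new ToyNonBulkScoringWeight(raw);
    System.out.println(raw.bulkScorer());     // prints "specialized-bulk"
    System.out.println(wrapped.bulkScorer()); // prints "default-bulk(boolean-scorer)"
  }
}
```

In this model the wrapper never sees the specialized bulk path, so whatever 
drives collection always goes through the doc-at-a-time scorer, which is the 
property a getChildren()-based Collector depends on.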

I like removing the trap for bulk scoring queries. It's really subtle, and it 
took me a while to diagnose the first time I hit it.

Having a separate entry point into IndexSearcher to achieve doc-at-a-time 
scoring that supports getChildren() would be awesome. I'm not so keen on 
having to cast the collector, though. Do you think there could be a way to 
preserve type safety here?


> Remove Scorer.getChildren?
> --------------------------
>
>                 Key: LUCENE-6229
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6229
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> This API is used in a single place in our code base: 
> ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that 
> using this API from a collector only works if setScorer is called with an 
> actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in 
> disjunctions) so it needs a custom IndexSearcher that does not use the 
> BulkScorer API.


