[jira] [Commented] (SOLR-14166) Use TwoPhaseIterator for non-cached filter queries

David Smiley (Jira) Sun, 07 Mar 2021 12:09:06 -0800


    [ https://issues.apache.org/jira/browse/SOLR-14166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296964#comment-17296964 ]


David Smiley commented on SOLR-14166:
-------------------------------------

CC [~yonik] [~jbernste] [~hossman] as possible reviewers for the attached PR,
which is rather technical and touches code that few people besides the three of
you have worked on in some shape or form.  Please review the issue description
and take a look at the PR.  In the PR, each commit is well isolated to what its
commit message says, so you may prefer to go commit by commit, or you could just
look at the whole thing.  In a comment above I pondered "Maybe we could make a
wrapping query that wraps the underlying TPI.matchCost"; as you'll see in the
PR, I did that.  The test validates that match() isn't called more often than
it needs to be.  It used to be called more, which is verifiable by copying the
test to the 8x line (if I recall, it was called two additional times).  I
suspect the test doesn't verify that MatchCostQuery is having an effect... I
may need to think a bit more about how to do that.
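A minimal sketch of the wrapping idea, assuming the wrapper's job is to report a caller-supplied matchCost (for example one derived from an fq's cost) while delegating the actual matching unchanged. This is a simplified Java model, not Lucene's TwoPhaseIterator API: `TwoPhase`, `CountingCheck`, `MatchCostOverride`, `matchesAll` and `countCalls` are hypothetical names, and the cheapest-first evaluation stands in for how a conjunction might order its checks. Overriding the cost flips which check runs first, which is one observable signal a test could count.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

/**
 * Simplified model (NOT Lucene's API) of wrapping a two-phase check so that it
 * reports a different matchCost while delegating matches() unchanged.
 */
public class MatchCostWrapperSketch {

    /** Hypothetical stand-in for the "match()" half of a TwoPhaseIterator. */
    interface TwoPhase {
        boolean matches(int doc);
        float matchCost();
    }

    /** Leaf check backed by a predicate; counts how often matches() runs. */
    static final class CountingCheck implements TwoPhase {
        private final IntPredicate predicate;
        private final float cost;
        int calls = 0;

        CountingCheck(IntPredicate predicate, float cost) {
            this.predicate = predicate;
            this.cost = cost;
        }

        @Override public boolean matches(int doc) { calls++; return predicate.test(doc); }
        @Override public float matchCost() { return cost; }
    }

    /** Hypothetical wrapper: same matches(), caller-supplied matchCost. */
    static final class MatchCostOverride implements TwoPhase {
        private final TwoPhase delegate;
        private final float cost;

        MatchCostOverride(TwoPhase delegate, float cost) {
            this.delegate = delegate;
            this.cost = cost;
        }

        @Override public boolean matches(int doc) { return delegate.matches(doc); }
        @Override public float matchCost() { return cost; }
    }

    /** Runs checks cheapest-first and stops at the first rejection. */
    static boolean matchesAll(int doc, List<TwoPhase> checks) {
        List<TwoPhase> ordered = new ArrayList<>(checks);
        ordered.sort(Comparator.comparingDouble(TwoPhase::matchCost));
        for (TwoPhase check : ordered) {
            if (!check.matches(doc)) {
                return false; // later (costlier) checks are skipped
            }
        }
        return true;
    }

    /** Returns {callsToA, callsToB} for one doc, optionally wrapping B with a low cost. */
    static int[] countCalls(int doc, boolean overrideB) {
        CountingCheck a = new CountingCheck(d -> d % 2 == 0, 10f);
        CountingCheck b = new CountingCheck(d -> d % 3 == 0, 20f);
        TwoPhase bView = overrideB ? new MatchCostOverride(b, 1f) : b;
        matchesAll(doc, List.of(a, bView));
        return new int[] {a.calls, b.calls};
    }

    public static void main(String[] args) {
        int[] plain = countCalls(1, false);
        int[] wrapped = countCalls(1, true);
        System.out.println("A,B calls without override: " + plain[0] + "," + plain[1]);
        System.out.println("A,B calls with override:    " + wrapped[0] + "," + wrapped[1]);
    }
}
```

For doc 1, both checks reject, so only whichever runs first is ever called: A without the override, B with it.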

I suspect someone will ask me whether I did any performance tests.  No, I did
not.  My goal is removing tech debt (Filter), and in the process I expect some
performance improvements that Filter was blocking.  So with this issue, anyone
with non-cached filter queries may see a benefit, especially when those queries
have TwoPhaseIterators (phrase queries, frange, spatial, and more).  The benefit
may be even more pronounced if the main query also has TPIs, because Lucene
cleverly sees through the boolean queries to group the TPIs of required clauses
in the tree.
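A rough model of that grouping, as a sketch under simplifying assumptions: the cheap approximations of all required clauses are leapfrogged first, and only when they agree on a doc are the grouped match() checks run, cheapest matchCost first, stopping at the first rejection. This is not Lucene's ConjunctionDISI; the `Clause` type, `search` method and sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

/**
 * Simplified model (NOT Lucene's ConjunctionDISI) of a conjunction that first
 * leapfrogs cheap approximations, then runs match() checks cheapest-first.
 */
public class TwoPhaseConjunctionSketch {

    /** A required clause: sorted candidate docs plus an optional exact check. */
    static final class Clause {
        final int[] approximation;  // sorted doc ids that *might* match
        final IntPredicate confirm; // null means the approximation is already exact
        final float matchCost;      // estimated cost of one confirm call
        int confirmCalls = 0;

        Clause(int[] approximation, IntPredicate confirm, float matchCost) {
            this.approximation = approximation;
            this.confirm = confirm;
            this.matchCost = matchCost;
        }

        boolean matches(int doc) {
            confirmCalls++;
            return confirm.test(doc);
        }
    }

    /** Moves forward from index `from` to the first entry >= target (or length if none). */
    static int advance(int[] approx, int from, int target) {
        int i = from;
        while (i < approx.length && approx[i] < target) {
            i++;
        }
        return i;
    }

    static List<Integer> search(List<Clause> clauses) {
        // Group every clause that has a second phase and order them by matchCost.
        List<Clause> twoPhase = new ArrayList<>();
        for (Clause c : clauses) {
            if (c.confirm != null) {
                twoPhase.add(c);
            }
        }
        twoPhase.sort(Comparator.comparingDouble((Clause c) -> c.matchCost));

        int[] pos = new int[clauses.size()];
        List<Integer> hits = new ArrayList<>();
        int target = 0;
        outer:
        while (true) {
            // Phase 1: leapfrog the approximations until all agree on `target`.
            boolean agreed = false;
            while (!agreed) {
                agreed = true;
                for (int i = 0; i < clauses.size(); i++) {
                    int[] approx = clauses.get(i).approximation;
                    pos[i] = advance(approx, pos[i], target);
                    if (pos[i] == approx.length) {
                        break outer; // one clause is exhausted: no more hits
                    }
                    if (approx[pos[i]] > target) {
                        target = approx[pos[i]];
                        agreed = false;
                    }
                }
            }
            // Phase 2: confirm, cheapest check first, stopping at the first rejection.
            boolean ok = true;
            for (Clause c : twoPhase) {
                if (!c.matches(target)) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                hits.add(target);
            }
            target++;
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] all = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        Clause exact = new Clause(new int[] {2, 4, 6, 8}, null, 0f); // e.g. a term filter
        Clause pricey = new Clause(all, doc -> doc % 4 == 0, 100f); // e.g. a phrase check
        Clause cheap = new Clause(all, doc -> doc > 3, 5f);         // e.g. a range check
        List<Integer> hits = search(List.of(exact, pricey, cheap));
        System.out.println(hits + " cheap calls=" + cheap.confirmCalls
                + " pricey calls=" + pricey.confirmCalls);
    }
}
```

With this sample data the search returns [4, 8]; the cost-5 check runs on the four docs the approximations agree on, while the cost-100 check runs on only three of them, because doc 2 is rejected by the cheaper check first.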

> Use TwoPhaseIterator for non-cached filter queries
> --------------------------------------------------
>
>                 Key: SOLR-14166
>                 URL: https://issues.apache.org/jira/browse/SOLR-14166
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> "fq" filter queries that have cache=false and that aren't processed as a
> PostFilter (either because they aren't a PostFilter or because their cost is
> < 100) are processed in SolrIndexSearcher using a custom Filter thingy that
> iterates over a cost-ordered series of DocIdSetIterators.  This is not
> TwoPhaseIterator aware, so the match() method may be called on docs that
> would ideally have been filtered out by lower-cost filter queries.
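The problem in the issue description can be illustrated with a simplified Java model of two filters: a cheap, exact filter and an expensive one (think phrase or frange). In the plain-iterator version the expensive filter must confirm docs while it advances, so it may run its check on docs the cheap filter never contained; in the two-phase version the check runs only on docs the cheap filter already accepted. This is a sketch under stated assumptions, not Solr's actual code: the cheap filter is assumed to lead, the expensive filter's approximation is assumed to be all docs, and `plainIterators`, `twoPhase` and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/**
 * Simplified model (NOT Solr's actual code): a cheap exact filter combined with an
 * expensive filter, once as plain iterators and once two-phase.
 */
public class FilterConjunctionSketch {
    static int matchCalls = 0;

    /** The expensive per-doc check; every call is counted. */
    static boolean expensiveMatch(IntPredicate check, int doc) {
        matchCalls++;
        return check.test(doc);
    }

    /** Plain iterator: advancing to `target` must run the check until some doc passes. */
    static int advanceConfirmed(IntPredicate check, int target, int maxDoc) {
        for (int doc = target; doc < maxDoc; doc++) {
            if (expensiveMatch(check, doc)) {
                return doc;
            }
        }
        return Integer.MAX_VALUE; // exhausted
    }

    /** Not two-phase aware: the expensive iterator may confirm docs the cheap filter lacks. */
    static List<Integer> plainIterators(int[] cheap, IntPredicate check, int maxDoc) {
        List<Integer> hits = new ArrayList<>();
        int i = 0;
        int otherDoc = -1; // current position of the expensive iterator
        while (i < cheap.length) {
            int target = cheap[i];
            if (otherDoc < target) {
                otherDoc = advanceConfirmed(check, target, maxDoc);
            }
            if (otherDoc == target) {
                hits.add(target);
                i++;
            } else {
                while (i < cheap.length && cheap[i] < otherDoc) {
                    i++; // cheap filter leapfrogs past the expensive iterator's doc
                }
            }
        }
        return hits;
    }

    /** Two-phase aware: the expensive filter's approximation is assumed to be all docs. */
    static List<Integer> twoPhase(int[] cheap, IntPredicate check) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : cheap) {
            if (expensiveMatch(check, doc)) { // only docs the cheap filter already accepted
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] cheap = {4, 9};
        IntPredicate multipleOf3 = doc -> doc % 3 == 0;

        matchCalls = 0;
        List<Integer> plain = plainIterators(cheap, multipleOf3, 12);
        System.out.println("plain iterators: " + plain + ", match() calls = " + matchCalls);

        matchCalls = 0;
        List<Integer> tpi = twoPhase(cheap, multipleOf3);
        System.out.println("two-phase:       " + tpi + ", match() calls = " + matchCalls);
    }
}
```

Both versions return [9], but the plain-iterator version runs the expensive check on docs 4, 5, 6 and 9, while the two-phase version checks only docs 4 and 9.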



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
