[jira] [Commented] (LUCENE-7055) Better execution path for costly queries

Jim Ferenczi (JIRA) Mon, 26 Dec 2016 01:19:22 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15777952#comment-15777952
 ]


Jim Ferenczi commented on LUCENE-7055:
--------------------------------------

I like the new cost estimation and the lazy scorer, but maybe LazyScorer#get 
should take the min cost as an argument instead of a boolean. With a simple 
boolean, it's the parent query that makes the decision based on the min cost. 
The min cost could be big while the intersection with the point query is 
sparse, so I think it would be more flexible if IndexOrDocValuesQuery made 
the choice.

I also wonder if it's possible to completely disable the search for the next 
doc IDs in DocValuesNumbersQuery. Couldn't this type of query be turned into 
a simple filter that accepts or rejects doc IDs? This would eliminate the 
need to switch to a point query when the min cost is smaller than the point 
query cost but still big enough to make the doc values query costly, since 
that query has to find the next doc ID that matches the range every time the 
leading iteration finds a match. 
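
A minimal sketch of the two suggestions above, not taken from the attached 
patch. Everything in it is a hypothetical stand-in rather than Lucene's API: 
the LazyScorer interface, the Strategy enum, the helper names, and the 
"lead cost below point cost" rule are assumptions made for illustration only. 
It shows (1) a get method that receives the lead cost, so the query itself 
picks the execution path, and (2) a doc values range check written as a pure 
accept/reject filter that never searches for the next matching doc ID.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

// Illustrative sketch only: none of these types are Lucene classes.
final class LazyScorerSketch {

    /** The two execution paths a range clause could take. */
    enum Strategy { POINTS_ITERATOR, DOC_VALUES_FILTER }

    /** Hypothetical lazy scorer whose get() receives the lead cost instead of a boolean. */
    interface LazyScorer {
        long cost();
        Strategy get(long leadCost);
    }

    /**
     * The query decides for itself. Assumed rule: verify candidates against doc values
     * when the leading clause is cheaper than building the point iterator.
     */
    static LazyScorer indexOrDocValues(long pointsCost) {
        return new LazyScorer() {
            @Override public long cost() { return pointsCost; }
            @Override public Strategy get(long leadCost) {
                return leadCost < pointsCost ? Strategy.DOC_VALUES_FILTER : Strategy.POINTS_ITERATOR;
            }
        };
    }

    /** Range check as a plain filter: accepts or rejects a doc ID, never advances. */
    static IntPredicate docValuesRangeFilter(long[] valueByDoc, long min, long max) {
        return doc -> valueByDoc[doc] >= min && valueByDoc[doc] <= max;
    }

    /** The leading iterator drives; the filter is consulted once per candidate. */
    static int[] intersect(int[] leadDocs, IntPredicate filter) {
        return Arrays.stream(leadDocs).filter(filter).toArray();
    }

    public static void main(String[] args) {
        LazyScorer scorer = indexOrDocValues(1_000L);
        System.out.println(scorer.get(10L)); // prints DOC_VALUES_FILTER

        long[] valueByDoc = {5, 12, 7, 30, 9, 18}; // one value per doc ID 0..5
        int[] leadDocs = {1, 3, 4};                // matches of the selective leading clause
        int[] hits = intersect(leadDocs, docValuesRangeFilter(valueByDoc, 8, 20));
        System.out.println(Arrays.toString(hits)); // prints [1, 4]
    }
}
```

With the filter form, the per-match cost is a single value lookup, so the 
choice between the two paths depends only on how the lead cost compares with 
the cost of building the point iterator.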

> Better execution path for costly queries
> ----------------------------------------
>
>                 Key: LUCENE-7055
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7055
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>         Attachments: LUCENE-7055.patch
>
>
> In Lucene 5.0, we improved the execution path for queries that run costly 
> operations on a per-document basis, like phrase queries or doc values 
> queries. But we have another class of costly queries, that return fine 
> iterators, but these iterators are very expensive to build. This is typically 
> the case for queries that leverage DocIdSetBuilder, like TermsQuery, 
> multi-term queries or the new point queries. Intersecting such queries with a 
> selective query is very inefficient since these queries build a doc id set of 
> matching documents for the entire index.
> Is there something we could do to improve the execution path for these 
> queries?
> One idea that comes to mind is that most of these queries could also run on 
> doc values, so maybe we could come up with something that would help decide 
> how to run a query based on other parts of the query? (Just thinking out 
> loud, other ideas are very welcome)



