[jira] [Commented] (LUCENE-7618) Hypothetical perf improvements in DocValuesRangeQuery: reducing comparisons for some queries/segments

Hoss Man (JIRA) Wed, 04 Jan 2017 09:36:09 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-7618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798829#comment-15798829
 ]


Hoss Man commented on LUCENE-7618:
----------------------------------

bq. I could be wrong, but it looks to me that the cost that we are trying to cut 
here is low compared to the cost of running a query, like checking live docs, 
running first the approximation and then the two-phase confirmation, calling 
the collector, etc. Adding more implementations might also make Hotspot's job 
more complicated.

I would guess you are correct.

bq. I am not entirely sure what was your motivation for reviewing doc values 
ranges, ...

I didn't have a particularly strong motivation, it was just some idle 
experimentation during my vacation based on the conversation i mentioned ...
* i was already looking at the DocValuesRangeQueries to see if my hunch was 
correct
* saving one comparison per doc for common queries _seemed_ like a nice win
* Solr still uses DocValuesRangeQueries, and even if/when solr starts using 
points, i've seen enough people who value index size over speed *AND* need to 
do range queries on sort fields that i can't imagine eliminating usage of it 
completely, because some users will still want non-stored, non-indexed, 
non-points, docvalues fields (especially once updatable docvalues support 
finally lands in solr: then even if you don't mind bigger indexes, you'll 
want/need DocValuesRangeQueries to be able to query against your updated 
docvalue fields)
* I didn't/don't have enough familiarity with the points (query) code to guess 
if/where/what might be an equivalent optimization - so i wanted to start with 
the code i understood first.
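
To make the "one comparison per doc" idea concrete, here is a minimal sketch in plain Java. It is not the attached patch and does not use any Lucene API: the class and method names are hypothetical, bounds are assumed to be inclusive, and a null bound is assumed to mean "open" on that side. The second helper illustrates the SortedSet "ord" case, assuming a segment's ords run from 0 to valueCount - 1.

```java
import java.util.function.LongPredicate;

// Hypothetical illustration only; not Lucene code and not the LUCENE-7618 patch.
final class OpenRangeSketch {

    // Pick the cheapest per-value check once, up front.
    // Assumption: bounds are inclusive, and null means "open" on that side.
    static LongPredicate matcher(Long min, Long max) {
        if (min == null && max == null) {
            return v -> true;                 // no comparison: every value matches
        } else if (min == null) {
            final long hi = max;
            return v -> v <= hi;              // one comparison per value
        } else if (max == null) {
            final long lo = min;
            return v -> v >= lo;              // one comparison per value
        } else {
            final long lo = min;
            final long hi = max;
            return v -> v >= lo && v <= hi;   // general case: two comparisons
        }
    }

    // SortedSet-style case: a closed range of ords can be *effectively* open
    // on a given segment when a bound falls at or outside the segment's ords.
    // Assumption: the segment's ords are 0 .. valueCount - 1.
    static LongPredicate ordMatcher(long minOrd, long maxOrd, long valueCount) {
        Long lo = (minOrd <= 0) ? null : Long.valueOf(minOrd);
        Long hi = (maxOrd >= valueCount - 1) ? null : Long.valueOf(maxOrd);
        return matcher(lo, hi);
    }
}
```

For example, OpenRangeSketch.matcher(null, 10L) only ever evaluates v <= 10, and OpenRangeSketch.ordMatcher(0, 5, 10) drops the lower-bound check because no ord in the segment can be below 0. Whether skipping that comparison is measurable next to the rest of the query cost is exactly the open question above.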

bq. I have an open semi-related issue ... LUCENE-7055.

thanks for that pointer ... my head already hurts from reading the first few 
comments :)

> Hypothetical perf improvements in DocValuesRangeQuery: reducing comparisons 
> for some queries/segments
> -----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-7618
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7618
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Hoss Man
>         Attachments: LUCENE-7618.patch
>
>
> In reviewing the DocValuesRangeQuery code, it occurred to me that there 
> _might_ be some potential performance optimizations possible in a few cases 
> relating to queries that involve explicitly specified open ranges (ie: min or 
> max are null) or in the case of SortedSet: range queries that are 
> *effectively* open ended on particular segments, because the min/max are 
> below/above the minOrd/maxOrd for the segment.
> Since these seemed like semi-common situations (open ended range queries are 
> fairly common in my experience, i'm not sure about the secondary SortedSet 
> "ord" case, but it seemed potentially promising particularly for fields like 
> incrementing ids, or timestamps, where values are added sequentially and 
> likely to be clustered together) I did a bit of experimenting and wanted to 
> post my findings in jira -- patch & details to follow in comments.



