[ https://issues.apache.org/jira/browse/LUCENE-7618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798829#comment-15798829 ]
Hoss Man commented on LUCENE-7618:
----------------------------------
bq. I could be wrong, but it looks to me that the cost that we are trying to cut
here is low compared to the cost of running a query, like checking live docs,
running first the approximation and then the two-phase confirmation, calling
the collector, etc. Adding more implementations might also make HotSpot's job
more complicated.
I would guess you are correct.
bq. I am not entirely sure what was your motivation for reviewing doc values
ranges, ...
I didn't have a particularly strong motivation; it was just some idle
experimentation during my vacation, based on the conversation I mentioned ...
* I was already looking at the DocValuesRangeQueries to see if my hunch was
correct
* saving one comparison per doc for common queries _seemed_ like a nice win
* Solr still uses DocValuesRangeQueries, and even if/when Solr starts using
points, I've seen enough people who value index size over speed *AND* need to
do range queries on sort fields that I can't imagine its usage being eliminated
completely: some users will still want non-stored, non-indexed, non-points,
docvalues-only fields (especially once updatable docvalues support finally
lands in Solr; then, even if you don't mind bigger indexes, you'll want/need
DocValuesRangeQueries to be able to query against your updated docvalues
fields)
* I didn't/don't have enough familiarity with the points (query) code to guess
if/where/what might be an equivalent optimization, so I wanted to start with
the code I understood first.
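To make the "one comparison per doc" idea concrete, here is a minimal Java sketch, assuming a plain long[] stands in for a numeric docvalues field (one entry per doc). OpenRangeSketch and its methods are hypothetical: this is not Lucene API and not the attached patch. The point is only that the per-doc check is chosen once per query, so a null min or max drops that comparison entirely:

```java
import java.util.function.LongPredicate;

// Hypothetical sketch, not Lucene API and not the LUCENE-7618 patch:
// a plain long[] stands in for a numeric docvalues field, one value per doc.
class OpenRangeSketch {

  // Chooses the per-doc check once per query. A null bound means the range
  // is open on that side, so that comparison is dropped entirely.
  static LongPredicate rangePredicate(Long min, Long max) {
    if (min == null && max == null) {
      return v -> true;                 // fully open: no comparisons
    } else if (min == null) {
      final long hi = max;
      return v -> v <= hi;              // open below: one comparison
    } else if (max == null) {
      final long lo = min;
      return v -> v >= lo;              // open above: one comparison
    } else {
      final long lo = min, hi = max;
      return v -> v >= lo && v <= hi;   // closed: two comparisons
    }
  }

  // Counts "docs" whose value falls in the inclusive range [min, max].
  static int countMatches(long[] values, Long min, Long max) {
    LongPredicate inRange = rangePredicate(min, max);
    int count = 0;
    for (long v : values) {
      if (inRange.test(v)) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    long[] values = {3, 7, 12, 20, 41};
    System.out.println(countMatches(values, null, 12L)); // 3
    System.out.println(countMatches(values, 20L, null)); // 2
    System.out.println(countMatches(values, 5L, 20L));   // 3
    System.out.println(countMatches(values, null, null)); // 5
  }
}
```

Returning several different predicate implementations from a single call site is exactly the kind of thing that can make HotSpot's job harder, as the quoted comment points out, so any gain from skipping a comparison may be small next to the rest of query execution.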
bq. I have an open semi-related issue ... LUCENE-7055.
Thanks for that pointer ... my head already hurts from reading the first few
comments :)
> Hypothetical perf improvements in DocValuesRangeQuery: reducing comparisons
> for some queries/segments
> -----------------------------------------------------------------------------------------------------
>
> Key: LUCENE-7618
> URL: https://issues.apache.org/jira/browse/LUCENE-7618
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Hoss Man
> Attachments: LUCENE-7618.patch
>
>
> In reviewing the DocValuesRangeQuery code, it occurred to me that there
> _might_ be some potential performance optimizations possible in a few cases
> relating to queries that involve explicitly specified open ranges (i.e., min
> or max is null) or, in the case of SortedSet, range queries that are
> *effectively* open ended on particular segments, because the min/max are
> below/above the minOrd/maxOrd for the segment.
> Since these seemed like semi-common situations (open ended range queries are
> fairly common in my experience; I'm not sure about the secondary SortedSet
> "ord" case, but it seemed potentially promising, particularly for fields like
> incrementing ids or timestamps, where values are added sequentially and
> likely to be clustered together), I did a bit of experimenting and wanted to
> post my findings in JIRA -- patch & details to follow in comments.
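The SortedSet "ord" case from the description can be sketched the same way, under stated assumptions: a sorted String[] stands in for one segment's term dictionary (a term's ord is its array index), and String.compareTo stands in for Lucene's byte-wise term order. SegmentOrdRangeSketch and its methods are hypothetical, not Lucene API and not the patch. The query bounds are translated into segment ords once; if the lower bound lands at or below the segment's first ord, or the upper bound at or above its last, that side needs no per-doc comparison on this segment:

```java
import java.util.Arrays;

// Hypothetical sketch, not Lucene API and not the LUCENE-7618 patch: a sorted
// String[] stands in for one segment's SortedSet term dictionary, where a
// term's ord is its array index. String order stands in for byte order.
class SegmentOrdRangeSketch {

  // Translates an inclusive [lower, upper] range (null = open) into this
  // segment's ords. Returns {minOrd, maxOrd}; minOrd > maxOrd means empty.
  static long[] ordBounds(String[] segmentTerms, String lower, String upper) {
    long minOrd = 0;
    long maxOrd = segmentTerms.length - 1;
    if (lower != null) {
      int idx = Arrays.binarySearch(segmentTerms, lower);
      minOrd = idx >= 0 ? idx : -idx - 1; // first ord whose term >= lower
    }
    if (upper != null) {
      int idx = Arrays.binarySearch(segmentTerms, upper);
      maxOrd = idx >= 0 ? idx : -idx - 2; // last ord whose term <= upper
    }
    return new long[] {minOrd, maxOrd};
  }

  // Describes which per-doc ord comparisons this segment still needs.
  static String plan(String[] segmentTerms, String lower, String upper) {
    long[] b = ordBounds(segmentTerms, lower, upper);
    if (b[0] > b[1]) {
      return "no matches";
    }
    boolean checkMin = b[0] > 0;                       // bound excludes low ords
    boolean checkMax = b[1] < segmentTerms.length - 1; // bound excludes high ords
    if (!checkMin && !checkMax) {
      return "match every doc with a value";
    } else if (!checkMin) {
      return "check ord <= " + b[1];
    } else if (!checkMax) {
      return "check ord >= " + b[0];
    }
    return "check " + b[0] + " <= ord <= " + b[1];
  }

  public static void main(String[] args) {
    // e.g. timestamps added sequentially, so one segment covers a narrow span
    String[] segment = {"2016-12-01", "2016-12-15", "2017-01-02", "2017-01-04"};
    System.out.println(plan(segment, "2016-01-01", "2017-01-03")); // check ord <= 2
    System.out.println(plan(segment, "2016-01-01", null));         // match every doc with a value
    System.out.println(plan(segment, "2016-12-15", "2017-01-02")); // check 1 <= ord <= 2
    System.out.println(plan(segment, "2017-02-01", null));         // no matches
  }
}
```

For clustered values like these, most segments would fall entirely inside or entirely outside a typical range, which is where dropping one or both comparisons could plausibly pay off.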
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)