[ https://issues.apache.org/jira/browse/LUCENE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860920#comment-15860920 ]
ASF subversion and git services commented on LUCENE-7643:
---------------------------------------------------------
Commit a36ebaa90c95d8be6411464c237593a1ff825af0 in lucene-solr's branch
refs/heads/branch_6x from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a36ebaa ]
LUCENE-7643,SOLR-10013: Reenable the single-value optimization for sorted dv
too.
> Move IndexOrDocValuesQuery to queries (or core?)
> ------------------------------------------------
>
> Key: LUCENE-7643
> URL: https://issues.apache.org/jira/browse/LUCENE-7643
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Adrien Grand
> Priority: Minor
> Fix For: master (7.0), 6.5
>
> Attachments: LUCENE-7643.patch
>
>
> I was just doing some benchmarking to check that IndexOrDocValues actually
> makes things faster when it is supposed to:
> {noformat}
> Task              QPS baseline  StdDev  QPS patch  StdDev  Pct diff
> Range25                  30.27  (0.6%)      29.22  (4.7%)   -3.5% ( -8% -   1%)
> Range10                  66.74  (0.9%)      64.52  (4.2%)   -3.3% ( -8% -   1%)
> Term35                   18.59  (1.6%)      18.16  (1.9%)   -2.3% ( -5% -   1%)
> Term02                  274.98  (1.1%)     269.47  (1.9%)   -2.0% ( -4% -   1%)
> AndTerm35Range10         26.82  (2.5%)      26.50  (2.8%)   -1.2% ( -6% -   4%)
> AndTerm02Range25         56.27  (1.3%)      99.04  (7.9%)   76.0% ( 65% -  86%)
> {noformat}
> In the above results, the number after the query type indicates the
> percentage of docs in the index that it matches. With the baseline, range
> queries are simple point range queries, while the patch is an
> {{IndexOrDocValuesQuery}} that wraps both a point range query and a doc
> values query that matches the same documents. As expected,
> {{AndTerm35Range10}} performs the same in both cases since the range is
> supposed to lead the iteration, so the {{IndexOrDocValuesQuery}} is rewritten
> to the wrapped point range query. However, with {{AndTerm02Range25}} the range
> cost is higher than the term cost, so the range is only used for verifying
> matches and the {{IndexOrDocValuesQuery}} rewrites to the wrapped doc values
> query, yielding a speedup since we do not have to evaluate the range against
> the whole index.
> I think the -2/-3% difference we are seeing for everything other than
> {{AndTerm02Range25}} is noise, since term queries execute exactly the same way
> in both cases, yet they show this slight slowdown too.
> I would like to make it easier to use by moving {{IndexOrDocValuesQuery}} and
> {{DocValuesRangeQuery}} to a different module than sandbox, and giving the
> doc values range query an API that is closer to point ranges by making the
> bounds required (null disallowed) and removing the {{includeLower}} and
> {{includeUpper}} parameters. I initially wanted to move them to {{queries}}, but
> maybe {{core}} is better: that way we could link from the point API to
> {{IndexOrDocValuesQuery}} as a way to make queries on fields that have both
> points and doc values more efficient.
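
The cost-based choice described in the quoted issue text can be illustrated with a small toy model. This is a minimal sketch in plain Java, not Lucene code: the class name {{IndexOrDocValuesSketch}}, the {{Choice}} enum and the methods {{choose}} and {{verify}} are hypothetical, and the rule "use the index only when the range is the cheapest clause" is a simplifying assumption rather than Lucene's actual heuristic. Costs are approximated as matching-document counts derived from the percentages in the benchmark task names.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not Lucene code) of picking the index or doc values for a range clause. */
public class IndexOrDocValuesSketch {

    /** Hypothetical label for which of the two wrapped queries ends up being used. */
    enum Choice { INDEX, DOC_VALUES }

    /**
     * Assumed rule: if the range is the cheapest clause, it leads the iteration and
     * needs the index to enumerate its matches. Otherwise it only verifies candidates
     * produced by the cheaper clause, which doc values can do one document at a time.
     */
    static Choice choose(long rangeCost, long leadCost) {
        return rangeCost <= leadCost ? Choice.INDEX : Choice.DOC_VALUES;
    }

    /** Verification path: check each candidate's value against inclusive bounds. */
    static List<Integer> verify(int[] candidates, long[] docValues, long lower, long upper) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : candidates) {
            long value = docValues[doc];
            if (value >= lower && value <= upper) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        long maxDoc = 1_000_000; // assumed index size, for illustration only

        // AndTerm35Range10: the range matches fewer docs than the term, so it leads.
        System.out.println(choose(maxDoc * 10 / 100, maxDoc * 35 / 100)); // prints INDEX

        // AndTerm02Range25: the term matches fewer docs, so the range only verifies.
        System.out.println(choose(maxDoc * 25 / 100, maxDoc * 2 / 100)); // prints DOC_VALUES
    }
}
```

In the second case the verification loop only touches the documents produced by the cheaper term clause, which is the reason given above for the speedup on {{AndTerm02Range25}}: the range never has to be evaluated against the whole index.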