Adrien Grand created LUCENE-7643:
------------------------------------
Summary: Move IndexOrDocValuesQuery to queries (or core?)
Key: LUCENE-7643
URL: https://issues.apache.org/jira/browse/LUCENE-7643
Project: Lucene - Core
Issue Type: Task
Reporter: Adrien Grand
Priority: Minor
I was just doing some benchmarking to check that {{IndexOrDocValuesQuery}}
actually makes things faster when it is supposed to:
{noformat}
Task              QPS baseline  StdDev  QPS patch  StdDev  Pct diff
Range25                  30.27  (0.6%)      29.22  (4.7%)   -3.5% ( -8% -  1%)
Range10                  66.74  (0.9%)      64.52  (4.2%)   -3.3% ( -8% -  1%)
Term35                   18.59  (1.6%)      18.16  (1.9%)   -2.3% ( -5% -  1%)
Term02                  274.98  (1.1%)     269.47  (1.9%)   -2.0% ( -4% -  1%)
AndTerm35Range10         26.82  (2.5%)      26.50  (2.8%)   -1.2% ( -6% -  4%)
AndTerm02Range25         56.27  (1.3%)      99.04  (7.9%)   76.0% ( 65% - 86%)
{noformat}
In the above results, the number after the query type indicates the percentage
of docs in the index that it matches. With the baseline, range queries are
simple point range queries, while with the patch they are an
{{IndexOrDocValuesQuery}} that wraps both a point range query and a doc values
query that matches the same documents. As expected, {{AndTerm35Range10}}
performs the same in both
cases since the range is supposed to lead the iteration, so the
{{IndexOrDocValuesQuery}} is rewritten to the wrapped point range query.
However, with {{AndTerm02Range25}} the range cost is higher than the term cost
so the range is only used for verifying matches and the
{{IndexOrDocValuesQuery}} rewrites to the wrapped doc values query, yielding a
speedup since we do not have to evaluate the range against the whole index.
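
To make the decision described above concrete, here is a toy sketch of it. This is not Lucene's actual code: the class, enum, and method names are hypothetical, costs are simply the match percentages from the benchmark names, and treating a tie as "range leads" is an arbitrary assumption.

```java
// Toy model only: hypothetical names, not part of Lucene.
final class RangeExecutionSketch {

    // Which of the two wrapped queries ends up being executed.
    enum Strategy { POINTS, DOC_VALUES }

    /**
     * Picks an execution strategy for a range clause inside a conjunction.
     *
     * @param rangeCost       estimated number of docs matched by the range
     * @param otherClauseCost estimated number of docs matched by the other clause
     */
    static Strategy choose(long rangeCost, long otherClauseCost) {
        // If the range is the cheapest clause, it leads the iteration, so the
        // points index is used to enumerate its matches up front.
        // Otherwise the range only verifies candidates produced by the
        // cheaper clause, which doc values can do per document.
        return rangeCost <= otherClauseCost ? Strategy.POINTS : Strategy.DOC_VALUES;
    }

    public static void main(String[] args) {
        // AndTerm35Range10: the range (10%) is cheaper than the term (35%).
        System.out.println(choose(10, 35));
        // AndTerm02Range25: the term (2%) is cheaper than the range (25%).
        System.out.println(choose(25, 2));
    }
}
```

With these inputs the sketch picks the point range query for {{AndTerm35Range10}} and the doc values query for {{AndTerm02Range25}}, which matches the behaviour reported in the benchmark.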
I think the -2/-3% difference we are seeing for everything other than
{{AndTerm02Range25}} is noise, since term queries execute exactly the same way
in both cases, yet they show this slight slowdown too.
I would like to make it easier to use by moving {{IndexOrDocValuesQuery}} and
{{DocValuesRangeQuery}} to a different module than sandbox, and giving the doc
values range query an API that is closer to point ranges by making the bounds
required (null disallowed) and removing the {{includeLower}} and
{{includeUpper}} parameters. I initially wanted to move them to {{queries}},
but maybe {{core}} is better: that way we could link from the point API to
{{IndexOrDocValuesQuery}} as a way to make queries on fields that have both
points and doc values more efficient.
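
As an illustration of what dropping {{includeLower}}/{{includeUpper}} and nullable bounds would mean for callers, the sketch below converts old-style bounds into the always-inclusive form that point ranges use. The {{InclusiveBounds}} class is hypothetical and only shows the arithmetic: an exclusive bound moves by one with {{Math.addExact}}, and an open end becomes {{Long.MIN_VALUE}} or {{Long.MAX_VALUE}}.

```java
// Hypothetical helper for illustration; not an existing Lucene class.
final class InclusiveBounds {
    final long lower;
    final long upper;

    private InclusiveBounds(long lower, long upper) {
        this.lower = lower;
        this.upper = upper;
    }

    /** Maps nullable, possibly exclusive bounds to required, inclusive ones. */
    static InclusiveBounds from(Long lower, boolean includeLower,
                                Long upper, boolean includeUpper) {
        long lo;
        if (lower == null) {
            lo = Long.MIN_VALUE;                 // open lower end
        } else if (includeLower) {
            lo = lower;
        } else {
            // Exclusive lower bound; addExact throws on overflow instead of
            // silently wrapping around.
            lo = Math.addExact(lower.longValue(), 1L);
        }

        long hi;
        if (upper == null) {
            hi = Long.MAX_VALUE;                 // open upper end
        } else if (includeUpper) {
            hi = upper;
        } else {
            hi = Math.addExact(upper.longValue(), -1L);  // exclusive upper bound
        }
        return new InclusiveBounds(lo, hi);
    }
}
```

For example, {{InclusiveBounds.from(5L, false, 10L, true)}} describes the inclusive range 6 to 10, which could then be passed unchanged to both the point range query and the doc values range query.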