[ https://issues.apache.org/jira/browse/LUCENE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832212#comment-15832212 ]
Hoss Man commented on LUCENE-7643:
----------------------------------
Something about this change appears to have introduced an NPE risk that one of
Solr's randomized tests caught (see SOLR-10013 for full details)...
{noformat}
[junit4] > Throwable #1: java.lang.RuntimeException: Exception during query
[junit4] > at __randomizedtesting.SeedInfo.seed([690818771545E96F:51983624D9EDF0F4]:0)
[junit4] > at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:821)
[junit4] > at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:788)
[junit4] > at org.apache.solr.schema.DocValuesTest.testFloatAndDoubleRangeQueryRandom(DocValuesTest.java:618)
...
[junit4] > Caused by: java.lang.NullPointerException
[junit4] > at org.apache.lucene.document.SortedNumericDocValuesRangeQuery$1$1.matches(SortedNumericDocValuesRangeQuery.java:114)
[junit4] > at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:253)
[junit4] > at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:197)
[junit4] > at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
[junit4] > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:669)
...
{noformat}
> Move IndexOrDocValuesQuery to queries (or core?)
> ------------------------------------------------
>
> Key: LUCENE-7643
> URL: https://issues.apache.org/jira/browse/LUCENE-7643
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Adrien Grand
> Priority: Minor
> Fix For: master (7.0), 6.5
>
> Attachments: LUCENE-7643.patch
>
>
> I was just doing some benchmarking to check that IndexOrDocValues actually
> makes things faster when it is supposed to:
> {noformat}
> Task              QPS baseline  StdDev  QPS patch  StdDev  Pct diff
> Range25                  30.27  (0.6%)      29.22  (4.7%)  -3.5% ( -8% - 1%)
> Range10                  66.74  (0.9%)      64.52  (4.2%)  -3.3% ( -8% - 1%)
> Term35                   18.59  (1.6%)      18.16  (1.9%)  -2.3% ( -5% - 1%)
> Term02                  274.98  (1.1%)     269.47  (1.9%)  -2.0% ( -4% - 1%)
> AndTerm35Range10         26.82  (2.5%)      26.50  (2.8%)  -1.2% ( -6% - 4%)
> AndTerm02Range25         56.27  (1.3%)      99.04  (7.9%)  76.0% ( 65% - 86%)
> {noformat}
> In the above results, the number after the query type indicates the
> percentage of docs in the index that it matches. With the baseline, range
> queries are simple point range queries, while the patch is an
> {{IndexOrDocValuesQuery}} that wraps both a point range query and a doc
> values query that matches the same documents. As expected,
> {{AndTerm35Range10}} performs the same in both cases since the range is
> supposed to lead the iteration, so the {{IndexOrDocValuesQuery}} is rewritten
> to the wrapped point range query. However with {{AndTerm02Range25}} the range
> cost is higher than the term cost so the range is only used for verifying
> matches and the {{IndexOrDocValuesQuery}} rewrites to the wrapped doc values
> query, yielding a speedup since we do not have to evaluate the range against
> the whole index.
> I think the -2/-3% difference we are seeing for everything other than
> {{AndTerm02Range25}} is noise, since term queries execute exactly the same
> way in both cases, yet they show this slight slowdown too.
> I would like to make it easier to use by moving {{IndexOrDocValuesQuery}} and
> {{DocValuesRangeQuery}} to a different module than sandbox, and giving the
> doc values range query an API that is closer to point ranges by making the
> bounds required (null disallowed) and removing the {{includeLower}} and
> {{includeUpper}} parameters. I initially wanted to move them to {{queries}},
> but maybe {{core}} is better: that way we could link from the point API to
> {{IndexOrDocValuesQuery}} as a way to make queries on fields that have both
> points and doc values more efficient.
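
The choice described in the issue (the range clause either leads the iteration through the index, or only verifies candidates through doc values) can be illustrated with a toy model. This is a minimal sketch and not Lucene's code: the class {{IndexOrDocValuesSketch}}, its methods, and the simple "compare the two costs" rule are all hypothetical, and the range cost is counted exactly here, whereas a real engine would only estimate it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; none of these names exist in Lucene.
class IndexOrDocValuesSketch {

    /** Which of the two wrapped strategies evaluates the range clause. */
    enum Strategy { INDEX, DOC_VALUES }

    /**
     * Assumed decision rule: if the range matches no more docs than the
     * term, let the range lead via the index structure; otherwise let the
     * term lead and only verify each candidate against its doc value.
     */
    static Strategy choose(long rangeCost, long termCost) {
        return rangeCost <= termCost ? Strategy.INDEX : Strategy.DOC_VALUES;
    }

    /**
     * Conjunction of a term clause and an inclusive range clause.
     * termDocs: sorted doc ids matching the term.
     * values:   values[doc] is the numeric value of that document.
     */
    static List<Integer> search(int[] termDocs, long[] values, long lower, long upper) {
        // Exact count for simplicity; a real engine would use a cheap estimate.
        long rangeCost = 0;
        for (long v : values) {
            if (v >= lower && v <= upper) rangeCost++;
        }

        List<Integer> hits = new ArrayList<>();
        if (choose(rangeCost, termDocs.length) == Strategy.INDEX) {
            // "Index" path: evaluate the range against the whole index first,
            // then intersect with the term's documents.
            boolean[] inRange = new boolean[values.length];
            for (int doc = 0; doc < values.length; doc++) {
                inRange[doc] = values[doc] >= lower && values[doc] <= upper;
            }
            for (int doc : termDocs) {
                if (inRange[doc]) hits.add(doc);
            }
        } else {
            // "Doc values" path: touch only the term's candidates and
            // check each one's value on the fly.
            for (int doc : termDocs) {
                long v = values[doc];
                if (v >= lower && v <= upper) hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Costs taken from the benchmark task names (percent of docs matched).
        System.out.println(choose(10, 35)); // AndTerm35Range10
        System.out.println(choose(25, 2));  // AndTerm02Range25

        long[] values = {5, 12, 7, 30, 9, 14, 2, 11};
        System.out.println(search(new int[] {1, 6}, values, 5, 15));
    }
}
```

Both paths return the same documents; only the amount of work differs, which is why the speedup in the benchmark shows up only for {{AndTerm02Range25}}.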