[
https://issues.apache.org/jira/browse/LUCENE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886963#comment-13886963
]
David Smiley commented on LUCENE-5424:
--------------------------------------
I know I commented on LUCENE-5418 and then immediately created this issue, but
the two are not particularly related. I fully recognize that
RANDOM_ACCESS_FILTER_STRATEGY is for the typical case of fast filters. And
indeed I observed the TODO comment and thought, _hey_, DISI *does* have a
{{cost()}} now -- let's do this! Now there's this JIRA issue :-)
Not sure how to arrive at the right tuning ratio between the cost() of the two
DISIs. Maybe use the benchmark module and try various filters that match 1%,
2%, etc. up to 99% of the documents, against some simple query that always
matches the same 50% of the total docs? And then test this method given
configurable threshold ratios of query_cost/filter_cost of 10%, 20%, etc.
and see where the inflection point is. That's complicated, yeah.
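To make the proposed sweep concrete, here is a rough sketch of the parameter grid only; it measures nothing. The class name, the 1,000,000-document index size, and the rule "prefer random access when query_cost/filter_cost is at or below the threshold" are all assumptions for illustration, and none of it is Lucene API. Real timings would still have to come from the benchmark module.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sweep harness; nothing here is Lucene API. */
final class CostRatioSweep {

  /** Assumed rule: prefer random access when queryCost / filterCost <= thresholdRatio. */
  static boolean useRandomAccess(long queryCost, long filterCost, double thresholdRatio) {
    if (filterCost <= 0) {
      return false; // a filter that matches nothing gives no reason to probe it
    }
    return (double) queryCost / (double) filterCost <= thresholdRatio;
  }

  /** Filter selectivities (percent of docs matched, 1..99) at which the rule picks random access. */
  static List<Integer> randomAccessPercents(long maxDoc, int queryPercent, double thresholdRatio) {
    long queryCost = maxDoc * queryPercent / 100;
    List<Integer> picked = new ArrayList<>();
    for (int filterPercent = 1; filterPercent <= 99; filterPercent++) {
      long filterCost = maxDoc * filterPercent / 100;
      if (useRandomAccess(queryCost, filterCost, thresholdRatio)) {
        picked.add(filterPercent);
      }
    }
    return picked;
  }

  public static void main(String[] args) {
    long maxDoc = 1_000_000L; // assumed index size
    for (int t = 1; t <= 10; t++) {
      double ratio = t / 10.0; // thresholds 10%, 20%, ... 100%
      List<Integer> picked = randomAccessPercents(maxDoc, 50, ratio);
      String from = picked.isEmpty() ? "never" : "filters matching >= " + picked.get(0) + "%";
      System.out.println("threshold=" + ratio + " -> random access: " + from);
    }
  }
}
```

Under these assumed numbers, a threshold of 0.5 or lower never selects random access against the 50% query, since the filter would have to match 100% of the docs; so the interesting part of the grid may sit above 50%, or need a sparser query.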
> FilteredQuery useRandomAccess() should use cost()
> -------------------------------------------------
>
> Key: LUCENE-5424
> URL: https://issues.apache.org/jira/browse/LUCENE-5424
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/query/scoring
> Reporter: David Smiley
>
> Now that Lucene's DISI has a cost() method, it's possible for FilteredQuery's
> RANDOM_ACCESS_FILTER_STRATEGY to use a smarter algorithm in its
> useRandomAccess() method. In particular, it might examine filterIter.cost()
> to see if it is greater than the cost returned by weight.scorer().cost() of
> the query.
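A minimal sketch of the comparison the description suggests, assuming nothing beyond two cost estimates. The {{CostEstimate}} interface and {{CostAwareStrategy}} class are hypothetical stand-ins (for a filter's DISI and the query's scorer); this is not the FilteredQuery code, and a real change would read {{cost()}} from the actual iterators.

```java
/** Hypothetical stand-in for anything exposing a cost estimate, e.g. a DocIdSetIterator. */
interface CostEstimate {
  long cost();
}

final class CostAwareStrategy {

  /** Picks random access when the filter is estimated to visit more docs than the query. */
  static boolean useRandomAccess(CostEstimate filterIter, CostEstimate queryScorer) {
    return filterIter.cost() > queryScorer.cost();
  }

  public static void main(String[] args) {
    // Made-up cost values, purely for illustration.
    CostEstimate denseFilter = () -> 900_000L;
    CostEstimate sparseFilter = () -> 1_000L;
    CostEstimate query = () -> 50_000L;

    System.out.println("dense filter:  " + useRandomAccess(denseFilter, query));
    System.out.println("sparse filter: " + useRandomAccess(sparseFilter, query));
  }
}
```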
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)