[ 
https://issues.apache.org/jira/browse/LUCENE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886963#comment-13886963
 ] 

David Smiley commented on LUCENE-5424:
--------------------------------------

I know I commented on LUCENE-5418 and then immediately created this issue, but 
the two are not particularly related.  I fully recognize that 
RANDOM_ACCESS_FILTER_STRATEGY is for the typical case of fast filters.  And 
indeed I observed the TODO comment and thought, _hey_, DISI *does* have a 
{{cost()}} now -- let's do this!  Now there's this JIRA issue :-)

I'm not sure how to arrive at the right tuning ratio between the {{cost()}} 
values of the two DISIs.  Maybe use the benchmark module and try various 
filters that match 1%, 2%, etc. up to 99% of the documents, against some simple 
query that always matches the same 50% of the total docs?  Then test this 
method with configurable threshold ratios of query_cost/filter_cost of 10%, 
20%, etc., and see where the inflection point is.  That's complicated, yeah.
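
For illustration only, the threshold check and the sweep described above could 
be sketched roughly as follows.  This is not Lucene code: the class 
{{CostRatioSketch}}, the helper {{useRandomAccess(filterCost, queryCost, 
thresholdRatio)}}, the index size, and the rule "pick random access when 
query_cost/filter_cost is at or below the threshold" are all assumptions made 
for the example, and the costs are plain numbers standing in for what 
{{cost()}} would return.

```java
public class CostRatioSketch {

    /**
     * Hypothetical helper (not part of Lucene): decide whether to apply the
     * filter by random access, given the two cost() estimates.
     * Assumed rule: random access when queryCost / filterCost <= thresholdRatio.
     */
    public static boolean useRandomAccess(long filterCost, long queryCost,
                                          double thresholdRatio) {
        if (filterCost <= 0) {
            // A filter with no estimated matches: nothing to gain from random access.
            return false;
        }
        // Multiply instead of dividing so the comparison never divides by zero.
        return (double) queryCost <= thresholdRatio * (double) filterCost;
    }

    public static void main(String[] args) {
        long maxDoc = 1_000_000L;     // assumed index size, for illustration
        long queryCost = maxDoc / 2;  // query that matches 50% of the docs
        double[] thresholds = {0.1, 0.2, 0.5, 1.0};

        // Sweep filter selectivity from 1% to 99% for each candidate threshold.
        for (double threshold : thresholds) {
            for (int pct = 1; pct <= 99; pct++) {
                long filterCost = maxDoc * pct / 100;
                boolean randomAccess = useRandomAccess(filterCost, queryCost, threshold);
                System.out.println("threshold=" + threshold
                        + " filterPct=" + pct
                        + " randomAccess=" + randomAccess);
            }
        }
    }
}
```

The sweep only prints which strategy the rule would pick; finding the actual 
inflection point would still require timing both strategies for each 
combination, for example with the benchmark module.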

> FilteredQuery useRandomAccess() should use cost()
> -------------------------------------------------
>
>                 Key: LUCENE-5424
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5424
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/query/scoring
>            Reporter: David Smiley
>
> Now that Lucene's DISI has a cost() method, it's possible for FilteredQuery's 
> RANDOM_ACCESS_FILTER_STRATEGY to use a smarter algorithm in its 
> useRandomAccess() method.  In particular, it might examine filterIter.cost() 
> to see if it is greater than the cost returned by weight.scorer().cost() of 
> the query.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)