[ https://issues.apache.org/jira/browse/LUCENE-6077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226284#comment-14226284 ]

Robert Muir commented on LUCENE-6077:
-------------------------------------

This looks great!

Do we really need to default CachingWrapperFilter to a "stupid" policy?
Is there a better name for the FilterCache.cache() method? "cache" can be a
noun or a verb, so it's kind of confusing. Maybe doCache would be better?
CachingWrapperFilter's new ctor: can we fix the typo?
FilterCachingPolicy.onCache: can we correct the param name?

> Add a filter cache
> ------------------
>
>                 Key: LUCENE-6077
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6077
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: 5.0
>
>         Attachments: LUCENE-6077.patch
>
>
> Lucene already has filter caching abilities through CachingWrapperFilter, but 
> CachingWrapperFilter requires you to know which filters you want to cache 
> up-front.
> Caching filters is not trivial. If you cache too aggressively, then you slow 
> things down since you need to iterate over all documents that match the 
> filter in order to load it into an in-memory cacheable DocIdSet. On the other 
> hand, if you don't cache at all, you are potentially missing interesting 
> speed-ups on frequently-used filters.
> It would be nice to have a generic filter cache that would track usage for 
> individual filters and decide whether or not to cache a filter on a given 
> segment based on usage statistics and various heuristics, such as:
>  - the overhead to cache the filter (for instance some filters produce 
> DocIdSets that are already cacheable)
>  - the cost to build the DocIdSet (the getDocIdSet method is very expensive 
> on some filters such as MultiTermQueryWrapperFilter that potentially need to 
> merge lots of postings lists)
>  - the segment we are searching on (flushed segments will likely be merged 
> right away, so it's probably not worth building a cache on such segments)
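
To make the heuristics in the description above concrete, here is a minimal
sketch of a usage-tracking caching decision. It is not taken from the attached
patch and does not use any Lucene API: the class UsageTrackingPolicy, its
methods onUse and shouldCache, and both thresholds are hypothetical names and
values chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Lucene's API): decides whether a filter should be
 * cached on a segment, based on usage counts and simple heuristics.
 */
final class UsageTrackingPolicy {
    // How many times each filter has been used so far.
    private final Map<Object, Integer> useCounts = new HashMap<>();
    // Assumed threshold: uses required before an ordinary filter is cached.
    private final int minUses;
    // Assumed threshold: minimum share of the index a segment must hold.
    private final double minSegmentRatio;

    UsageTrackingPolicy(int minUses, double minSegmentRatio) {
        this.minUses = minUses;
        this.minSegmentRatio = minSegmentRatio;
    }

    /** Records one use of the filter identified by filterKey. */
    synchronized void onUse(Object filterKey) {
        useCounts.merge(filterKey, 1, Integer::sum);
    }

    /**
     * @param alreadyCacheable true if the filter's doc id set can be cached
     *                         as-is, so caching adds little overhead
     * @param costlyToBuild    true if building the doc id set is expensive
     * @param segmentMaxDoc    number of documents in the segment
     * @param indexMaxDoc      number of documents in the whole index
     */
    synchronized boolean shouldCache(Object filterKey, boolean alreadyCacheable,
                                     boolean costlyToBuild, int segmentMaxDoc,
                                     int indexMaxDoc) {
        // Small segments (such as fresh flushes) are likely to be merged
        // away soon, so a cache entry built on them would be short-lived.
        if (indexMaxDoc <= 0
                || (double) segmentMaxDoc / indexMaxDoc < minSegmentRatio) {
            return false;
        }
        // Cheap-to-cache or expensive-to-rebuild filters need fewer uses.
        int required = (alreadyCacheable || costlyToBuild)
                ? Math.max(1, minUses / 2)
                : minUses;
        return useCounts.getOrDefault(filterKey, 0) >= required;
    }
}
```

A caller would invoke onUse each time a filter is applied and consult
shouldCache per segment before materializing a cacheable set. Halving the
required use count for costly or already-cacheable filters is an arbitrary
choice here; a real policy would need tuning against actual workloads.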



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

