[
https://issues.apache.org/jira/browse/LUCENE-4548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adrien Grand resolved LUCENE-4548.
----------------------------------
Resolution: Invalid
This issue is not relevant anymore: live docs are now always applied on top of
queries.
> BooleanFilter should optionally pass down further restricted acceptDocs in
> the MUST case (and acceptDocs in general)
> --------------------------------------------------------------------------------------------------------------------
>
> Key: LUCENE-4548
> URL: https://issues.apache.org/jira/browse/LUCENE-4548
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Uwe Schindler
> Priority: Major
> Attachments: LUCENE-4548.patch
>
>
> Spin-off from dev@lao:
> {quote}
> bq. I am about to write a Filter that only operates on a set of documents
> that have already passed other filter(s). It's rather expensive, since it
> has to use DocValues to examine a value and then determine if it's a match.
> So it scales O(n) where n is the number of documents it must see. The 2nd
> arg of getDocIdSet is Bits acceptDocs. Unfortunately Bits doesn't have an
> int iterator but I can deal with that seeing if it extends DocIdSet.
> bq. I'm looking at BooleanFilter which I want to use and I notice that it
> passes null to filter.getDocIdSet for acceptDocs, and it justifies this with
> the following comment:
> bq. // we dont pass acceptDocs, we will filter at the end using an additional
> filter
> Passing the already built bits down to the MUST clauses is a good idea and
> can be implemented easily.
> The acceptDocs were not passed down because of the new way filters work in
> Lucene 4.0, and to optimize caching. The accept docs are the only thing
> that changes when deletions are applied, and filters are required to handle
> them separately: whenever something is able to cache (e.g.
> CachingWrapperFilter), the acceptDocs are not cached, so the underlying
> filters get a null acceptDocs to produce the full bitset, and the filtering
> is done when CachingWrapperFilter gets the up-to-date acceptDocs. In this
> case it does not matter that the first filter clause gets no acceptDocs,
> but later MUST clauses can of course get them (they are not
> deletion-specific)!
> Can you open an issue to optimize the MUST case (possibly MUST_NOT, too)?
> Another thing that could help here: You can stop using BooleanFilter if you
> can apply the filters sequentially (only MUST clauses) by wrapping with
> multiple FilteredQuery: new FilteredQuery(new FilteredQuery(originalQuery,
> clause1), clause2). If the DocIdSets enable bits() and the FilteredQuery
> autodetection decides to use random access filters, the acceptdocs are also
> passed down from the outside to the inner, removing the documents filtered
> out.
> {quote}
> Maybe BooleanFilter should have 2 modes (boolean ctor argument): passing down
> the acceptDocs to every filter (for the case where the Filter calculation is
> expensive and the accept docs help to limit the calculations) or not passing
> them down (for the case where the filter is cheap and the repeated acceptDocs
> bit checks in every single filter would cost more than they save, e.g. when
> the Filter is only a cached bitset). The first mode would also optimize the
> MUST/MUST_NOT case by passing down the further restricted acceptDocs to later
> filters (just like FilteredQuery does).
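
To make the two proposed modes concrete, here is a minimal, self-contained
sketch of the idea. It does not use any Lucene classes: SketchFilter,
CountingFilter, evaluateMust and the passDown flag are hypothetical names
chosen for illustration, and java.util.BitSet stands in for both Bits and
DocIdSet. It only covers MUST clauses and is not the attached patch.

```java
import java.util.BitSet;
import java.util.List;
import java.util.function.IntPredicate;

/** Illustration only; none of these types are Lucene classes. */
final class AcceptDocsSketch {

    /** Hypothetical stand-in for a filter's getDocIdSet(context, acceptDocs). */
    interface SketchFilter {
        /** A null acceptDocs means every document is a candidate. */
        BitSet getDocIdSet(int maxDoc, BitSet acceptDocs);
    }

    /** Per-document filter that counts how many documents it had to examine. */
    static final class CountingFilter implements SketchFilter {
        private final IntPredicate matches;
        int examined = 0;

        CountingFilter(IntPredicate matches) {
            this.matches = matches;
        }

        @Override
        public BitSet getDocIdSet(int maxDoc, BitSet acceptDocs) {
            BitSet result = new BitSet(maxDoc);
            for (int doc = 0; doc < maxDoc; doc++) {
                if (acceptDocs != null && !acceptDocs.get(doc)) {
                    continue; // excluded up front, the costly check is skipped
                }
                examined++;
                if (matches.test(doc)) {
                    result.set(doc);
                }
            }
            return result;
        }
    }

    /**
     * Intersects all MUST clauses. With passDown == true each clause receives
     * the bits that survived the earlier clauses; with passDown == false each
     * clause receives null and acceptDocs is applied once at the end.
     */
    static BitSet evaluateMust(List<? extends SketchFilter> clauses, int maxDoc,
                               BitSet acceptDocs, boolean passDown) {
        BitSet result = null; // null means "no restriction yet"
        if (passDown && acceptDocs != null) {
            result = (BitSet) acceptDocs.clone();
        }
        for (SketchFilter clause : clauses) {
            BitSet matched = clause.getDocIdSet(maxDoc, passDown ? result : null);
            if (result == null) {
                result = (BitSet) matched.clone();
            } else {
                result.and(matched);
            }
        }
        if (result == null) { // no clauses and nothing passed down: accept all
            result = new BitSet(maxDoc);
            result.set(0, maxDoc);
        }
        if (acceptDocs != null) {
            result.and(acceptDocs); // a no-op when the bits were passed down
        }
        return result;
    }
}
```

Both modes return the same documents. What differs is how many documents
each clause has to examine, which is the trade-off described above: passing
the bits down pays off when a clause is expensive per document, and only adds
bit checks when the clause is already a cached bitset.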