[jira] Commented: (LUCENE-2348) DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment readers

Michael McCandless (JIRA) Fri, 26 Nov 2010 02:54:44 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935970#action_12935970 ]


Michael McCandless commented on LUCENE-2348:
--------------------------------------------

bq. if it's not doing the work in getDocIdSet, it shouldn't extend Filter!

I don't think we can or should dictate that.

I think it's fair game for a Filter to compute/cache whatever it
wants.  The only requirement for Filter is that it implement
getDocIdSet.  Where it does its work, what it's storing in its
instance, etc., is up to it.

Sure, we strive for a strong separation of "computing the bits" vs
"caching them", but for some cases that ideal is not feasible.

In fact, in this case the filter is so costly to build that no
realistic app can rely on it without first wrapping it in
CachingWrapperFilter.  So I see no harm in conflating caching with
computing in this filter.  We could rename it to
CachingDuplicateFilter.  We could even factor out the FilterCache
utility class now inside CachingWrapperFilter and make it easy to
reuse from other filters, like this one, that need to compute and
cache up front.
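As a rough illustration of such a shared utility, here is a minimal sketch of a per-reader compute-and-cache helper. The `FilterCache` below is hypothetical and does not reproduce the class inside CachingWrapperFilter; readers are stood in for by plain `Object` keys, and a `java.util.BitSet` stands in for a DocIdSet.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

public class FilterCacheSketch {

    // Hypothetical reusable cache: at most one computed value per reader key.
    static final class FilterCache<K, V> {
        // Weak keys: an entry becomes collectable once nothing else references its reader.
        private final Map<K, V> cache = new WeakHashMap<>();

        // Returns the cached value for readerKey, computing and storing it on first use.
        synchronized V get(K readerKey, Function<? super K, ? extends V> compute) {
            V value = cache.get(readerKey);
            if (value == null) {
                value = compute.apply(readerKey);
                cache.put(readerKey, value);
            }
            return value;
        }
    }

    public static void main(String[] args) {
        FilterCache<Object, BitSet> cache = new FilterCache<>();
        int[] builds = {0};
        // Stand-in for an expensive computation such as DuplicateFilter's.
        Function<Object, BitSet> expensiveBuild = readerKey -> {
            builds[0]++;
            BitSet bits = new BitSet();
            bits.set(0);
            return bits;
        };
        Object readerA = new Object();
        Object readerB = new Object();
        cache.get(readerA, expensiveBuild);
        cache.get(readerA, expensiveBuild); // served from the cache
        cache.get(readerB, expensiveBuild);
        System.out.println("builds = " + builds[0]); // builds = 2
    }
}
```

A filter along the lines of a CachingDuplicateFilter could hold one such cache and call it from getDocIdSet, so the expensive build runs once per reader rather than once per search.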

This would also be cleaner if we change the filter API so getDocIdSet
receives the top reader and docBase in addition to the sub-reader; this
way a CachingDuplicateFilter instance could be reused across reopened
top readers.
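A sketch of how the extra arguments would help, assuming a hypothetical three-argument `getDocIdSet(top, docBase, subMaxDoc)` (not Lucene's actual Filter API) and a stand-in `TopReader` type instead of IndexReader: the duplicate bits are computed once over the whole top reader, and each per-segment call just slices out its own range.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

public class DocBaseSliceSketch {

    // Hypothetical top-level reader: the dedup-field value of every top-level doc ID.
    static final class TopReader {
        final String[] keys;

        TopReader(String... keys) {
            this.keys = keys;
        }
    }

    // Hypothetical filter whose getDocIdSet also receives the top reader and docBase.
    static final class DuplicateFilterSketch {
        // One global computation per top reader, reused for every segment of it.
        // Identity keys for brevity; a real cache would use weak keys.
        private final Map<TopReader, BitSet> perTopReader = new IdentityHashMap<>();

        synchronized BitSet getDocIdSet(TopReader top, int docBase, int subMaxDoc) {
            BitSet global = perTopReader.computeIfAbsent(top, DuplicateFilterSketch::firstOccurrences);
            // BitSet.get(from, to) returns a new BitSet in which bit 'from' becomes bit 0,
            // so the slice is already in segment-local doc IDs.
            return global.get(docBase, docBase + subMaxDoc);
        }

        // Keeps the first document seen for each key, scanning in top-level doc ID order.
        static BitSet firstOccurrences(TopReader top) {
            Set<String> seen = new HashSet<>();
            BitSet bits = new BitSet(top.keys.length);
            for (int doc = 0; doc < top.keys.length; doc++) {
                if (seen.add(top.keys[doc])) {
                    bits.set(doc);
                }
            }
            return bits;
        }
    }

    public static void main(String[] args) {
        // Two segments: top-level docs 0-2 (docBase 0) and docs 3-4 (docBase 3).
        TopReader top = new TopReader("a", "b", "a", "a", "c");
        DuplicateFilterSketch filter = new DuplicateFilterSketch();
        System.out.println(filter.getDocIdSet(top, 0, 3)); // {0, 1}
        System.out.println(filter.getDocIdSet(top, 3, 2)); // {1}
    }
}
```

Because the cache is keyed by the top reader, the same filter instance can serve a reopened index: the new top reader simply gets its own global computation.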

{quote}
If someone wants to make a "DuplicateBitSetBuilder" that is a factory for
creating a BitSet, to me that is more natural and obvious as to what is
going on.
{quote}

That sounds good... but how would it work?  I.e., how would an app tie
that into a Filter?
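One possible answer, sketched under assumptions: the app runs the builder once over all segments in order to get a top-level BitSet, then wraps it in a generic adapter filter that maps each segment back to its docBase. `DuplicateBitSetBuilder` follows the name in the quote, but its shape here, the `BitSetSliceFilter` adapter, and the `Segment` type are all hypothetical.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BuilderAdapterSketch {

    // Hypothetical segment: dedup-field value per segment-local doc ID.
    static final class Segment {
        final String[] keys;

        Segment(String... keys) {
            this.keys = keys;
        }
    }

    // Hypothetical factory: sees every segment in order and returns one top-level BitSet.
    static final class DuplicateBitSetBuilder {
        static BitSet build(List<Segment> segments) {
            Set<String> seen = new HashSet<>();
            BitSet bits = new BitSet();
            int docBase = 0;
            for (Segment seg : segments) {
                for (int doc = 0; doc < seg.keys.length; doc++) {
                    if (seen.add(seg.keys[doc])) {
                        bits.set(docBase + doc); // keep first occurrence only
                    }
                }
                docBase += seg.keys.length;
            }
            return bits;
        }
    }

    // Hypothetical adapter: answers per-segment calls by slicing the top-level BitSet.
    static final class BitSetSliceFilter {
        private final BitSet topLevel;
        private final Map<Segment, Integer> docBases = new IdentityHashMap<>();

        BitSetSliceFilter(List<Segment> segments, BitSet topLevel) {
            this.topLevel = topLevel;
            int docBase = 0;
            for (Segment seg : segments) {
                docBases.put(seg, docBase);
                docBase += seg.keys.length;
            }
        }

        BitSet getDocIdSet(Segment segment) {
            Integer docBase = docBases.get(segment);
            if (docBase == null) {
                throw new IllegalArgumentException("segment is not part of the reader this filter was built for");
            }
            return topLevel.get(docBase, docBase + segment.keys.length);
        }
    }

    public static void main(String[] args) {
        Segment s0 = new Segment("a", "b", "a");
        Segment s1 = new Segment("a", "c");
        List<Segment> segments = List.of(s0, s1);
        BitSetSliceFilter filter = new BitSetSliceFilter(segments, DuplicateBitSetBuilder.build(segments));
        System.out.println(filter.getDocIdSet(s0)); // {0, 1}
        System.out.println(filter.getDocIdSet(s1)); // {1}
    }
}
```

The catch shows up in the constructor: the adapter is bound to one specific set of segments, so the app has to rebuild it whenever the top reader is reopened.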


> DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment readers
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2348
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2348
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/*
>    Affects Versions: 2.9.2
>            Reporter: Trejkaz
>         Attachments: LUCENE-2348.patch, LUCENE-2348.patch
>
>
> DuplicateFilter currently works by building a single doc ID set, without 
> taking into account that getDocIdSet() will be called once per segment and 
> only with each segment's local reader.
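To make the reported bug concrete, the sketch below compares de-duplicating within each segment (what results if the set is computed from each segment's local reader alone) against de-duplicating across the whole index. Segments are modeled as hypothetical arrays of the dedup field's values, and the first occurrence of each value is kept; this is not DuplicateFilter's actual code.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateAcrossSegments {

    // Marks the first doc for each key not already in 'seen'; updates 'seen' as it goes.
    static BitSet firstOccurrences(String[] keys, Set<String> seen) {
        BitSet bits = new BitSet(keys.length);
        for (int doc = 0; doc < keys.length; doc++) {
            if (seen.add(keys[doc])) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Per-segment: a fresh 'seen' set for every segment, so cross-segment duplicates survive.
    static int countKeptPerSegment(List<String[]> segments) {
        int kept = 0;
        for (String[] seg : segments) {
            kept += firstOccurrences(seg, new HashSet<>()).cardinality();
        }
        return kept;
    }

    // Global: one 'seen' set shared across all segments, visited in doc ID order.
    static int countKeptGlobally(List<String[]> segments) {
        Set<String> seen = new HashSet<>();
        int kept = 0;
        for (String[] seg : segments) {
            kept += firstOccurrences(seg, seen).cardinality();
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String[]> segments = new ArrayList<>();
        segments.add(new String[] {"a", "b", "a"});
        segments.add(new String[] {"a", "c"});
        System.out.println("per-segment: " + countKeptPerSegment(segments)); // per-segment: 4
        System.out.println("global: " + countKeptGlobally(segments)); // global: 3
    }
}
```

The value "a" appears in both segments, so the per-segment version keeps it twice.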
