[ https://issues.apache.org/jira/browse/LUCENE-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935970#action_12935970 ]
Michael McCandless commented on LUCENE-2348:
--------------------------------------------

bq. if its not doing the work in getdocidset, it shouldn't extend Filter!

I don't think we can or should dictate that.

I think it's fair game for a Filter to compute/cache whatever it wants. The only requirement for Filter is that it implement getDocIdSet. Where it does its work, what it's storing in its instance, etc., is up to it.

Sure, we strive for a strong separation of "computing the bits" vs "caching them", but for some cases that ideal is not feasible.

In fact in this case the filter is so costly to build that no realistic app can possibly rely on the filter without first wrapping it in CachingWrapperFilter. So I see no harm in conflating caching with this. We could rename it to CachingDuplicateFilter.

In fact we could factor out the FilterCache utility class now inside CachingWrapperFilter and make it easily reused by other filters like this one that need to compute & cache right off.

This would also be cleaner if we change the filter API so getDocIdSet receives the top reader and docBase in addition to the sub; this way a CachingDuplicateFilter instance could be reused across reopened top readers.

{quote}
If someone wants to make a "DuplicateBitSetBuilder" that is a factory for creating a BitSet, to me that is more natural and obvious as to what is going on.
{quote}

That sounds good... but how would it work? Ie how would an app tie that into a Filter?
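To make the "top reader plus docBase" idea above a bit more concrete, here is a rough, self-contained sketch. It does not use Lucene's classes: Segment, TopReader and CachingDuplicateFilterSketch are hypothetical stand-ins, the three-argument getDocIdSet is only the signature proposed in this comment (not an existing API), and "first occurrence wins" is an assumed de-duplication policy. The point is only that the expensive global pass runs once per top reader and each per-segment call returns a rebased slice of it.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.WeakHashMap;

/** Hypothetical stand-in for one segment; NOT Lucene's IndexReader. */
final class Segment {
    final String[] keys; // de-duplication key for each local doc ID
    Segment(String... keys) { this.keys = keys; }
    int maxDoc() { return keys.length; }
}

/** Hypothetical stand-in for the top-level reader: an ordered list of segments. */
final class TopReader {
    final List<Segment> segments;
    TopReader(List<Segment> segments) { this.segments = segments; }
}

/** Sketch of a filter that computes and caches in one place (assumed design). */
final class CachingDuplicateFilterSketch {
    // Global "keep" bits per top reader; weak keys let unused readers be collected.
    private final Map<TopReader, BitSet> cache = new WeakHashMap<>();
    int builds = 0; // counts expensive builds, for demonstration only

    /** Assumed signature: top reader and docBase are passed along with the sub. */
    synchronized BitSet getDocIdSet(TopReader top, int docBase, Segment sub) {
        BitSet global = cache.get(top);
        if (global == null) {
            global = build(top);
            cache.put(top, global);
        }
        // Slice out this segment's range; BitSet.get rebases it to local doc IDs.
        return global.get(docBase, docBase + sub.maxDoc());
    }

    /** One pass over all segments, keeping the first doc seen for each key. */
    private BitSet build(TopReader top) {
        builds++;
        BitSet keep = new BitSet();
        Set<String> seen = new HashSet<>();
        int docId = 0;
        for (Segment seg : top.segments) {
            for (String key : seg.keys) {
                if (seen.add(key)) {
                    keep.set(docId);
                }
                docId++;
            }
        }
        return keep;
    }
}
```

With segments ("a", "b", "a") and ("b", "c"), the second segment's "b" is dropped because it duplicates a doc in the first segment, which a purely per-segment computation could not know. A new top reader simply gets its own cache entry, so the same filter instance can be kept across reopens.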
> DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment readers
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2348
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2348
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/*
>    Affects Versions: 2.9.2
>            Reporter: Trejkaz
>         Attachments: LUCENE-2348.patch, LUCENE-2348.patch
>
>
> DuplicateFilter currently works by building a single doc ID set, without
> taking into account that getDocIdSet() will be called once per segment and
> only with each segment's local reader.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.