[ https://issues.apache.org/jira/browse/LUCENE-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881010#action_12881010 ]
Karthick Sankarachary commented on LUCENE-2348:
-----------------------------------------------

Hi all,

Having run into this very issue on our platform, I decided to take a stab at addressing it by defining what is essentially a stateful type of filter (for details, please see LUCENE-2506). In my mind, the stateful filter affords an easy and intuitive way for filters such as DuplicateFilter to work seamlessly across the (potentially many) segments of the index.

In a nutshell, I tweaked DuplicateFilter so that it accepts a given term if and only if that term does not already exist in its "memory". For details, please see the DedupingTermsEnum#accept method in the revised DuplicateFilter class attached here.

Note that I took the liberty of incorporating the edge case shown above into DuplicateFilter's test case, which is also included in the attached patch.

Regards,
Karthick Sankarachary

> DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment readers
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2348
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2348
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/*
>    Affects Versions: 2.9.2
>            Reporter: Trejkaz
>         Attachments: LUCENE-2348.patch
>
>
> DuplicateFilter currently works by building a single doc ID set, without
> taking into account that getDocIdSet() will be called once per segment and
> only with each segment's local reader.
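
To illustrate the "memory" idea described in the comment, here is a minimal plain-Java sketch. It is not the attached patch and does not use the Lucene API: the class StatefulDedupSketch and its methods accept and filterSegment are hypothetical names, and each segment is modeled as a simple array of key values indexed by segment-local doc ID. The sketch assumes that only the first occurrence of each key should be kept.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a stateful duplicate filter; not Lucene API. */
class StatefulDedupSketch {
    // The filter's "memory": keys already accepted in any earlier segment.
    private final Set<String> seenKeys = new HashSet<>();

    /** Accepts a key if and only if it is not already in memory. */
    boolean accept(String key) {
        // Set.add returns false when the key was already present.
        return seenKeys.add(key);
    }

    /**
     * Plays the role of one per-segment call: takes the key of each
     * document in the segment and returns the segment-local doc IDs to keep.
     */
    List<Integer> filterSegment(String[] segmentKeys) {
        List<Integer> kept = new ArrayList<>();
        for (int localDocId = 0; localDocId < segmentKeys.length; localDocId++) {
            if (accept(segmentKeys[localDocId])) {
                kept.add(localDocId);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        StatefulDedupSketch filter = new StatefulDedupSketch();
        String[] segment0 = {"a", "b", "a"};
        String[] segment1 = {"b", "c"};
        // The same filter instance is reused, so its memory spans segments.
        System.out.println(filter.filterSegment(segment0)); // [0, 1]
        System.out.println(filter.filterSegment(segment1)); // [1]
    }
}
```

Because the same instance handles both calls, the key "b" in the second segment is rejected even though it is the first "b" within that segment. A filter that rebuilt its state on every per-segment call would wrongly keep it.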