[ https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286815#comment-13286815 ]
Mark Harwood commented on LUCENE-4069:
--------------------------------------

bq. Its not worth the complexity

There's no real added complexity in BloomFilterPostingsFormat - it has to be capable of storing blooms for >1 field anyway, and using the fieldname set is roughly 2 extra lines of code to see if a TermsConsumer needs wrapping or not.

From a client side you don't have to use this feature - the fieldname set can be null, in which case it will wrap all fields sent its way. If you do choose to supply a set, the wrapped PostingsFormat will have the advantage of being shared for bloomed and non-bloomed fields. We could add a constructor that removes the set and mark the others "expert".

For me this falls into one of the many faster-if-you-know-about-it optimisations like FieldSelectors or recycling certain objects. Basically a useful hint to Lucene to save some extra effort, but one which you don't *need* to use.

LUCENE-4093 may in future resolve the multi-file issue, but I'm not sure it will do so without significant complication.

> Segment-level Bloom filters for a 2 x speed up on rare term searches
> --------------------------------------------------------------------
>
>                 Key: LUCENE-4069
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4069
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>    Affects Versions: 3.6, 4.0
>            Reporter: Mark Harwood
>            Priority: Minor
>             Fix For: 4.0, 3.6.1
>
>         Attachments: BloomFilterPostings40.patch, MHBloomFilterOn3.6Branch.patch, PrimaryKey40PerformanceTestSrc.zip
>
>
> An addition to each segment which stores a Bloom filter for selected fields
> in order to give fast-fail to term searches, helping avoid wasted disk access.
> Best suited for low-frequency fields e.g. primary keys on big indexes with
> many segments but also speeds up general searching in my tests.
> Overview slideshow here:
> http://www.slideshare.net/MarkHarwood/lucene-bloomfilteredsegments
> Benchmarks based on Wikipedia content here: http://goo.gl/X7QqU
> Patch based on 3.6 codebase attached.
> There are no 3.6 API changes currently - to play just add a field with "_blm"
> on the end of the name to invoke special indexing/querying capability.
> Clearly a new Field or schema declaration(!) would need adding to APIs to
> configure the service properly.
> Also, a patch for Lucene4.0 codebase introducing a new PostingsFormat
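
To make the fieldname-set behaviour in the comment concrete, here is a rough, self-contained sketch. It is not code from either patch and does not use any Lucene classes: the names BloomWrapSketch, SimpleBloom and shouldWrap are hypothetical, and the two-hash FNV-style filter is just an assumption chosen to keep the example short. It only illustrates the two ideas discussed: a null set means "bloom every field", and a Bloom filter can give a fast-fail answer for a term that was never added.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.Collections;
import java.util.Set;

// Illustrative only: these names are hypothetical and not taken from the patch.
final class BloomWrapSketch {

    /** Minimal Bloom filter: answers "definitely absent" or "possibly present". */
    static final class SimpleBloom {
        private static final int NUM_HASHES = 2; // assumption, not the patch's setting
        private final BitSet bits;
        private final int size;

        SimpleBloom(int size) {
            this.size = size;
            this.bits = new BitSet(size);
        }

        /** Record a term by setting one bit per hash function. */
        void add(String term) {
            for (int seed = 0; seed < NUM_HASHES; seed++) {
                bits.set(index(term, seed));
            }
        }

        /** false means the term was never added; true may be a false positive. */
        boolean mightContain(String term) {
            for (int seed = 0; seed < NUM_HASHES; seed++) {
                if (!bits.get(index(term, seed))) {
                    return false; // fast-fail: no need to consult the term dictionary
                }
            }
            return true;
        }

        /** Seeded FNV-1a style hash over the UTF-8 bytes, reduced to a bit index. */
        private int index(String term, int seed) {
            int h = 0x811C9DC5 ^ (seed * 0x9E3779B9);
            for (byte b : term.getBytes(StandardCharsets.UTF_8)) {
                h ^= (b & 0xFF);
                h *= 0x01000193;
            }
            return Math.floorMod(h, size);
        }
    }

    /** A null set means "wrap all fields"; otherwise only the named fields are wrapped. */
    static boolean shouldWrap(Set<String> bloomFields, String fieldName) {
        return bloomFields == null || bloomFields.contains(fieldName);
    }

    public static void main(String[] args) {
        Set<String> bloomFields = Collections.singleton("id");
        System.out.println("wrap id?   " + shouldWrap(bloomFields, "id"));
        System.out.println("wrap body? " + shouldWrap(bloomFields, "body"));
        System.out.println("wrap body with null set? " + shouldWrap(null, "body"));

        SimpleBloom bloom = new SimpleBloom(1 << 16);
        bloom.add("doc-42");
        System.out.println("might contain doc-42? " + bloom.mightContain("doc-42"));
    }
}
```

In this sketch the per-field decision is a single null-or-contains check, which is the sort of small cost the comment has in mind when it says the set adds roughly two lines.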