Eks,

> [ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762742#action_12762742 ]
>
> Eks Dev commented on LUCENE-1410:
> ---------------------------------
>
> Mike,
> That is definitely the way to go, distribution dependent encoding, where
> every Term gets individual treatment.
>
> Take for an example simple, but not all that rare case where Index gets
> sorted on some of the indexed fields (we use it really extensively, e.g.
> presorted doc collection on user_rights/zip/city, all indexed). There you get
> perfectly "compressible" postings by simply managing intervals of set bits.
> Updates distort this picture, but we rebuild index periodically and all gets
> good again. At the moment we load them into RAM as Filters in IntervalSets.
> if that would be possible in lucene, we wouldn't bother with Filters (VInt
> decoding on such super dense fields was killing us, even in RAMDirectory) ...
>

You could try switching the Filter to OpenBitSet when that takes fewer
bytes than SortedVIntList.

Regards,
Paul Elschot
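
As a rough illustration of the size comparison suggested above, the sketch below estimates the bytes needed by each representation and picks the smaller one. It assumes the sorted list stores the gaps between consecutive doc ids as VInts (7 payload bits per byte) and that the bit set costs one bit per document up to maxDoc, rounded up to 64-bit words. The class and method names (FilterSizeEstimate, vIntBytes, sortedVIntBytes, bitSetBytes, preferBitSet) are hypothetical and not part of the Lucene API.

```java
// Hypothetical helper, not Lucene code: estimates storage for two filter layouts.
class FilterSizeEstimate {

    /** Bytes needed to write one non-negative int as a VInt (7 payload bits per byte). */
    static int vIntBytes(int value) {
        int bytes = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            bytes++;
        }
        return bytes;
    }

    /** Bytes for a sorted doc id list stored as VInt-encoded gaps (assumed layout). */
    static long sortedVIntBytes(int[] sortedDocIds) {
        long total = 0;
        int previous = 0;
        for (int doc : sortedDocIds) {
            total += vIntBytes(doc - previous);
            previous = doc;
        }
        return total;
    }

    /** Bytes for a plain bit set covering maxDoc documents, in whole 64-bit words. */
    static long bitSetBytes(int maxDoc) {
        return ((maxDoc + 63L) / 64L) * 8L;
    }

    /** True when the bit set is estimated to be strictly smaller than the VInt list. */
    static boolean preferBitSet(int[] sortedDocIds, int maxDoc) {
        return bitSetBytes(maxDoc) < sortedVIntBytes(sortedDocIds);
    }

    public static void main(String[] args) {
        // Super dense case: every one of 1000 documents matches.
        int[] dense = new int[1000];
        for (int i = 0; i < dense.length; i++) {
            dense[i] = i;
        }
        System.out.println("dense, prefer bit set: " + preferBitSet(dense, 1000));

        // Sparse case: three matches in a million documents.
        int[] sparse = {10, 500000, 999999};
        System.out.println("sparse, prefer bit set: " + preferBitSet(sparse, 1000000));
    }
}
```

For the super dense fields described in the quoted comment, every gap still costs at least one byte in the VInt list, while the bit set costs one bit per document, so the bit set should come out smaller once more than roughly one document in eight matches.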