Eks,

> 
>     [ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762742#action_12762742 ]
> 
> Eks Dev commented on LUCENE-1410:
> ---------------------------------
> 
> Mike,
> That is definitely the way to go: distribution-dependent encoding, where
> every Term gets individual treatment.
> 
> Take, for example, the simple but not all that rare case where the index gets
> sorted on some of the indexed fields (we use this really extensively, e.g. a
> presorted doc collection on user_rights/zip/city, all indexed). There you get
> perfectly "compressible" postings by simply managing intervals of set bits.
> Updates distort this picture, but we rebuild the index periodically and all is
> good again. At the moment we load them into RAM as Filters in IntervalSets.
> If that were possible in Lucene, we wouldn't bother with Filters (VInt
> decoding on such super-dense fields was killing us, even in RAMDirectory)...
>  
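For anyone following along, here is a minimal sketch of the "intervals of set bits" idea for a presorted index, assuming strictly increasing doc ids as input. DocIntervals and its methods are hypothetical names for illustration only; this is not Lucene's API and not the IntervalSets implementation mentioned above.

```java
import java.util.Arrays;

/** Hypothetical sketch: store sorted doc ids as runs of consecutive ids. */
final class DocIntervals {
    private final int[] starts; // first doc id of each run
    private final int[] ends;   // last doc id of each run (inclusive)

    private DocIntervals(int[] starts, int[] ends) {
        this.starts = starts;
        this.ends = ends;
    }

    /** Builds runs from strictly increasing doc ids. */
    static DocIntervals fromSortedDocs(int[] docs) {
        int[] s = new int[docs.length];
        int[] e = new int[docs.length];
        int n = 0;
        for (int doc : docs) {
            if (n > 0 && doc == e[n - 1] + 1) {
                e[n - 1] = doc;           // extend the current run
            } else {
                s[n] = doc;               // start a new run
                e[n] = doc;
                n++;
            }
        }
        return new DocIntervals(Arrays.copyOf(s, n), Arrays.copyOf(e, n));
    }

    int runCount() {
        return starts.length;
    }

    /** Membership test by binary search over run starts. */
    boolean contains(int doc) {
        int i = Arrays.binarySearch(starts, doc);
        if (i >= 0) {
            return true;                  // doc is the start of a run
        }
        int run = -i - 2;                 // last run starting before doc
        return run >= 0 && doc <= ends[run];
    }
}
```

On a field the index is sorted by, a whole block of matching docs collapses into a single (start, end) pair, so both storage and lookup cost scale with the number of runs rather than the number of docs.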

You could try switching the Filter to OpenBitSet when that takes fewer bytes 
than SortedVIntList.
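A rough sketch of that size comparison follows, assuming a bit set costs one bit per document rounded up to 64-bit words, and a sorted VInt list stores each doc-id gap as a VInt with 7 payload bits per byte. FilterSizing is a hypothetical helper that only mimics the trade-off; it does not call OpenBitSet or SortedVIntList, and the numbers are estimates, not Lucene's own memory accounting.

```java
/** Hypothetical sketch: compare estimated bytes of a bit set vs. VInt gaps. */
final class FilterSizing {
    /** Bytes for a bit set over maxDoc documents, rounded up to 64-bit words. */
    static long bitSetBytes(int maxDoc) {
        return ((long) maxDoc + 63) / 64 * 8;
    }

    /** Bytes needed to write one non-negative int as a VInt. */
    static int vIntBytes(int value) {
        int bytes = 1;
        while ((value & ~0x7F) != 0) {    // more than 7 bits remaining
            value >>>= 7;
            bytes++;
        }
        return bytes;
    }

    /** Bytes for sorted doc ids stored as VInt-encoded gaps (first gap from 0). */
    static long vIntListBytes(int[] sortedDocs) {
        long total = 0;
        int last = 0;
        for (int doc : sortedDocs) {
            total += vIntBytes(doc - last);
            last = doc;
        }
        return total;
    }

    /** True when the bit set is the smaller representation. */
    static boolean preferBitSet(int[] sortedDocs, int maxDoc) {
        return bitSetBytes(maxDoc) < vIntListBytes(sortedDocs);
    }
}
```

For a super-dense field every gap fits in one byte, so the VInt list costs about one byte per matching doc while the bit set costs one bit per doc in the index; once more than roughly 1/8 of the docs match, the bit set wins on size and also avoids VInt decoding entirely.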

Regards,
Paul Elschot
