Re: RLE Compressing bit vectors, just toughts

eks dev Sun, 05 Aug 2007 01:35:49 -0700


>In the latest Matcher patches I forgot to consider your implementation of
>a Matcher on OpenBitSet. 
If I remember well, Matcher based on BitSetIterator (another of Yonik's pearls 
:) is something to consider, my Matcher on OpenBitSet is no better than yours.



>As a sort of baseline for good compression you might try and use
>the GZipInputStream in java.util.gzip. I just saw that it even

I believe it is not going to work, java compression support is good for 
compressing files or larger compact blocks of bytes, but when you need 
something a bit more "exotic", eg.  random access like in this case, static 
Huffman (or any entropy coder), static dictionary for deflate... it fails. 

In this concrete case you would need for every single doc id list to initialize 
new deflater, what is IMHO prohibitively slow. (skip you mentioned is just 
scanning byte by byte without directly repositioning Stream)

it would work for really huge collections, where this constant time factor gets 
relatively small. 

Also, this way we are not going to reduce the number of VInt decoding  (by far 
the most popular method call in Lucene :) 

So far, my best bet would be to do it directly on index, every time you see  a 
lot of deltas  == 1, you try to encode them with some 0xFF(run length). On 
decoding, next() and skipTo() should not be all that complex (I hope). Must see 
how it can cooperate with skipping info.







      ___________________________________________________________
Yahoo! Answers - Got a question? Someone out there knows the answer. Try it
now.
http://uk.answers.yahoo.com/


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: RLE Compressing bit vectors, just toughts

Reply via email to