>In the latest Matcher patches I forgot to consider your implementation of >a Matcher on OpenBitSet. If I remember well, Matcher based on BitSetIterator (another of Yonik's pearls :) is something to consider, my Matcher on OpenBitSet is no better than yours.
>As a sort of baseline for good compression you might try and use >the GZipInputStream in java.util.gzip. I just saw that it even I believe it is not going to work, java compression support is good for compressing files or larger compact blocks of bytes, but when you need something a bit more "exotic", eg. random access like in this case, static Huffman (or any entropy coder), static dictionary for deflate... it fails. In this concrete case you would need for every single doc id list to initialize new deflater, what is IMHO prohibitively slow. (skip you mentioned is just scanning byte by byte without directly repositioning Stream) it would work for really huge collections, where this constant time factor gets relatively small. Also, this way we are not going to reduce the number of VInt decoding (by far the most popular method call in Lucene :) So far, my best bet would be to do it directly on index, every time you see a lot of deltas == 1, you try to encode them with some 0xFF(run length). On decoding, next() and skipTo() should not be all that complex (I hope). Must see how it can cooperate with skipping info. ___________________________________________________________ Yahoo! Answers - Got a question? Someone out there knows the answer. Try it now. http://uk.answers.yahoo.com/ --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]