Hi, I found this paper recently published in proceedings of CIKM'10:
VSEncoding: efficient coding and fast decoding of integer lists via dynamic programming, F. Silvestri, R. Venturini Available here http://portal.acm.org/citation.cfm?id=1871437.1871592 My favorite quotes: "We point out that only our methods, together with In- terpolative, are able to beat the entropy of the lists on the datasets. This quasi-paradoxical eect is, indeed, present because entropy does not consider context information. En- tropy, or to use a notation commonly used in text compres- sion, zeroth-order entropy, does not take into account pat- terns (i.e. the context) that can be present in lists of blocks of integers. By grouping together blocks of integers, in fact, we are able to assign codewords to more than a single value at a time. Therefore, it appears obvious that we can beat the entropy in the case of VSE, VSE-R and Interpolative. Es- sentially, this is possible since we exploit regularities on the lists on these very skewed d-gaps lists (e.g., small values close to each other or quite long runs of 1s). We remark that beat the entropy is not possible with any prex code (e.g., sta- tistical compressors like Arithmetic and Human or integer encoders like , , 's, Golomb, and so on). Therefore, our methods is certainly better in compression than any of these kind of methods without the need of any comparison. Any- way, for the sake of completeness, we report those results as well in Table 3." "As expected, Interpolative is the slowest as opposed to VSE which tops others with more than 800 millions of integers per second. Our methods, VSE and VSE-R, are among the fastest in decoding with a number of mis (millions of integers per second) decoded ranging from 450 of VSE-R to 835 of VSE both of them measured using the gov2 collection. It is interesting to observe the better performance in terms of decoding speed of VSE with respect to others, and in particular with respect to OPT-P4D, Simple9 and Simple16 which are considered state-of-the-art as far as decompression speed is concerned." P4D compression achieved a speed of 460 mis in this test, which means the VSE method is twice as fast. -- Best regards, Andrzej Bialecki <>< ___. ___ ___ ___ _ _ __________________________________ [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__|| \| || | Embedded Unix, System Integration http://www.sigram.com Contact: info at sigram dot com --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org