[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088266#comment-13088266 ]
Matt Corgan commented on HBASE-4218:
------------------------------------

I lean towards byte-encoding ints whenever they're used often enough to have an impact on memory. KeyValue could probably do better with some VInts. You can encode 128 values in 1 byte and decode them with just one branch: check whether b[0] < 0. Given the number of other byte comparisons going on while reading the key, that doesn't seem too heavyweight (especially since many of those other comparisons are casting the byte to a positive integer before comparing). If you reserved 2-4 bytes for that same number, you might end up doing even more work.

One problem with VInt decoders is that they sometimes do bounds checking, which can slow things down a lot. I think validation should be done at write time, and then possibly via a block-level checksum when a block is copied back into memory. After that, assume everything is correct.

For prefix compression, we're talking about encoding things at the block level, where most of the ints are internal pointers smaller than the 64k block size, so most ints can fit in 2 bytes. But it's important that they be able to grow gracefully when block sizes grow beyond 64k or are configured to be bigger. I've been using two types of encoded integers: VInt and FInt. FInts are basically an optimization over VInts for cases where you have many ints with the same characteristics and can therefore store their width once at the block level rather than encoding it in every occurrence.
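To make the "one branch for the common case" point concrete, here is a minimal sketch of that style of varint (7 value bits per byte, high bit as a continuation flag). This is illustrative only, not HBase's or Hadoop's actual VInt wire format; the class name VIntSketch is made up for this example:

```java
/** Hypothetical varint sketch: 7 value bits per byte, high bit = continuation. */
public class VIntSketch {

    /** Encode a non-negative int into dest at offset; returns bytes written. */
    public static int encode(int value, byte[] dest, int offset) {
        int pos = offset;
        while ((value & ~0x7F) != 0) {
            dest[pos++] = (byte) ((value & 0x7F) | 0x80); // continuation bit set
            value >>>= 7;
        }
        dest[pos++] = (byte) value;                       // final byte, high bit clear
        return pos - offset;
    }

    /** Decode; values 0-127 take a single byte and a single branch (b[offset] >= 0). */
    public static int decode(byte[] b, int offset) {
        byte first = b[offset];
        if (first >= 0) {
            return first;       // common fast path: one branch, no shifting
        }
        int value = first & 0x7F;
        int shift = 7;
        byte cur;
        do {
            cur = b[++offset];  // note: no bounds checking, per the write-time-validation argument
            value |= (cur & 0x7F) << shift;
            shift += 7;
        } while (cur < 0);
        return value;
    }
}
```

Note the sorting caveat: these raw bytes do not compare in value order across widths, which is exactly what the CVInt idea below is meant to fix.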
VInt (variable width int)
* width is not known ahead of time, so it must be interpreted byte-by-byte
* slower because of a branch on each byte, but still pretty fast
* only 2^7 values/byte, so 2 bytes can hold 16k values

FInt (fixed width int)
* width is known ahead of time and stored externally (at the block level, in PtBlockMeta in this project)
* an FInt is faster to encode and decode because of the lack of if-statements
* each byte can store 2^8 values, so 2 bytes gets you 64k values (the HBase block size)
* a list of these numbers provides random access, which is important for binary searching
* if encoding the numbers 0-10,000, for example, VInts will save you 1 byte on the numbers 0-255, but that is a small % savings, so use FInts for lists of numbers

-----------------
Sidenote: I've been meaning to make a CVInt (comparable variable width int) that:
* sorts based on raw bytes even when widths differ (good for suffixing HBase row/colQualifier values)
* to interpret, count the number of leading 1 bits; that is how many additional bytes there are beyond the first byte
* the bits beyond the first 0 bit comprise the value
* should also be faster to decode because of fewer branches

> Delta Encoding of KeyValues (aka prefix compression)
> -----------------------------------------------------
>
> Key: HBASE-4218
> URL: https://issues.apache.org/jira/browse/HBASE-4218
> Project: HBase
> Issue Type: Improvement
> Components: io
> Reporter: Jacek Migdal
> Labels: compression
>
> A compression for keys. Keys are sorted in HFile and they are usually very
> similar. Because of that, it is possible to design better compression than
> general-purpose algorithms.
> It is an additional step designed to be used in memory. It aims to save
> memory in cache as well as to speed up seeks within HFileBlocks. It should
> improve performance a lot if key lengths are larger than value lengths. For
> example, it makes a lot of sense to use it when the value is a counter.
> Initial tests on real data (key length = ~90 bytes, value length = 8 bytes)
> show that I could achieve a decent level of compression:
> key compression ratio: 92%
> total compression ratio: 85%
> LZO on the same data: 85%
> LZO after delta encoding: 91%
> while having much better performance (20-80% faster decompression than
> LZO). Moreover, it should allow far more efficient seeking, which should
> improve performance a bit.
> It seems that simple compression algorithms are good enough. Most of the
> savings are due to prefix compression, int128 encoding, timestamp diffs, and
> bitfields to avoid duplication. That way, comparisons of compressed data can
> be much faster than a byte comparator (thanks to prefix compression and
> bitfields).
> In order to implement it in HBase, two important changes in design will be
> needed:
> - solidify the interface to HFileBlock / HFileReader Scanner to provide seeking
> and iterating; access to the uncompressed buffer in HFileBlock will have bad
> performance
> - extend comparators to support comparison assuming that the first N bytes are
> equal (or some fields are equal)
> Link to a discussion about something similar:
> http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windows&subj=Re+prefix+compression

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
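The CVInt scheme from the sidenote above can be sketched as follows. This is a hypothetical codec written to match the description (leading 1 bits in the first byte give the count of additional bytes; the bits after the first 0 bit hold the value, big-endian), not an existing HBase class; the name CVInt and the 28-bit cap are assumptions of this sketch:

```java
/** Hypothetical comparable-varint sketch: raw bytes sort in value order across widths. */
public class CVInt {

    /** Encode a non-negative int (up to 2^28 - 1 in this sketch) as big-endian CVInt bytes. */
    public static byte[] encode(int value) {
        if (value < 0) throw new IllegalArgumentException("negative value");
        if (value < (1 << 7)) {
            return new byte[] { (byte) value };                           // 0xxxxxxx
        } else if (value < (1 << 14)) {
            return new byte[] { (byte) (0x80 | (value >>> 8)),            // 10xxxxxx
                                (byte) value };
        } else if (value < (1 << 21)) {
            return new byte[] { (byte) (0xC0 | (value >>> 16)),           // 110xxxxx
                                (byte) (value >>> 8), (byte) value };
        } else if (value < (1 << 28)) {
            return new byte[] { (byte) (0xE0 | (value >>> 24)),           // 1110xxxx
                                (byte) (value >>> 16), (byte) (value >>> 8),
                                (byte) value };
        }
        throw new IllegalArgumentException("value too large for this sketch");
    }

    /** Decode a CVInt at offset. No bounds checking, per the write-time-validation argument. */
    public static int decode(byte[] b, int offset) {
        int first = b[offset] & 0xFF;
        // Count the leading 1 bits of the first byte = number of additional bytes.
        int extra = Integer.numberOfLeadingZeros(~(first << 24));
        int value = first & (0x7F >>> extra);   // bits after the first 0 bit
        for (int i = 1; i <= extra; i++) {
            value = (value << 8) | (b[offset + i] & 0xFF);
        }
        return value;
    }
}
```

Because wider encodings always start with a larger first byte (0x00-0x7F, then 0x80-0xBF, then 0xC0-0xDF, ...), an unsigned lexicographic byte comparison orders these the same way as the underlying integers, which is what makes them usable inside sorted row/colQualifier bytes.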