XiangpengHao commented on issue #7739: URL: https://github.com/apache/arrow-rs/issues/7739#issuecomment-3002789761
> When would it be bad to use the maximum RLE value rather than bit packing?

By default it will use the maximum RLE value. The choice here is, for example: if the encoder sees only 7 consecutive identical values, should it use RLE or bit packing?

The back-of-envelope calculation goes:
1. RLE needs a 1-byte header and 1 byte for the actual value, so every RLE run needs 2 bytes.
2. Bit packing needs a 1-byte header and n * bit_width/8 bytes, where n is the number of values in the run.

If bit_width=1, then any sequence longer than 8 should use RLE. If bit_width=2, then anything longer than 4 should use RLE.

But we should consider decoding speed as well. Unpacking bit-packed data is slow when the bit-packed sequence is short. So if we want faster decoding, we should make RLE less likely, i.e., raise the run length n required before switching to RLE.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
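
The size comparison in the comment can be sketched in a few lines of Rust. This is only an illustration of the simplified cost model stated there (1 header byte plus 1 value byte for RLE, 1 header byte plus n * bit_width/8 bytes for bit packing), not the arrow-rs encoder. The function names are hypothetical, and rounding the bit-packed payload up to whole bytes is an added assumption.

```rust
/// Bytes for one RLE run under the comment's simplified model:
/// 1 header byte + 1 byte for the repeated value.
fn rle_run_bytes() -> usize {
    2
}

/// Bytes for a bit-packed run of `n` values under the same model:
/// 1 header byte + n * bit_width / 8 payload bytes.
/// Assumption: the payload is rounded up to a whole number of bytes.
fn bit_packed_bytes(n: usize, bit_width: usize) -> usize {
    1 + (n * bit_width + 7) / 8
}

/// Smallest run length at which RLE is strictly smaller than bit packing.
/// Restricted to 1..=8 bits because the model stores the value in one byte.
fn min_run_for_rle(bit_width: usize) -> usize {
    assert!((1..=8).contains(&bit_width), "model covers bit widths 1..=8");
    let mut n = 1;
    while bit_packed_bytes(n, bit_width) <= rle_run_bytes() {
        n += 1;
    }
    n
}
```

Under these assumptions, `min_run_for_rle(1)` is 9 and `min_run_for_rle(2)` is 5, matching "longer than 8" and "longer than 4". A run of 7 values at bit_width=1 costs 2 bytes either way, so for that case size alone does not decide and decoding speed becomes the tie-breaker.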
