XiangpengHao commented on issue #7739:
URL: https://github.com/apache/arrow-rs/issues/7739#issuecomment-3002789761

   > When would it be bad to use the maximum RLE value rather than bit packing?
   
   By default, it will use the maximum RLE value.
   
   The choice here is, for example: if the encoder sees only 7 consecutive identical values, should it use RLE or bit packing?
   
   The back-of-the-envelope calculation goes:
   
   1. RLE needs a 1-byte header and a 1-byte value, so every RLE run costs 2 bytes.
   2. Bit packing needs a 1-byte header plus n * bit_width / 8 bytes for n values.
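
   A minimal sketch of this cost model, assuming (as the estimate does) that the RLE header and value each fit in one byte, and ignoring the real format's padding of bit-packed runs to groups of 8 values. Costs are counted in bits so the comparison stays exact; `rle_run_cost_bits`, `bit_packed_cost_bits`, and `prefer_rle` are hypothetical names, not arrow-rs APIs.

```rust
/// Estimated size in bits of one RLE run: a 1-byte header plus a 1-byte
/// value. Assumes both fit in one byte (short runs, bit_width <= 8).
fn rle_run_cost_bits() -> u64 {
    8 + 8
}

/// Estimated size in bits of bit-packing `n` values of `bit_width` bits:
/// a 1-byte header plus n * bit_width bits of payload.
/// Ignores padding to whole groups of 8 values.
fn bit_packed_cost_bits(n: u64, bit_width: u64) -> u64 {
    8 + n * bit_width
}

/// True when RLE is strictly smaller than bit packing for a run of `n`
/// identical values under this simplified model.
fn prefer_rle(n: u64, bit_width: u64) -> bool {
    rle_run_cost_bits() < bit_packed_cost_bits(n, bit_width)
}

fn main() {
    for bit_width in [1u64, 2] {
        for n in [4u64, 5, 7, 8, 9] {
            println!(
                "bit_width={bit_width} n={n}: rle={} bits, bit-packed={} bits, prefer_rle={}",
                rle_run_cost_bits(),
                bit_packed_cost_bits(n, bit_width),
                prefer_rle(n, bit_width)
            );
        }
    }
}
```

   For the 7-value example, this model picks bit packing at bit_width = 1 (15 bits vs 16) and RLE at bit_width = 2 (22 bits vs 16).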
   
   If bit_width = 1, any run of more than 8 identical values should use RLE.
   If bit_width = 2, any run of more than 4 identical values should use RLE.
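
   Both thresholds come from comparing the two byte costs, writing b for bit_width and n for the number of repeated values: RLE wins once bit packing would take more than 2 bytes.

```math
\underbrace{1 + \frac{n \cdot b}{8}}_{\text{bit-packed bytes}} > \underbrace{2}_{\text{RLE bytes}}
\quad\Longleftrightarrow\quad n > \frac{8}{b}
```

   With b = 1 this gives n > 8, and with b = 2 it gives n > 4, matching the two cases above.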
   
   But we should consider decoding speed as well: unpacking bit-packed data is slow when the bit-packed sequences are short. So if we want faster decoding, we should make RLE less likely, i.e., make n larger (require a longer run of identical values before switching to RLE).
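
   A hypothetical sketch of this trade-off: the toy segmenter below turns a run into RLE only when it reaches `min_rle_run` identical values, otherwise it appends the values to the current bit-packed section. `Segment`, `segment`, `bit_packed_stats`, and `min_rle_run` are illustrative names, not the parquet crate's API, and the sketch ignores the real format's groups-of-8 rule for bit-packed runs. On the sample input, a threshold of 3 produces three 2-value bit-packed sections between RLE runs, while a threshold of 8 keeps all 16 values in one bit-packed section: fewer, longer sections to unpack, at the cost of storing those repeats literally.

```rust
/// One output segment of a simplified RLE / bit-packing hybrid encoder.
#[derive(Debug, PartialEq)]
enum Segment {
    /// `len` copies of `value`, stored as a single RLE run.
    Rle { value: u32, len: usize },
    /// Values stored literally (bit-packed) in one section.
    BitPacked(Vec<u32>),
}

/// Splits `values` into segments: a run of at least `min_rle_run` identical
/// values becomes an RLE segment; shorter runs go into the current
/// bit-packed section.
fn segment(values: &[u32], min_rle_run: usize) -> Vec<Segment> {
    let mut out = Vec::new();
    let mut literals: Vec<u32> = Vec::new();
    let mut i = 0;
    while i < values.len() {
        // Measure the run of identical values starting at i.
        let mut j = i + 1;
        while j < values.len() && values[j] == values[i] {
            j += 1;
        }
        let run_len = j - i;
        if run_len >= min_rle_run {
            // Close the pending bit-packed section before the RLE run.
            if !literals.is_empty() {
                out.push(Segment::BitPacked(std::mem::take(&mut literals)));
            }
            out.push(Segment::Rle { value: values[i], len: run_len });
        } else {
            // Short run: keep it inside the bit-packed section.
            literals.extend_from_slice(&values[i..j]);
        }
        i = j;
    }
    if !literals.is_empty() {
        out.push(Segment::BitPacked(literals));
    }
    out
}

/// Number of bit-packed sections and their average length in values.
fn bit_packed_stats(segments: &[Segment]) -> (usize, f64) {
    let lens: Vec<usize> = segments
        .iter()
        .filter_map(|s| match s {
            Segment::BitPacked(v) => Some(v.len()),
            Segment::Rle { .. } => None,
        })
        .collect();
    let count = lens.len();
    let avg = if count == 0 {
        0.0
    } else {
        lens.iter().sum::<usize>() as f64 / count as f64
    };
    (count, avg)
}

fn main() {
    // Short bursts of repeats interleaved with distinct values.
    let values: Vec<u32> = vec![1, 2, 3, 3, 3, 4, 5, 6, 6, 6, 6, 7, 8, 9, 9, 9];
    for min_rle_run in [3, 8] {
        let segs = segment(&values, min_rle_run);
        let (count, avg) = bit_packed_stats(&segs);
        println!(
            "min_rle_run={min_rle_run}: {} segments, {count} bit-packed sections, avg len {avg:.1}",
            segs.len()
        );
    }
}
```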

