JigaoLuo commented on issue #7739: URL: https://github.com/apache/arrow-rs/issues/7739#issuecomment-3004007076
Thanks for having me, and I really appreciate this insight from @XiangpengHao:

> But we should consider decoding speed as well. unpack bit-packed data is slow when bit-pack sequence is short. So if we want faster decoding, we should make RLE less likely, i.e., make n larger.

It highlights the core dilemma we face with CPU-based Parquet readers: the most highly compressed Parquet files are often not the fastest to read (i.e., to decode and decompress). We have even seen cases where this tradeoff makes uncompressed Parquet, despite its larger size, faster for end-to-end reading. I have long wished for a way to write Parquet files **optimized for read performance**. However, identifying low-level pitfalls like this one is challenging, so thank you for shedding light on it!
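To make the tradeoff concrete, here is a minimal, self-contained sketch (not the arrow-rs decoder, just an illustration under simplified assumptions) of Parquet's RLE/bit-packed hybrid: an RLE run amortizes one stored value over many repeats, while a bit-packed run pays per-run setup plus per-value bit arithmetic, which is why many short bit-packed runs decode slowly:

```rust
/// One run of a simplified RLE/bit-packed hybrid encoding.
enum Run {
    /// `count` copies of `value` -- cheap to decode regardless of length.
    Rle { value: u32, count: usize },
    /// `count` values packed at `bit_width` bits each, LSB-first.
    BitPacked { bit_width: u8, data: Vec<u8>, count: usize },
}

fn decode(runs: &[Run], out: &mut Vec<u32>) {
    for run in runs {
        match run {
            Run::Rle { value, count } => {
                // One branch, then a bulk fill: fast even for huge counts.
                out.extend(std::iter::repeat(*value).take(*count));
            }
            Run::BitPacked { bit_width, data, count } => {
                // Per-run setup plus per-value shifting and masking.
                // The setup dominates when `count` is small, which is
                // the decoding pitfall discussed above.
                let w = *bit_width as u64;
                let mask = (1u64 << w) - 1;
                let mut bit_pos = 0u64;
                for _ in 0..*count {
                    let byte = (bit_pos / 8) as usize;
                    let shift = bit_pos % 8;
                    // Load up to 8 bytes little-endian around the current bit.
                    let mut buf = [0u8; 8];
                    let n = data.len().saturating_sub(byte).min(8);
                    buf[..n].copy_from_slice(&data[byte..byte + n]);
                    let word = u64::from_le_bytes(buf);
                    out.push(((word >> shift) & mask) as u32);
                    bit_pos += w;
                }
            }
        }
    }
}

fn main() {
    let runs = vec![
        Run::Rle { value: 7, count: 5 },
        // 3-bit values 1, 2, 3, 4 packed LSB-first into two bytes.
        Run::BitPacked { bit_width: 3, data: vec![0xD1, 0x08], count: 4 },
    ];
    let mut out = Vec::new();
    decode(&runs, &mut out);
    println!("{:?}", out); // [7, 7, 7, 7, 7, 1, 2, 3, 4]
}
```

In this toy model, tuning the threshold `n` (how long a repeat must be before the writer emits an RLE run) shifts work between the two arms of the `match`: a larger `n` produces fewer, longer bit-packed runs, trading some compression for cheaper decoding.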
