JigaoLuo commented on issue #7739: URL: https://github.com/apache/arrow-rs/issues/7739#issuecomment-3004007076
Thanks for having me, and I really appreciate this insight from @XiangpengHao:

> But we should consider decoding speed as well. unpack bit-packed data is slow when bit-pack sequence is short. So if we want faster decoding, we should make RLE less likely, i.e., make n larger.

It highlights the core dilemma we face with CPU-based Parquet readers: the most highly compressed Parquet files are often not the fastest to read (i.e., to decode and decompress). We have even seen cases where this tradeoff makes uncompressed Parquet, despite its larger size, faster for end-to-end reading. I have long wished for a way to write Parquet files **optimized for read performance**. However, identifying low-level pitfalls like this one is challenging, so thank you for shedding light on it!
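To make the tradeoff concrete, here is a minimal, self-contained sketch (not the arrow-rs decoder, just an illustration under simplified assumptions) of Parquet's RLE/bit-packed hybrid: an RLE run amortizes one stored value over many repeats, while a bit-packed run pays per-run setup plus per-value bit arithmetic, which is why many short bit-packed runs decode slowly:

```rust
/// One run of a simplified RLE/bit-packed hybrid encoding.
enum Run {
    /// `count` copies of `value` -- cheap to decode regardless of length.
    Rle { value: u32, count: usize },
    /// `count` values packed at `bit_width` bits each, LSB-first.
    BitPacked { bit_width: u8, data: Vec<u8>, count: usize },
}

fn decode(runs: &[Run], out: &mut Vec<u32>) {
    for run in runs {
        match run {
            Run::Rle { value, count } => {
                // One branch, then a bulk fill: fast even for huge counts.
                out.extend(std::iter::repeat(*value).take(*count));
            }
            Run::BitPacked { bit_width, data, count } => {
                // Per-run setup plus per-value shifting and masking.
                // The setup dominates when `count` is small, which is
                // the decoding pitfall discussed above.
                let w = *bit_width as u64;
                let mask = (1u64 << w) - 1;
                let mut bit_pos = 0u64;
                for _ in 0..*count {
                    let byte = (bit_pos / 8) as usize;
                    let shift = bit_pos % 8;
                    // Load up to 8 bytes little-endian around the current bit.
                    let mut buf = [0u8; 8];
                    let n = data.len().saturating_sub(byte).min(8);
                    buf[..n].copy_from_slice(&data[byte..byte + n]);
                    let word = u64::from_le_bytes(buf);
                    out.push(((word >> shift) & mask) as u32);
                    bit_pos += w;
                }
            }
        }
    }
}

fn main() {
    let runs = vec![
        Run::Rle { value: 7, count: 5 },
        // 3-bit values 1, 2, 3, 4 packed LSB-first into two bytes.
        Run::BitPacked { bit_width: 3, data: vec![0xD1, 0x08], count: 4 },
    ];
    let mut out = Vec::new();
    decode(&runs, &mut out);
    println!("{:?}", out); // [7, 7, 7, 7, 7, 1, 2, 3, 4]
}
```

In this toy model, tuning the threshold `n` (how long a repeat must be before the writer emits an RLE run) shifts work between the two arms of the `match`: a larger `n` produces fewer, longer bit-packed runs, trading some compression for cheaper decoding.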
