JigaoLuo commented on PR #8257:
URL: https://github.com/apache/arrow-rs/pull/8257#issuecomment-3266966160

   Hello everyone,
   
   I just came across this PR and noticed that most of the discussion is 
happening here, so I’d like to continue the conversation in this thread rather 
than on the issue page.
   
   I believe the direction of this PR aligns well with a previous issue we 
discussed in https://github.com/XiangpengHao/liquid-cache/issues/227. I’ve been 
working on my own `parquet-rewrite` tool that touches on similar ideas, 
particularly the **score** metric, a kind of breakeven point for deciding 
whether compression should be applied. The goal of the tool is to leave data 
uncompressed when compression adds overhead without delivering a meaningful 
size reduction, so that the reader can skip the corresponding decompression 
work and read faster.
   
   Choosing a threshold for this **score** is tricky and largely empirical. 
For now, I’ve set it at 10%, mainly to catch cases where compression offers no 
size benefit at all. Here is an example:
   
   <img width="2415" height="660" alt="image" src="https://github.com/user-attachments/assets/0ab7438c-6516-46b0-bc17-e9c8b9b14273" />
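
To make the breakeven idea concrete, here is a minimal sketch of how such a check could look. It is not the actual `parquet-rewrite` code. It assumes the score is the percentage of bytes saved by compression, and the function name `keep_compressed` and its parameters are hypothetical.

```rust
/// Hypothetical breakeven check: keep the compressed form only if it saves
/// at least `min_saving_percent` of the uncompressed size.
/// Assumption: "score" means the percentage of bytes saved by compression.
fn keep_compressed(
    uncompressed_bytes: u64,
    compressed_bytes: u64,
    min_saving_percent: u64,
) -> bool {
    // Nothing to gain from compressing empty data.
    if uncompressed_bytes == 0 {
        return false;
    }
    // Bytes saved; zero if compression made the data larger.
    let saved = uncompressed_bytes.saturating_sub(compressed_bytes) as u128;
    // Integer comparison of saved / uncompressed >= percent / 100,
    // which avoids floating-point rounding at the boundary.
    saved * 100 >= min_saving_percent as u128 * uncompressed_bytes as u128
}
```

With a 10% threshold, a 1000-byte chunk that compresses to 950 bytes would be stored uncompressed, while one that compresses to 600 bytes would stay compressed.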
   
   
   ---
   
   As a side note, I’ve also made some patches to Xiangpeng’s viewer tool, 
which I use to inspect my generated Parquet files. This has been instrumental 
in iterating on my reader implementation.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
