JigaoLuo commented on PR #8257:
URL: https://github.com/apache/arrow-rs/pull/8257#issuecomment-3266966160
Hello everyone, I just came across this PR and noticed that most of the discussion is happening here, so I'd like to continue the conversation in this thread rather than on the issue page. I believe the direction of this PR aligns well with an earlier issue we discussed in https://github.com/XiangpengHao/liquid-cache/issues/227.

I've been working on my own `parquet-rewrite` tool that explores similar ideas, in particular a **score** metric: a breakeven point for deciding whether compression should be applied at all. The goal is to avoid compression that adds decompression overhead for the reader without delivering a meaningful size reduction, ultimately improving read performance.

Choosing the **score** is tricky and largely empirical. For now I've set it to 10%, mainly to catch cases where compression offers no size benefit at all. Here is an example:

![image](https://github.com/user-attachments/assets/0ab7438c-6516-46b0-bc17-e9c8b9b14273)

---

As a side note, I've also made some patches to Xiangpeng's viewer tool, which I use to inspect the Parquet files my tool generates. This has been instrumental in iterating on my reader implementation.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
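The **score** check described in the comment above can be illustrated with a minimal sketch. It assumes that the score is the fraction of bytes that compression saves on a single column chunk (the comment does not spell out the exact definition) and that a chunk keeps its codec only if it saves at least 10%. The function `should_compress`, the constant `MIN_SAVING_PERCENT`, and the chunk sizes are hypothetical. They are not part of `parquet-rewrite` or arrow-rs.

```rust
/// Hypothetical threshold: compression must save at least this share of bytes.
/// Assumes the "score" is the relative size reduction of a column chunk.
const MIN_SAVING_PERCENT: u64 = 10;

/// Returns true if compressing a chunk of `raw_len` bytes down to `packed_len`
/// bytes saves at least `min_saving_percent` percent of the original size.
/// Integer arithmetic avoids floating-point error right at the threshold.
fn should_compress(raw_len: u64, packed_len: u64, min_saving_percent: u64) -> bool {
    if raw_len == 0 {
        return false; // nothing to gain from compressing an empty chunk
    }
    // Compressed output can be larger than the input; treat that as zero saving.
    let saved = raw_len.saturating_sub(packed_len);
    saved * 100 >= raw_len * min_saving_percent
}

fn main() {
    // Hypothetical column-chunk sizes in bytes: (column, uncompressed, compressed).
    let chunks = [("ids", 1_000_000u64, 990_000u64), ("text", 1_000_000, 300_000)];
    for (column, raw, packed) in chunks {
        let saved_pct = raw.saturating_sub(packed) as f64 * 100.0 / raw as f64;
        let decision = if should_compress(raw, packed, MIN_SAVING_PERCENT) {
            "keep compression"
        } else {
            "store uncompressed"
        };
        println!("{column}: saves {saved_pct:.1}% -> {decision}");
    }
}
```

Running it prints `ids: saves 1.0% -> store uncompressed` and `text: saves 70.0% -> keep compression`. The comparison is done in integer percent so that a chunk saving exactly 10% is not rejected by rounding. In a real rewriter, the "store uncompressed" outcome could be applied per column through the parquet crate's per-column writer settings, such as `set_column_compression` on `WriterPropertiesBuilder`.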