Re: [PR] Experimental parquet decoder with first-class selection pushdown support [arrow-rs]

via GitHub Fri, 31 Jan 2025 07:00:13 -0800


XiangpengHao commented on PR #6921:
URL: https://github.com/apache/arrow-rs/pull/6921#issuecomment-2627551092


   > I think it has the possibility to cache the decoded pages needed for the 
entire row group
   
   To clarify, it will only cache up to 2 pages per column: 
https://github.com/apache/arrow-rs/pull/6921/files#diff-e32cd78c497a3b6a5e49e47d1f7e44590071042201e5bb2c3c20de1c734ff6e5R321-R322
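
   For illustration only, here is a minimal sketch of what "at most 2 pages per 
column" could look like. It is not the code from the PR: `PageCache`, 
`DecodedPage`, `MAX_PAGES_PER_COLUMN`, and the oldest-first eviction are all 
hypothetical names and assumptions made for this example.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for a decoded page: the index of its first row
/// plus the decoded values. The real decoder works with Arrow arrays.
#[derive(Debug, Clone, PartialEq)]
pub struct DecodedPage {
    pub first_row: usize,
    pub values: Vec<i64>,
}

/// Assumed bound, mirroring the "up to 2 pages per column" statement.
pub const MAX_PAGES_PER_COLUMN: usize = 2;

/// Hypothetical cache keyed by column index. Each column keeps a small
/// queue of its most recently inserted decoded pages.
pub struct PageCache {
    columns: HashMap<usize, VecDeque<DecodedPage>>,
}

impl PageCache {
    pub fn new() -> Self {
        Self { columns: HashMap::new() }
    }

    /// Insert a decoded page, evicting the oldest page of that column
    /// once the per-column bound is reached (assumed eviction policy).
    pub fn insert(&mut self, column: usize, page: DecodedPage) {
        let pages = self.columns.entry(column).or_default();
        if pages.len() == MAX_PAGES_PER_COLUMN {
            pages.pop_front();
        }
        pages.push_back(page);
    }

    /// Look up a cached page of `column` by the index of its first row.
    pub fn get(&self, column: usize, first_row: usize) -> Option<&DecodedPage> {
        self.columns
            .get(&column)?
            .iter()
            .find(|p| p.first_row == first_row)
    }

    /// Number of pages currently cached for `column`.
    pub fn cached_pages(&self, column: usize) -> usize {
        self.columns.get(&column).map_or(0, |p| p.len())
    }
}
```

   Under these assumptions the cache footprint is bounded by the number of 
projected columns times two decoded pages, independent of the row group size.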
   
   > if this will affect memory usage during query
   
   It will, since we need to fetch the projected columns into memory before 
applying filters. Memory usage may be somewhat higher as a result.
   
   > the next steps for this PR?
   
   This PR comes from a caching-related research project, which is currently 
being measured extensively on various performance metrics (CPU usage, memory 
usage, etc.). Those numbers will help us better understand the trade-offs of 
this PR. My plan is to push this further after we submit the paper (hopefully 
before March). 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
