[I] [Parquet] Support page level cache for reading [arrow-rs]

via GitHub Fri, 29 Aug 2025 02:45:18 -0700


123789456ye opened a new issue, #8246:
URL: https://github.com/apache/arrow-rs/issues/8246


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   Previously, the Parquet reader had to read a whole RowGroup into memory and 
then extract what was needed. This is clearly wasteful. 
   Therefore, I would like to read only the pages we need and cache those pages 
for future reads. 
   The first part (reading only the needed pages) is solved thanks to #7850, 
and I started working on the caching part after that PR was released.
   
   **Describe the solution you'd like**
   I am thinking of adding a caching mechanism to `decode_page` in `impl 
RowGroupReader for SerializedRowGroupReader`. This way, repeated reads of the 
same page can avoid part of the decode and decompress cost.
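
   To make the idea concrete, here is a minimal sketch of what such a cache 
could look like. It is not arrow-rs code: `PageKey`, `DecodedPage`, `PageCache` 
and `get_or_decode` are hypothetical names, a decoded page is modelled as plain 
bytes, and eviction is simple FIFO by byte budget to keep the example short. A 
real implementation would probably want LRU eviction and a thread-safe wrapper.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::Arc;

/// Hypothetical key: identifies a page by its position in the file.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct PageKey {
    pub row_group: usize,
    pub column: usize,
    pub page_offset: u64,
}

/// Stand-in for a decompressed, decoded page (assumption: plain bytes).
pub type DecodedPage = Arc<Vec<u8>>;

/// Byte-bounded page cache with FIFO eviction.
pub struct PageCache {
    capacity_bytes: usize,
    used_bytes: usize,
    pages: HashMap<PageKey, DecodedPage>,
    order: VecDeque<PageKey>, // insertion order, oldest at the front
    pub hits: u64,
    pub misses: u64,
}

impl PageCache {
    pub fn new(capacity_bytes: usize) -> Self {
        Self {
            capacity_bytes,
            used_bytes: 0,
            pages: HashMap::new(),
            order: VecDeque::new(),
            hits: 0,
            misses: 0,
        }
    }

    /// Returns the cached page, or runs `decode` once and caches its result.
    pub fn get_or_decode<F>(&mut self, key: PageKey, decode: F) -> DecodedPage
    where
        F: FnOnce() -> Vec<u8>,
    {
        if let Some(page) = self.pages.get(&key) {
            self.hits += 1;
            return Arc::clone(page);
        }
        self.misses += 1;
        let page = Arc::new(decode());
        self.insert(key, Arc::clone(&page));
        page
    }

    /// Only called on a miss, so `key` is never already present.
    fn insert(&mut self, key: PageKey, page: DecodedPage) {
        let size = page.len();
        if size > self.capacity_bytes {
            return; // page is larger than the whole budget: do not cache it
        }
        // Evict the oldest pages until the new one fits.
        while self.used_bytes + size > self.capacity_bytes {
            match self.order.pop_front() {
                Some(old) => {
                    if let Some(evicted) = self.pages.remove(&old) {
                        self.used_bytes -= evicted.len();
                    }
                }
                None => break,
            }
        }
        self.used_bytes += size;
        self.pages.insert(key, page);
        self.order.push_back(key);
    }
}
```

   In this sketch, the closure passed to `get_or_decode` stands for the 
existing decompress-and-decode step, so a second read of the same page skips 
that work entirely and only clones an `Arc`.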
   
   **Describe alternatives you've considered**
   I considered also adding a cache to the filter stage, but that part is 
already implemented.
   I also considered page-level prefetching, but I think it may not be very 
profitable.
   


