Re: [I] Question regarding Parquet Page Index: Why enable it during write if it's not utilized during read? [arrow-go]

via GitHub Tue, 12 May 2026 10:26:41 -0700


zeroshade commented on issue #814:
URL: https://github.com/apache/arrow-go/issues/814#issuecomment-4433068773


   The `ColumnChunkReader` objects have `SeekToRow` methods that use the 
PageIndex, if it exists, to do page-level pruning when seeking to a specific 
row index. 
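
   Here is what that pruning involves. Per the Parquet format spec, the 
OffsetIndex records each data page's file offset, compressed size and first 
row index, so finding the page that holds a given row is a binary search over 
the first row indexes. The sketch below assumes the page locations are already 
decoded into a hypothetical `PageLocation` struct and uses a hypothetical 
`pageForRow` helper. It illustrates the idea and is not arrow-go's actual 
`SeekToRow` code.

```go
package main

import (
	"fmt"
	"sort"
)

// PageLocation is a hypothetical standalone type mirroring the fields of a
// Parquet OffsetIndex page location (offset, compressed_page_size,
// first_row_index). It is not arrow-go's own type.
type PageLocation struct {
	Offset             int64
	CompressedPageSize int32
	FirstRowIndex      int64
}

// pageForRow (hypothetical helper) returns the index of the page containing
// row, relative to the start of the row group, or -1 if row is out of range.
// Page locations are sorted by FirstRowIndex, so we look for the last page
// whose FirstRowIndex <= row.
func pageForRow(locs []PageLocation, numRows, row int64) int {
	if row < 0 || row >= numRows || len(locs) == 0 {
		return -1
	}
	// sort.Search finds the first page that starts *after* row;
	// the page just before it is the one containing row.
	i := sort.Search(len(locs), func(i int) bool {
		return locs[i].FirstRowIndex > row
	})
	return i - 1
}

func main() {
	// Three pages holding 100, 150 and 50 rows in a 300-row row group.
	locs := []PageLocation{
		{Offset: 4, CompressedPageSize: 1024, FirstRowIndex: 0},
		{Offset: 1028, CompressedPageSize: 1500, FirstRowIndex: 100},
		{Offset: 2528, CompressedPageSize: 600, FirstRowIndex: 250},
	}
	for _, row := range []int64{0, 99, 100, 260, 300} {
		p := pageForRow(locs, 300, row)
		if p < 0 {
			fmt.Printf("row %d: out of range\n", row)
			continue
		}
		// A seek can jump straight to this page's byte offset and then
		// skip (row - FirstRowIndex) rows inside the page.
		fmt.Printf("row %d: page %d at offset %d, skip %d rows\n",
			row, p, locs[p].Offset, row-locs[p].FirstRowIndex)
	}
}
```

Once the page is known, a reader only has to decode that one page and skip 
the leading rows within it, instead of decompressing every page before it.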
   
   There's also a TODO note in the page index reader about potentially 
prefetching pages into a page cache for faster reading. 
   
   Other than seeking to specific rows, we aren't yet automatically using 
the Page, Column or Offset indexes in the reader. That said, if you're building 
something that processes queries on Parquet, you should be able to use the 
indexes programmatically to determine which rows to seek to; the seek itself 
will then use the page index. 
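
   As a rough illustration of doing that yourself: the ColumnIndex holds 
per-page min/max values and null-page flags, and the OffsetIndex holds each 
page's first row index, so a range predicate can be turned into a list of row 
ranges worth seeking to. This sketch assumes the index values have already 
been decoded into plain int64 slices; `PageStats`, `RowRange` and 
`candidateRows` are hypothetical names, not part of arrow-go. Min/max pruning 
is only a conservative filter, so rows in the surviving ranges still have to 
be checked against the predicate.

```go
package main

import "fmt"

// PageStats is a hypothetical per-page summary modeled on the Parquet
// ColumnIndex (null_pages, min_values, max_values), decoded here as int64.
type PageStats struct {
	NullPage bool  // true if every value in the page is null
	Min, Max int64 // min/max of the non-null values in the page
}

// RowRange is a half-open range [Start, End) of row indices in a row group.
type RowRange struct {
	Start, End int64
}

// candidateRows returns the row ranges whose pages might contain a value in
// [lo, hi]. firstRows[i] is page i's first row index (from the OffsetIndex)
// and numRows is the row group's row count. Contiguous matches are merged.
func candidateRows(stats []PageStats, firstRows []int64, numRows, lo, hi int64) []RowRange {
	var out []RowRange
	for i, s := range stats {
		// Skip pages that are all nulls or whose [Min, Max] interval
		// cannot overlap the predicate's [lo, hi].
		if s.NullPage || s.Max < lo || s.Min > hi {
			continue
		}
		start := firstRows[i]
		end := numRows
		if i+1 < len(firstRows) {
			end = firstRows[i+1]
		}
		// Extend the previous range when this page directly follows it.
		if n := len(out); n > 0 && out[n-1].End == start {
			out[n-1].End = end
		} else {
			out = append(out, RowRange{Start: start, End: end})
		}
	}
	return out
}

func main() {
	stats := []PageStats{
		{Min: 1, Max: 40},
		{Min: 35, Max: 90},
		{NullPage: true},
		{Min: 200, Max: 300},
	}
	firstRows := []int64{0, 100, 250, 300}
	// Predicate: 50 <= x <= 250 over a 400-row row group.
	for _, r := range candidateRows(stats, firstRows, 400, 50, 250) {
		fmt.Printf("read rows [%d, %d)\n", r.Start, r.End)
	}
}
```

Each `Start` would be the row index handed to `SeekToRow` on the relevant 
column chunk reader, with reading continuing until `End`; pages outside the 
returned ranges are never decoded.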
   
   There's definitely room for us to add more entry points and utilities to 
the API so the indexes get used when they exist. But I will readily admit that 
right now, if you're not doing row seeking, not reading the files with other 
engines that use the indexes, and not going to use the indexes yourself to 
drive the seeking, then writing the indexes probably won't benefit you much 
YET. Eventually I plan to improve the APIs to allow better use of the indexes, 
but you're correct that outside of these few cases it doesn't improve 
performance yet.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.