zeroshade commented on issue #814: URL: https://github.com/apache/arrow-go/issues/814#issuecomment-4433068773
The ColumnChunkReader objects have `SeekToRow` methods, which use the PageIndex, if it exists, to do page-level pruning when seeking to a specific row index. There's also a TODO note in the pageindex reader where we could potentially implement prefetching of pages into a page cache for faster reading. Other than seeking to specific rows, we aren't yet automatically using the Page, Column, or Offset indexes in the reader.

That said, if you're building something that processes queries on Parquet, you should be able to use the indexes programmatically to determine which rows to seek to (the seek itself will then use the index as well). There's definitely room for us to add more routes and utilities to the API to make better use of the indexes when they exist.

But I will absolutely admit that right now, if you're not doing row seeking, you're not leveraging other engines, and you aren't going to use the indexes yourself to perform the seeking, then writing the indexes probably won't benefit you much YET. Eventually I plan on improving the APIs to allow better usage of the indexes, but you're correct that it doesn't benefit performance outside of these small cases yet.
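
To make the "use the indexes yourself" route a bit more concrete, here is a minimal sketch of how a query layer might turn per-page min/max statistics into row ranges worth seeking to. Everything in it is an assumption for illustration: `pageStats`, `rowRange`, and `candidateRanges` are hypothetical names, not arrow-go types or functions, and the values are plain `int64` rather than the encoded statistics a real column index stores.

```go
package main

import "fmt"

// pageStats is a hypothetical, simplified stand-in for the per-page
// information found in a Parquet column index and offset index.
// It is NOT an arrow-go type.
type pageStats struct {
	firstRow int64 // row index of the page's first row (from the offset index)
	min, max int64 // smallest and largest value in the page (from the column index)
}

// rowRange is a half-open range [start, end) of row indexes.
type rowRange struct{ start, end int64 }

// candidateRanges returns the row ranges of all pages whose [min, max]
// interval overlaps the query interval [lo, hi]. Adjacent matching pages
// are merged into a single range. totalRows bounds the last page.
func candidateRanges(pages []pageStats, totalRows, lo, hi int64) []rowRange {
	var out []rowRange
	for i, p := range pages {
		if p.max < lo || p.min > hi {
			continue // page cannot contain a matching value, so prune it
		}
		end := totalRows
		if i+1 < len(pages) {
			end = pages[i+1].firstRow
		}
		// Extend the previous range when this page starts where it ended.
		if n := len(out); n > 0 && out[n-1].end == p.firstRow {
			out[n-1].end = end
			continue
		}
		out = append(out, rowRange{p.firstRow, end})
	}
	return out
}

func main() {
	pages := []pageStats{
		{firstRow: 0, min: 1, max: 10},
		{firstRow: 100, min: 11, max: 20},
		{firstRow: 200, min: 21, max: 30},
		{firstRow: 300, min: 5, max: 8},
	}
	// Query: values between 15 and 25 inclusive.
	for _, r := range candidateRanges(pages, 400, 15, 25) {
		// With a real reader, this is where one would seek to r.start
		// (for example via SeekToRow) and read r.end-r.start rows.
		fmt.Printf("seek to row %d, read %d rows\n", r.start, r.end-r.start)
	}
}
```

With the sample pages above, only the second and third pages can hold values in the queried interval, so the sketch reports a single merged range starting at row 100 and covering 200 rows. Rows inside a candidate range still have to be filtered after reading, since page statistics only rule pages out.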
