[GitHub] [arrow] westonpace opened a new pull request, #12967: ARROW-16294: [C++] Improve performance of parquet readahead

GitBox Fri, 22 Apr 2022 17:11:09 -0700


westonpace opened a new pull request, #12967:
URL: https://github.com/apache/arrow/pull/12967


    * It turns out the batch size we were using to slice internally for parquet 
(in the TableBatchReader) was not the batch size from the scanner.  I added 
batch slicing in file_parquet, which leaves the parquet reader itself more or 
less unchanged (it's not clear that slicing to the smaller batch size inside 
the reader would make more sense anyway).
    * Changed parquet readahead so it reads ahead more than one row group.  It 
now tries to keep `batch_size * batch_readahead` reads in flight.  Users who 
want more parallel reads can increase batch readahead.
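
To make the two changes above concrete, here is a minimal sketch. It is not the code from this PR. Both helper names are hypothetical, and it assumes that `batch_size * batch_readahead` is a target number of rows and that every row group holds `rows_per_row_group` rows (a parameter that does not appear in the description).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not an Arrow API): split `num_rows` rows into
// (offset, length) slices of at most `batch_size` rows each, the way a
// large decoded table could be cut down to the scanner's batch size.
std::vector<std::pair<int64_t, int64_t>> SliceIntoBatches(int64_t num_rows,
                                                          int64_t batch_size) {
  assert(batch_size > 0);
  std::vector<std::pair<int64_t, int64_t>> slices;
  for (int64_t offset = 0; offset < num_rows; offset += batch_size) {
    // The last slice may be shorter than batch_size.
    slices.emplace_back(offset, std::min(batch_size, num_rows - offset));
  }
  return slices;
}

// Hypothetical helper: how many row groups to keep in flight so that roughly
// batch_size * batch_readahead rows are being read at once. Assumes uniform
// row groups of `rows_per_row_group` rows and always returns at least 1.
int64_t RowGroupsToReadAhead(int64_t batch_size, int64_t batch_readahead,
                             int64_t rows_per_row_group) {
  assert(batch_size > 0 && batch_readahead >= 0 && rows_per_row_group > 0);
  const int64_t target_rows = batch_size * batch_readahead;
  // Ceiling division: a partially needed row group still has to be read.
  const int64_t groups =
      (target_rows + rows_per_row_group - 1) / rows_per_row_group;
  return std::max<int64_t>(groups, 1);
}
```

Under these assumptions, raising batch readahead raises the number of row groups read concurrently, which matches the advice above for users who want more parallel reads.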


