westonpace opened a new pull request, #12967:
URL: https://github.com/apache/arrow/pull/12967
* It turns out the batch size we were using internally to slice parquet data
(in the TableBatchReader) was not the batch size from the scanner. I added
batch slicing in file_parquet so that the parquet reader itself stays more or
less unchanged (and it's not clear that slicing to the smaller batch size
inside the reader would make more sense anyway).
* Changed parquet readahead so that it can read ahead more than one row group.
It now tries to keep `batch_size * batch_readahead` reads in flight. Users who
want more parallel reads can increase the batch readahead.
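
To make the two changes above concrete, here is a rough pure-Python sketch. It is not the code in this PR (the real logic is C++ in file_parquet); `slice_row_group` and `row_groups_to_read_ahead` are hypothetical helper names, and the sketch assumes the `batch_size * batch_readahead` target is counted in rows.

```python
from typing import Iterator, List, Tuple


def slice_row_group(num_rows: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) slices that cover one decoded row group.

    Hypothetical helper: mimics re-slicing reader output to the
    scanner's batch size instead of the reader's internal one.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for offset in range(0, num_rows, batch_size):
        # The last slice may be shorter than batch_size.
        yield offset, min(batch_size, num_rows - offset)


def row_groups_to_read_ahead(
    row_group_sizes: List[int], batch_size: int, batch_readahead: int
) -> int:
    """Count the leading row groups needed to cover the readahead target.

    Assumption: the target is batch_size * batch_readahead rows.
    """
    target_rows = batch_size * batch_readahead
    covered = 0
    count = 0
    for size in row_group_sizes:
        if covered >= target_rows:
            break
        covered += size
        count += 1
    return count


if __name__ == "__main__":
    # A 10-row row group sliced with a scanner batch size of 4.
    print(list(slice_row_group(10, 4)))
    # Small row groups: several are needed to reach the target.
    print(row_groups_to_read_ahead([100, 100, 100, 100], 50, 4))
    # One large row group already covers the target.
    print(row_groups_to_read_ahead([1000, 1000], 50, 4))
```

Under these assumptions, files with small row groups get several row groups read ahead at once, while a file with large row groups may only need one. Raising the batch readahead raises the target and therefore the number of concurrent reads.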