[GitHub] [parquet-mr] theosib-amazon commented on pull request #968: PARQUET-2149: Async IO implementation for ParquetFileReader

GitBox Wed, 25 May 2022 10:31:25 -0700


theosib-amazon commented on PR #968:
URL: https://github.com/apache/parquet-mr/pull/968#issuecomment-1137610598


   That batch reader in Presto reminds me of some of the experimental changes I 
made in Trino. I modified PrimitiveColumnReader to work out how many of each 
data item it needs to read from the data source and to request all of them at 
once as an array. This doubled the performance of some TPC-DS queries, and it 
is why I have array access methods planned for ParquetMR 
(https://docs.google.com/document/d/1fBGpF_LgtfaeHnPD5CFEIpA2Ga_lTITmFdFIcO9Af-g/edit?usp=sharing).
Requesting data in bulk avoids paying function call overhead for every 
individual data item.
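
To make the idea concrete, here is a minimal sketch of per-value versus bulk reads. It is only an illustration: `IntSource`, `readInts`, `ArrayIntSource`, and `BulkReadSketch` are hypothetical names invented for this example and are not the actual Trino or ParquetMR interfaces.

```java
import java.util.Arrays;

public class BulkReadSketch {
    /** Hypothetical value source; NOT a real ParquetMR or Trino interface. */
    interface IntSource {
        /** Returns the next value: one call per data item. */
        int readInt();

        /** Copies the next {@code length} values into {@code dst}: one call per batch. */
        void readInts(int[] dst, int offset, int length);
    }

    /** Toy in-memory source standing in for a decoded column page (assumption). */
    static final class ArrayIntSource implements IntSource {
        private final int[] data;
        private int pos;

        ArrayIntSource(int[] data) {
            this.data = data;
        }

        @Override
        public int readInt() {
            return data[pos++];
        }

        @Override
        public void readInts(int[] dst, int offset, int length) {
            System.arraycopy(data, pos, dst, offset, length);
            pos += length;
        }
    }

    /** Per-value path: {@code count} calls into the source. */
    static int[] readOneByOne(IntSource src, int count) {
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = src.readInt();
        }
        return out;
    }

    /** Bulk path: the reader works out the count first, then makes a single call. */
    static int[] readInBulk(IntSource src, int count) {
        int[] out = new int[count];
        src.readInts(out, 0, count);
        return out;
    }

    public static void main(String[] args) {
        int[] values = {3, 1, 4, 1, 5, 9, 2, 6};
        int[] a = readOneByOne(new ArrayIntSource(values), values.length);
        int[] b = readInBulk(new ArrayIntSource(values), values.length);
        // Both paths yield the same values; only the number of calls differs.
        System.out.println(Arrays.equals(a, b));
    }
}
```

Both paths return identical data; the bulk path simply replaces one call per item with one call per batch, which is where the saving would come from.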

