theosib-amazon commented on PR #968: URL: https://github.com/apache/parquet-mr/pull/968#issuecomment-1137610598
That batch reader in Presto reminds me of some experimental changes I made in Trino. I modified PrimitiveColumnReader to work out how many of each data item it needs to read from the data source and to request all of them at once as an array. This doubled the performance of some TPC-DS queries, which is why I have array access methods planned for ParquetMR (https://docs.google.com/document/d/1fBGpF_LgtfaeHnPD5CFEIpA2Ga_lTITmFdFIcO9Af-g/edit?usp=sharing). Requesting data in bulk avoids most of the function call overhead that is otherwise paid for each data item.
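To make the idea concrete, here is a minimal sketch of per-item versus bulk reads. It is not the Trino change or the planned ParquetMR API: `IntSource`, `ArrayIntSource`, `countDefined`, `readOneByOne`, and `readInBulk` are hypothetical names invented for illustration. It also assumes that "how many to read" can be derived by counting definition levels equal to the maximum, which is how non-null values are identified in Parquet.

```java
import java.util.Arrays;

public class BulkReadSketch {
    /** Hypothetical value source; not a ParquetMR or Trino interface. */
    interface IntSource {
        int readInt();                                  // one call per value
        void readInts(int[] dest, int offset, int len); // one call per batch
    }

    /** Toy source backed by an in-memory array. */
    static final class ArrayIntSource implements IntSource {
        private final int[] data;
        private int pos = 0;

        ArrayIntSource(int[] data) { this.data = data; }

        @Override public int readInt() { return data[pos++]; }

        @Override public void readInts(int[] dest, int offset, int len) {
            System.arraycopy(data, pos, dest, offset, len);
            pos += len;
        }
    }

    /** Assumption: a value is stored only where the definition level equals maxDef. */
    static int countDefined(int[] defLevels, int maxDef) {
        int count = 0;
        for (int level : defLevels) {
            if (level == maxDef) count++;
        }
        return count;
    }

    /** Baseline: one source call per value. */
    static int[] readOneByOne(IntSource src, int count) {
        int[] out = new int[count];
        for (int i = 0; i < count; i++) out[i] = src.readInt();
        return out;
    }

    /** Bulk: a single source call fills the whole array. */
    static int[] readInBulk(IntSource src, int count) {
        int[] out = new int[count];
        src.readInts(out, 0, count);
        return out;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5, 9, 2, 6};
        int[] defLevels = {1, 0, 1, 1, 0};        // two nulls, three values
        int needed = countDefined(defLevels, 1);  // decide the count up front
        int[] slow = readOneByOne(new ArrayIntSource(data), needed);
        int[] fast = readInBulk(new ArrayIntSource(data), needed);
        System.out.println(Arrays.toString(fast) + " same=" + Arrays.equals(slow, fast));
    }
}
```

Both paths return the same values; the difference is that the bulk path makes one call into the source per batch instead of one per value, which is where the saved call overhead comes from.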
