jorgecarleitao commented on pull request #68: URL: https://github.com/apache/arrow-datafusion/pull/68#issuecomment-848444957
The difference in performance stems from the group by computation.

Average time to read an 8k-row batch from parquet during TPCH 1 on my computer:

* this PR: 500 us
* master: 1808 us

i.e. reading parquet is ~3.6x faster (single thread in both cases). These improvements do not show up end to end atm because there is a bottleneck somewhere else (likely the group by).

So, from parquet's standpoint, this is imo a large improvement. I will now start touching the CPU, and remaining tests.
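For anyone who wants to reproduce this kind of per-batch number, here is a minimal sketch of how it could be measured. It is not the harness used for the figures above: `average_time_per_batch` and `speedup` are hypothetical helpers, and the sketch assumes the reader is exposed as a plain Rust iterator over batches and uses only the standard library.

```rust
use std::time::{Duration, Instant};

/// Average wall-clock time spent producing each item of `batches`.
/// Only the time inside `next()` is counted. Returns `None` if the
/// iterator yields nothing.
/// Hypothetical helper, not part of arrow, parquet, or DataFusion.
fn average_time_per_batch<I: Iterator>(mut batches: I) -> Option<Duration> {
    let mut total = Duration::ZERO;
    let mut count: u32 = 0;
    loop {
        let start = Instant::now();
        let item = batches.next();
        let elapsed = start.elapsed();
        match item {
            Some(batch) => {
                // The batch itself is not needed; only the read time is.
                drop(batch);
                total += elapsed;
                count += 1;
            }
            None => break,
        }
    }
    if count == 0 {
        None
    } else {
        Some(total / count)
    }
}

/// Speedup of a candidate over a baseline, both in microseconds per batch.
fn speedup(baseline_us: f64, candidate_us: f64) -> f64 {
    baseline_us / candidate_us
}
```

With the numbers above, `speedup(1808.0, 500.0)` is about 3.6. Timing only the `next()` call keeps downstream work such as the group by out of the measurement, which is the point of comparing the readers in isolation.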
