[GitHub] [arrow-datafusion] jorgecarleitao commented on pull request #68: Experimenting with arrow2

GitBox Tue, 25 May 2021 21:31:52 -0700


jorgecarleitao commented on pull request #68:
URL: https://github.com/apache/arrow-datafusion/pull/68#issuecomment-848444957



   The difference in performance stems from the group-by computation. 
Average time to read an 8k-row batch from parquet during TPC-H query 1 on my computer:
   
   * this PR: 500 us
   * master: 1808 us
   
   i.e. reading parquet is ~3.6x faster (single thread in both cases).
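
For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of how a per-batch average and the resulting ratio could be computed. It is not the code I used for the numbers above: `average_time_per_batch` and `speedup` are hypothetical helper names, and any iterator can stand in for the parquet reader.

```rust
use std::time::{Duration, Instant};

/// Average wall-clock time spent producing each item of `batches`.
/// `batches` stands in for a reader that yields record batches (an assumption;
/// any iterator works). Returns `None` if the iterator yields nothing.
fn average_time_per_batch<I: Iterator>(mut batches: I) -> Option<Duration> {
    let mut total = Duration::default();
    let mut count: u32 = 0;
    loop {
        // Time only the call that produces the next batch.
        let start = Instant::now();
        let item = batches.next();
        let elapsed = start.elapsed();
        match item {
            Some(batch) => {
                // Dropping the batch happens outside the timed region.
                drop(batch);
                total += elapsed;
                count += 1;
            }
            None => break,
        }
    }
    if count == 0 {
        None
    } else {
        Some(total / count)
    }
}

/// How many times faster `new` is than `baseline`.
fn speedup(baseline: Duration, new: Duration) -> f64 {
    baseline.as_secs_f64() / new.as_secs_f64()
}
```

With the two averages reported above, `speedup(Duration::from_micros(1808), Duration::from_micros(500))` gives 1808 / 500 = 3.616, i.e. the ~3.6x quoted.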
   
   These improvements do not show up in the overall query time atm because there is a bottleneck somewhere 
else (likely the group by).
   
   So, from the parquet standpoint, this is imo a large improvement. I will 
now start looking at the CPU-bound parts and the remaining tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
