[GitHub] [arrow-datafusion] Dandandan commented on pull request #68: Experimenting with arrow2

GitBox Sat, 18 Sep 2021 02:19:46 -0700


Dandandan commented on pull request #68:
URL: https://github.com/apache/arrow-datafusion/pull/68#issuecomment-922246917



   Rerunning the benchmarks with `-p 16` now. Some profiling revealed that an 
option was added recently to set a maximum partitioning, which defaults to 2 in 
the benchmark.
   
   | Query   |      Arrow      |  Arrow2 |
   |----------|:-------------:|------:|
   | 1 | 1409.01  | 1355.47 |
   | 3 | 1045.94 |  1047.49 |
   | 5 | 1848.22 | 1847.41 |
   | 6 | 403.26 | 450.15 |
   | 10 | 1639.10 | 1886.21 |
   | 12 | 2508.39 | 2596.48 |
   | 13 | 2441.99 | 2296.75 |
   
   Looks to be closer now than when only using 2 threads to load the data.
   
   Skipping parquet (loading parquet is also slower in arrow2 when using 16 
threads instead of 2: 2708 ms in arrow2 vs 2192 ms in master):
   
   | Query   |      Arrow      |  Arrow2 |
   |----------|:-------------:|------:|
   | 1 | 725.57  | 739.89 |
   | 3 | 507.22 |  501.26 |
   | 5 | 1352.82 | 1328.22 |
   | 6 | 109.01 | 97.44 |
   | 10 | 1102.23 | 1301.48 |
   | 12 | 266.91 | 283.56 |
   | 13 | 2114.55 | 2278.81 |
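
To make "closer" a bit more concrete, here is a rough sketch that computes the per-query relative difference from the numbers in the two tables above. The helper names are made up for illustration, and the timings are assumed to be in milliseconds (the tables do not state a unit). A positive value means Arrow2 was slower than Arrow for that query.

```python
# Timings copied from the two tables above as (Arrow, Arrow2) pairs.
# Assumption: values are milliseconds; the tables do not state a unit.
WITH_PARQUET = {
    1: (1409.01, 1355.47),
    3: (1045.94, 1047.49),
    5: (1848.22, 1847.41),
    6: (403.26, 450.15),
    10: (1639.10, 1886.21),
    12: (2508.39, 2596.48),
    13: (2441.99, 2296.75),
}

SKIPPING_PARQUET = {
    1: (725.57, 739.89),
    3: (507.22, 501.26),
    5: (1352.82, 1328.22),
    6: (109.01, 97.44),
    10: (1102.23, 1301.48),
    12: (266.91, 283.56),
    13: (2114.55, 2278.81),
}


def relative_diff_percent(arrow, arrow2):
    """Percent change of arrow2 relative to arrow (positive = arrow2 slower)."""
    return (arrow2 - arrow) / arrow * 100.0


def summarize(timings):
    """Map each query number to its relative difference, rounded to 0.1%."""
    return {
        query: round(relative_diff_percent(arrow, arrow2), 1)
        for query, (arrow, arrow2) in timings.items()
    }


if __name__ == "__main__":
    for label, timings in (
        ("with parquet", WITH_PARQUET),
        ("skipping parquet", SKIPPING_PARQUET),
    ):
        print(label)
        for query, pct in summarize(timings).items():
            print(f"  q{query}: {pct:+.1f}%")
```

These are single runs on one machine, so small differences (a percent or two) are likely within noise.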


