Dandandan commented on pull request #68: URL: https://github.com/apache/arrow-datafusion/pull/68#issuecomment-922246917
Rerunning the benchmarks with `-p 16` now. Some profiling revealed that an option was recently added to set a maximum partitioning, which defaults to 2 in the benchmark.

| Query | Arrow | Arrow2 |
|----------|:-------------:|------:|
| 1 | 1409.01 | 1355.47 |
| 3 | 1045.94 | 1047.49 |
| 5 | 1848.22 | 1847.41 |
| 6 | 403.26 | 450.15 |
| 10 | 1639.10 | 1886.21 |
| 12 | 2508.39 | 2596.48 |
| 13 | 2441.99 | 2296.75 |

The results look closer now than when only 2 threads were used to load the data.

Skipping parquet (loading parquet is also slower when using 16 threads vs 2: 2708ms in arrow2 vs 2192ms in master):

| Query | Arrow | Arrow2 |
|----------|:-------------:|------:|
| 1 | 725.57 | 739.89 |
| 3 | 507.22 | 501.26 |
| 5 | 1352.82 | 1328.22 |
| 6 | 109.01 | 97.44 |
| 10 | 1102.23 | 1301.48 |
| 12 | 266.91 | 283.56 |
| 13 | 2114.55 | 2278.81 |
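To make "closer" easier to judge, the per-query difference can be expressed as a percentage of the Arrow timing. The following is only a minimal sketch: the timings are copied from the two tables above, while the names `WITH_PARQUET`, `SKIPPING_PARQUET`, `relative_change`, and `summarize` are hypothetical helpers introduced for illustration and are not part of the benchmark code.

```python
# Timings copied from the two tables above as (Arrow, Arrow2) pairs.
# The tables do not state units; a relative change does not depend on them.
WITH_PARQUET = {
    1: (1409.01, 1355.47),
    3: (1045.94, 1047.49),
    5: (1848.22, 1847.41),
    6: (403.26, 450.15),
    10: (1639.10, 1886.21),
    12: (2508.39, 2596.48),
    13: (2441.99, 2296.75),
}
SKIPPING_PARQUET = {
    1: (725.57, 739.89),
    3: (507.22, 501.26),
    5: (1352.82, 1328.22),
    6: (109.01, 97.44),
    10: (1102.23, 1301.48),
    12: (266.91, 283.56),
    13: (2114.55, 2278.81),
}


def relative_change(arrow, arrow2):
    """Percent change of Arrow2 relative to Arrow; positive means Arrow2 took longer."""
    return (arrow2 - arrow) / arrow * 100.0


def summarize(results):
    """Map each query number to its relative change, rounded to one decimal."""
    return {q: round(relative_change(a, a2), 1) for q, (a, a2) in results.items()}


if __name__ == "__main__":
    for label, results in (("with parquet", WITH_PARQUET),
                           ("skipping parquet", SKIPPING_PARQUET)):
        print(label)
        for query, pct in summarize(results).items():
            print(f"  Q{query}: {pct:+.1f}%")
```

A positive percentage means Arrow2 took longer on that query, a negative one means it was faster. Single-run timings like these are noisy, so small differences should not be over-interpreted.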
