2010YOUY01 commented on issue #6782: URL: https://github.com/apache/arrow-datafusion/issues/6782#issuecomment-1729908213
> Script to generate data, run experiments, and plot results (also plots) can be found [here](https://github.com/JayjeetAtGithub/datafusion-duckdb-benchmark/tree/scaling-benchmark).

Thank you for your script! @JayjeetAtGithub

I tried to reproduce the h2o benchmark and found that the measured durations include the time spent formatting and printing the results (queries in the h2o benchmark can produce 10k~1M rows of output):

* The datafusion-cli time measurement should be fixed by https://github.com/apache/arrow-datafusion/pull/7617
* The measured duckdb query execution time also seems to include printing: the numbers I get are larger than the numbers reported by the duckdb cli using the same number of threads

(btw, the current h2o benchmark results in the chart are csv + single core, right? Should we use parquet + all available cores instead?)
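To illustrate the measurement issue, here is a minimal Python sketch that times query execution separately from result formatting. It is not the benchmark script's code: `run_query`, `format_rows`, and `timed_query` are hypothetical stand-ins, and the "query" just builds rows in memory. The point is only that the timer should stop before any rows are rendered.

```python
import io
import time


def run_query(n_rows):
    """Hypothetical stand-in for an engine call; returns the result rows."""
    return [(i, i * 2) for i in range(n_rows)]


def format_rows(rows, out):
    """Render rows as text, roughly the way a CLI would print them."""
    for a, b in rows:
        out.write(f"{a}|{b}\n")


def timed_query(n_rows):
    """Return (execution_seconds, formatting_seconds, rows)."""
    t0 = time.perf_counter()
    rows = run_query(n_rows)
    t1 = time.perf_counter()  # stop the "query" timer before any output work
    format_rows(rows, io.StringIO())  # rendered text is discarded
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, rows


if __name__ == "__main__":
    exec_s, fmt_s, rows = timed_query(1_000_000)
    print(f"rows={len(rows)} execute={exec_s:.3f}s format={fmt_s:.3f}s")
```

With 10k~1M output rows the formatting part can be a noticeable share of the total, so reporting the two numbers separately makes the engines easier to compare.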
