2010YOUY01 commented on issue #6782:
URL: https://github.com/apache/arrow-datafusion/issues/6782#issuecomment-1729908213

   > Script to generate data, run experiments, and plot results (also plots) 
can be found 
[here](https://github.com/JayjeetAtGithub/datafusion-duckdb-benchmark/tree/scaling-benchmark).
   
   Thank you for your script! @JayjeetAtGithub 
   I tried to reproduce the h2o benchmark and found that the measured durations 
include the time spent formatting and printing the results (queries in the h2o 
benchmark can produce 10k~1M rows of output).
   * The datafusion-cli time measurement should be fixed by 
https://github.com/apache/arrow-datafusion/pull/7617
   * The measured duckdb query execution time also seems to include printing: 
the numbers I get are larger than the numbers reported by the duckdb cli when 
using the same number of threads
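   
   To illustrate the measurement issue, here is a minimal sketch of timing 
execution separately from printing. It does not call datafusion or duckdb: 
`run_query`, `print_rows`, and `timed_query` are hypothetical stand-ins I made 
up for illustration, and the row count is arbitrary.

```python
import io
import time


def run_query(n_rows):
    """Hypothetical stand-in for a query engine call; returns result rows."""
    return [(i, i * 2) for i in range(n_rows)]


def print_rows(rows, out):
    """Format and write every row, like a CLI printing a result set."""
    for row in rows:
        out.write(f"{row[0]},{row[1]}\n")


def timed_query(n_rows, out):
    """Return (execution_seconds, total_seconds) for one query."""
    start = time.perf_counter()
    rows = run_query(n_rows)
    executed = time.perf_counter()   # stop here to measure execution only
    print_rows(rows, out)
    finished = time.perf_counter()   # stopping here also counts printing
    return executed - start, finished - start


if __name__ == "__main__":
    exec_s, total_s = timed_query(1_000_000, io.StringIO())
    print(f"execution only: {exec_s:.3f}s, including printing: {total_s:.3f}s")
```

   With result sets in the 10k~1M row range, the gap between the two numbers 
can be a noticeable part of the total, so a benchmark script should probably 
report the first one.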
   
   (btw, the current h2o benchmark results in the chart are csv + single core, 
right? Should we use parquet + all available cores instead?)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.