ICDE) [arrow-datafusion]

via GitHub Thu, 30 Nov 2023 03:34:51 -0800


alamb commented on issue #6782:
URL: https://github.com/apache/arrow-datafusion/issues/6782#issuecomment-1833592424


   > > I found one issue in the [benchmarks](https://github.com/JayjeetAtGithub/datafusion-duckdb-benchmark).
   > 
   > Thank you for taking a look. It'd be great if double checked all the
   > config parameters and our usage of APIs on both sides to make sure we don't
   > accidentally distort any measurement.
   
   I did review this quite extensively when working on the benchmarks and I 
believe they are defensible. The runner scripts are based on the 
DuckDB-authored scripts for running ClickBench, which also use `fetchall`:
   
   
https://github.com/ClickHouse/ClickBench/blob/1a8ecca8808378da011a3050d648cb9dbd2a1d95/duckdb-parquet/query.py#L15
   
   So while we can (and should) improve the scripts if the paper is accepted 
for publication, I recommend keeping the existing results that have `fetchall` 
for the draft because:
   1. we used the same methodology as experts in DuckDB
   2. the overall conclusion is the same, even if some of the relative 
numbers are slightly different



Re: [I] Write DataFusion paper for (SIGMOD / VLDB / ICDE) [arrow-datafusion]
