alamb commented on issue #6782: URL: https://github.com/apache/arrow-datafusion/issues/6782#issuecomment-1833592424
> > I found one issue in the [benchmarks](https://github.com/JayjeetAtGithub/datafusion-duckdb-benchmark).
>
> Thank you for taking a look. It'd be great if double checked all the config parameters and our usage of APIs on both sides to make sure we don't accidentally distort any measurement.

I did review this quite extensively when working on the benchmarks, and I believe they are defensible. The runner scripts are based on the DuckDB-authored scripts for running ClickBench, which use `fetchall`: https://github.com/ClickHouse/ClickBench/blob/1a8ecca8808378da011a3050d648cb9dbd2a1d95/duckdb-parquet/query.py#L15

So while we can (and should) improve the scripts if the paper is accepted for publication, I recommend keeping the existing results that use `fetchall` for the draft because:

1. we used the same methodology as experts in DuckDB
2. the overall conclusion is the same, even if some of the relative numbers are slightly different
