martin-g commented on code in PR #19035:
URL: https://github.com/apache/datafusion/pull/19035#discussion_r2583986975
##########
benchmarks/bench.sh:
##########
@@ -548,20 +544,19 @@ data_tpch() {
echo "Internal error: Scale factor not specified"
exit 1
fi
+ FORMAT=$2
Review Comment:
See https://github.com/apache/datafusion/pull/19035#discussion_r2579864175
There are two calls to `data_tpch` there that do not pass the format. For those calls,
https://github.com/apache/datafusion/pull/19035/files/907bce3e16352148eade3b7cf512091a9aab4232#diff-1769f5787dc11c8b1f1b48288cdf3c89d25a5b5cbc6be4740bfcc70a6313ba99R550
will print `Creating tpch <EMPTY> dataset at Scale Factor`, where `<EMPTY>` is
an empty string.
The third reason I proposed `parquet` as the default is:
```
Also @comphead pointed out on
https://github.com/apache/datafusion/pull/19034#pullrequestreview-3526952491
that the bench.sh data tpch generated both csv and parquet files when it only
really needs parquet.
```
This suggests that parquet is the format needed most of the time.
However, `data_h2o()` uses CSV as its default format:
https://github.com/alamb/datafusion/blob/907bce3e16352148eade3b7cf512091a9aab4232/benchmarks/bench.sh#L853
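For illustration, here is a minimal sketch of how a default could be applied when the second argument is omitted. The function name `data_tpch_sketch` is hypothetical and only the argument handling is modeled; the echo text follows the message quoted above, and `return` is used instead of `exit` so the snippet can be sourced safely.

```shell
# Hypothetical stand-in for data_tpch in benchmarks/bench.sh.
# Assumption: only the argument handling is sketched, not data generation.
data_tpch_sketch() {
    local scale_factor=$1
    if [ -z "$scale_factor" ]; then
        echo "Internal error: Scale factor not specified"
        return 1
    fi
    # Fall back to parquet when the caller does not pass a format,
    # so the message below never contains an empty format name.
    local format=${2:-parquet}
    echo "Creating tpch ${format} dataset at Scale Factor ${scale_factor}"
}
```

With this, `data_tpch_sketch 1` would report a parquet dataset, while `data_tpch_sketch 1 csv` would still honor an explicit format.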