alamb commented on code in PR #14129:
URL: https://github.com/apache/datafusion/pull/14129#discussion_r1920568890
##########
benchmarks/bench.sh:
##########
@@ -401,9 +405,14 @@ data_clickbench_1() {
else
URL="https://datasets.clickhouse.com/hits_compatible/hits.parquet"
echo -n "... downloading ${URL} (14GB) ... "
- wget --continue ${URL}
- fi
- echo " Done"
+ if ! wget --continue ${URL}; then
Review Comment:
I think the check above already tests the file size, so it detects partial
or failed previous downloads:
```shell
if test "${OUTPUT_SIZE}" = "14779976446"; then
```
So I am not sure this is necessary.
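To illustrate why a size comparison covers the partial-download case, here is a minimal sketch. The helper name `needs_download` and its arguments are hypothetical and not part of `bench.sh`; only the expected byte count `14779976446` comes from the existing check.

```shell
# Hypothetical helper: exits 0 when the file must be (re)downloaded,
# i.e. when it is missing or its size differs from the expected one.
needs_download() {
    local file="$1"
    local expected_size="$2"
    # A missing file always needs a download.
    [ -f "${file}" ] || return 0
    # Read the byte count and strip any padding that wc may emit.
    local actual_size
    actual_size=$(wc -c < "${file}" | tr -d '[:space:]')
    # A truncated file from an interrupted run has a different size.
    [ "${actual_size}" != "${expected_size}" ]
}

# Example use (not executed here):
#   if needs_download "hits.parquet" "14779976446"; then
#       wget --continue "${URL}"
#   fi
```

Because `wget --continue` resumes a truncated file, a later run that fails this size test picks up where the interrupted one stopped.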
##########
benchmarks/bench.sh:
##########
@@ -455,13 +464,27 @@ run_clickbench_extended() {
$CARGO_COMMAND --bin dfbench -- clickbench --iterations 5 --path
"${DATA_DIR}/hits.parquet" --queries-path
"${SCRIPT_DIR}/queries/clickbench/extended.sql" -o "${RESULTS_FILE}"
}
+# Add cleanup function at the start of script
+cleanup_download() {
+ local file_to_clean="$1"
+ echo -e "\nCleaning up downloaded files..."
+ rm -f "${file_to_clean}"
+ exit 1
+}
+
+# Add trap before download starts
+trap cleanup_download INT TERM
+
# Downloads the csv.gz files IMDB datasets from Peter Boncz's homepage(one of
the JOB paper authors)
# https://event.cwi.nl/da/job/imdb.tgz
data_imdb() {
local imdb_dir="${DATA_DIR}/imdb"
local imdb_temp_gz="${imdb_dir}/imdb.tgz"
local imdb_url="https://event.cwi.nl/da/job/imdb.tgz"
+ # Set trap with parameter
Review Comment:
Perhaps we could check the size of the imdb.tgz file rather than just
checking for its existence:
```shell
if [ ! -f "${imdb_dir}/imdb.tgz" ]; then
```
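One possible shape for that check, as a rough sketch only: the helper names and the `IMDB_TGZ_EXPECTED_SIZE` variable are hypothetical, and the real byte count of imdb.tgz is not stated in this thread, so the value below is a placeholder that would have to be measured once. It also assumes the archive is kept on disk after a successful download.

```shell
# Placeholder: the actual size of imdb.tgz is not known here.
IMDB_TGZ_EXPECTED_SIZE="${IMDB_TGZ_EXPECTED_SIZE:-0}"

# Hypothetical helper: succeeds only when the archive exists
# and has exactly the expected number of bytes.
imdb_archive_complete() {
    local archive="$1"
    [ -f "${archive}" ] || return 1
    local size
    size=$(wc -c < "${archive}" | tr -d '[:space:]')
    [ "${size}" = "${IMDB_TGZ_EXPECTED_SIZE}" ]
}

# Hypothetical helper: removes a missing-or-truncated archive so that
# an existence test afterwards triggers a fresh download.
discard_incomplete_archive() {
    local archive="$1"
    if ! imdb_archive_complete "${archive}"; then
        rm -f "${archive}"
    fi
}
```

Calling `discard_incomplete_archive "${imdb_temp_gz}"` before the existence test would let the current `[ ! -f ... ]` condition stay as it is while still catching truncated files.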
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]