alamb commented on issue #18909: URL: https://github.com/apache/datafusion/issues/18909#issuecomment-3572854475
# Background

The datafusion clickbench scripts build datafusion-cli like this: https://github.com/ClickHouse/ClickBench/tree/main/datafusion-partitioned

```shell
CARGO_PROFILE_RELEASE_LTO=true RUSTFLAGS="-C codegen-units=1" cargo build --release --package datafusion-cli --bin datafusion-cli
```

Then it clears the filesystem cache like this:

```shell
echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null
```

Then it runs the queries like this:

```shell
./datafusion/target/release/datafusion-cli -f create.sql -f /tmp/query.sql
```

For example when running q0, the scripts look like this:

`create.sql`:

```sql
CREATE EXTERNAL TABLE hits STORED AS PARQUET LOCATION 'partitioned' OPTIONS ('binary_as_string' 'true');
```

`/tmp/query.sql`:

```sql
SELECT COUNT(*) FROM hits;
```

Implications:

1. Each query is run "cold" in the sense that datafusion-cli is started fresh and there is no metadata cache from previous runs. See notes below for how we could improve this if we wanted.
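To make the "cold" run concrete, here is a minimal sketch of one iteration that combines the cache drop and the CLI invocation quoted above. It is not the actual ClickBench script: the `DRY_RUN` switch, the `CLI` variable, and the `run` and `cold_run` helpers are hypothetical names introduced only for illustration. By default it only prints the commands it would execute.

```shell
#!/usr/bin/env bash
# Sketch of a single cold run; prints commands unless DRY_RUN=0 is set.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch, not part of ClickBench
CLI="${CLI:-./datafusion/target/release/datafusion-cli}"

# Echo a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Run one query cold: drop the OS page cache, then start a fresh CLI process.
cold_run() {
    local query="$1"
    run sh -c 'echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null'
    if [ "$DRY_RUN" = "0" ]; then
        printf '%s\n' "$query" > /tmp/query.sql
    fi
    # A new process per query means no in-memory metadata cache carries over.
    run "$CLI" -f create.sql -f /tmp/query.sql
}

cold_run 'SELECT COUNT(*) FROM hits;'
```

Because every call to `cold_run` launches a new `datafusion-cli` process, `create.sql` is re-executed each time, which is why no metadata from a previous query is available.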
