alamb commented on code in PR #13103: URL: https://github.com/apache/datafusion/pull/13103#discussion_r1816775576
########## datafusion/core/benches/sql_planner.rs: ########## @@ -299,6 +329,49 @@ fn criterion_benchmark(c: &mut Criterion) { } }) }); + + // -- clickbench -- + + let queries_file = + File::open("../../benchmarks/queries/clickbench/queries.sql").unwrap(); + let extended_file = + File::open("../../benchmarks/queries/clickbench/extended.sql").unwrap(); + + let clickbench_queries: Vec<String> = BufReader::new(queries_file) + .lines() + .chain(BufReader::new(extended_file).lines()) + .map(|l| l.expect("Could not parse line")) + .collect_vec(); + + let clickbench_ctx = register_clickbench_hits_table(); + + for (i, sql) in clickbench_queries.iter().enumerate() { Review Comment: Since the physical planing benchmark *also* includes the logical planning and most usecases include both logican and physical planning, I think the logical planning only benchmarks are largely redundant however, I realize this PR just follows the existing pattern. Maybe we should remove all the "logical planning" benchmarks 🤔 ########## datafusion/core/benches/sql_planner.rs: ########## @@ -15,22 +15,28 @@ // specific language governing permissions and limitations // under the License. +extern crate arrow; #[macro_use] extern crate criterion; -extern crate arrow; extern crate datafusion; mod data_utils; + use crate::criterion::Criterion; use arrow::datatypes::{DataType, Field, Fields, Schema}; use datafusion::datasource::MemTable; use datafusion::execution::context::SessionContext; +use itertools::Itertools; +use std::fs::File; +use std::io::{BufRead, BufReader}; use std::sync::Arc; use test_utils::tpcds::tpcds_schemas; use test_utils::tpch::tpch_schemas; use test_utils::TableDef; use tokio::runtime::Runtime; +const CLICKBENCH_DATA_PATH: &str = "../../benchmarks/data/hits_partitioned/"; Review Comment: I think this assumes the script is run from `datafusion/core` (what cargo does) However, that meant when I tried to run the benchmark binary directly it failed like this: ``` (venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$ target/release/deps/sql_planner-d64e21551189f776 --bench physical_plan_clickbench_q1 Gnuplot not found, using plotters backend thread 'main' panicked at datafusion/core/benches/sql_planner.rs:121:38: benchmarks/data/hits_partitioned/ could not be loaded. Please run 'benchmarks/bench.sh data clickbench_partitioned' prior to running this benchmark: Os { code: 2, kind: NotFound, message: "No such file or directory" } ``` Any chance you could make the script test both locations - `../../benchmarks/data/hits_partitioned` - `benchmarks/data/hits_partitioned`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org