alamb opened a new issue, #13098: URL: https://github.com/apache/datafusion/issues/13098
### Is your feature request related to a problem or challenge?

@mnorfolk03 added planning benchmarks for more sophisticated queries in https://github.com/apache/datafusion/pull/13085 ❤️

The benchmarks are in https://github.com/apache/datafusion/blob/main/datafusion/core/benches/sql_planner.rs

However, the planning benchmarks we have now don't reflect querying an actual data source such as Parquet (they query an empty in-memory table).

One thing that might help improve them further would be adding a `ParquetExec`, as well as queries over sorted data, to reflect more real-world cases.

### Describe the solution you'd like

I would like planning benchmarks equivalent to planning against tables like these (docs here): https://datafusion.apache.org/user-guide/sql/ddl.html#create-external-table

```sql
CREATE EXTERNAL TABLE foo
STORED AS PARQUET
LOCATION '..'
```

```sql
CREATE EXTERNAL TABLE test (
    c1  VARCHAR NOT NULL,
    c2  INT NOT NULL,
    c3  SMALLINT NOT NULL,
    c4  SMALLINT NOT NULL,
    c5  INT NOT NULL,
    c6  BIGINT NOT NULL,
    c7  SMALLINT NOT NULL,
    c8  INT NOT NULL,
    c9  BIGINT NOT NULL,
    c10 VARCHAR NOT NULL,
    c11 FLOAT NOT NULL,
    c12 DOUBLE NOT NULL,
    c13 VARCHAR NOT NULL
)
STORED AS CSV
WITH ORDER (c2 ASC, c5 + c8 DESC NULLS FIRST)
LOCATION '/path/to/aggregate_test_100.csv'
OPTIONS ('has_header' 'true');
```

### Describe alternatives you've considered

One possibility could be to add a benchmark for planning the ClickBench queries: https://github.com/apache/datafusion/tree/main/benchmarks/queries/clickbench

We could use the smaller hits.parquet file here: https://github.com/apache/datafusion/blob/main/datafusion/core/tests/data/clickbench_hits_10.parquet

### Additional context

_No response_

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org