GitHub user zhuqi-lucas added a comment to the discussion: how to run tpch benchmark datafusion
It isn't a bug in DataFusion so much as a mismatch with the layout the TPC-H benchmark runner expects. By default it looks under your `--path` for one directory per table (named exactly after the table) and expects one or more Parquet files inside each directory. What you have today is a flat directory of files:

```text
/par/tpch/sf4-parquet/
├─ customer.parquet
├─ lineitem.parquet
├─ nation.parquet
├─ orders.parquet
├─ part.parquet
├─ partsupp.parquet
├─ region.parquet
└─ supplier.parquet
```

When it tries to read the table `part`, it literally does a `list()` on `/par/tpch/sf4-parquet/part` (i.e. a directory), which doesn't exist, hence the "NotFound … path: …/part" error.

An easy way to fix it:

```shell
cd /par/tpch/sf4-parquet
for tbl in customer lineitem nation orders part partsupp region supplier; do
  mkdir -p "$tbl"
  mv "${tbl}.parquet" "$tbl/"
done
```

Or you can use the DataFusion benchmark script to generate the TPC-H data (see https://github.com/apache/datafusion/blob/main/benchmarks/README.md):

```shell
./bench.sh data tpch
```

GitHub link: https://github.com/apache/datafusion/discussions/16598#discussioncomment-13603469

----
This is an automatically sent email for github@datafusion.apache.org.
To unsubscribe, please send an email to: github-unsubscr...@datafusion.apache.org
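Before rerunning the benchmark, it may help to confirm that the data directory really has the one-directory-per-table layout described in the comment. The following is a minimal sketch, not part of DataFusion: `check_tpch_layout` is a hypothetical helper name, and it only assumes the eight table names listed above and that each table directory should hold at least one `.parquet` file.

```shell
# Hypothetical helper (not part of DataFusion or bench.sh).
# Returns 0 if every TPC-H table has its own directory under the given root
# with at least one .parquet file in it, and 1 otherwise.
check_tpch_layout() {
  root="$1"
  missing=0
  for tbl in customer lineitem nation orders part partsupp region supplier; do
    # ls fails when the glob matches nothing or the directory is absent
    if ! ls "$root/$tbl"/*.parquet >/dev/null 2>&1; then
      echo "missing: $root/$tbl/*.parquet" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `check_tpch_layout /par/tpch/sf4-parquet` prints one line on stderr for each table that still lacks a directory with Parquet files, and returns a non-zero status if any are missing.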