adriangb commented on code in PR #22814:
URL: https://github.com/apache/datafusion/pull/22814#discussion_r3380668941
##########
benchmarks/sql_benchmarks/sort_tpch/init/load.sql:
##########
@@ -0,0 +1,3 @@
+CREATE EXTERNAL TABLE lineitem_raw STORED AS PARQUET LOCATION '${DATA_DIR:-data}/tpch_sf${BENCH_SIZE:-1}/lineitem/lineitem.1.parquet';
+
+CREATE TABLE lineitem as (SELECT * FROM lineitem_raw ${BENCH_SORTED:-false|order by l_orderkey asc| });
Review Comment:
```suggestion
CREATE TABLE lineitem as (SELECT * FROM lineitem_raw${BENCH_SORTED:-false|order by l_orderkey asc | });
```
Would this work and avoid a double space? No big deal either way.
##########
benchmarks/sql_benchmarks/sort_tpch/benchmarks/q05.benchmark:
##########
@@ -0,0 +1,44 @@
+echo Loading tpch items sorted: ${BENCH_SORTED:-false}
+
+#
+# Sort queries with different characteristics:
+# - Sort key with fixed length or variable length (VARCHAR)
+# - Sort key with different cardinality
+# - Different number of sort keys
+# - Different number of payload columns (thin: 1 additional column other
+# than sort keys; wide: all columns except sort keys)
+#
+# DataSet is `lineitem` table in TPCH dataset (16 columns, 6M rows for
+# scale factor 1.0, cardinality is counted from SF1 dataset)
+#
+# Key Columns:
+# - Column `l_linenumber`, type: `INTEGER`, cardinality: 7
+# - Column `l_suppkey`, type: `BIGINT`, cardinality: 10k
+# - Column `l_orderkey`, type: `BIGINT`, cardinality: 1.5M
+# - Column `l_comment`, type: `VARCHAR`, cardinality: 4.5M (len is ~26 chars)
+#
+# Payload Columns:
+# - Thin variant: `l_partkey` column with `BIGINT` type (1 column)
+# - Wide variant: all columns except for possible key columns (12 columns)
+
+name Q05
+group sort_tpch
+subgroup sf${BENCH_SIZE:-1}
+
+echo Loading sort_tpch sf ${BENCH_SIZE:-1} data
+
+load sql_benchmarks/sort_tpch/init/load.sql
+
+assert I
+SELECT COUNT(*) > 0 from lineitem;
+----
+true
+
+run
+-- Q5: 3 sort keys {(INTEGER, 7), (BIGINT, 10k), (BIGINT, 1.5M)} + no payload column
+SELECT l_linenumber, l_suppkey, l_orderkey
+FROM lineitem
+ORDER BY l_linenumber, l_suppkey, l_orderkey
+${LIMIT:-false|LIMIT 100| }
Review Comment:
Do we need to document these env vars?
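If it helps with documenting them, here is a minimal sketch for listing the variables and their defaults. It assumes every placeholder has the `${NAME:-default}` or `${NAME:-default|...}` shape seen in this PR, and it makes no claim about what the `|...` tail means. `PLACEHOLDER` and `list_env_vars` are hypothetical names for illustration, not part of the benchmark runner.

```python
import re

# Matches ${NAME:-default} and ${NAME:-default|...}.
# Assumption: this is the only placeholder shape in use; the |... tail is
# skipped, not interpreted.
PLACEHOLDER = re.compile(r"\$\{([A-Z_][A-Z0-9_]*):-([^|}]*)(?:\|[^}]*)?\}")


def list_env_vars(text):
    """Return {name: default} for each placeholder, in first-seen order."""
    found = {}
    for name, default in PLACEHOLDER.findall(text):
        found.setdefault(name, default)
    return found


# Lines copied from the files in this PR.
sample = """
CREATE EXTERNAL TABLE lineitem_raw STORED AS PARQUET LOCATION '${DATA_DIR:-data}/tpch_sf${BENCH_SIZE:-1}/lineitem/lineitem.1.parquet';
CREATE TABLE lineitem as (SELECT * FROM lineitem_raw ${BENCH_SORTED:-false|order by l_orderkey asc| });
${LIMIT:-false|LIMIT 100| }
"""

for name, default in list_env_vars(sample).items():
    print(f"{name} (default: {default})")
```

On the lines above this reports `DATA_DIR`, `BENCH_SIZE`, `BENCH_SORTED`, and `LIMIT`, which would be the set to cover in the docs.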
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]