zhuqi-lucas commented on PR #18817:
URL: https://github.com/apache/datafusion/pull/18817#issuecomment-3605978882
Here is the performance result for row group reverse with dynamic topk
pushdown:
```text
Running `/Users/zhuqi/arrow-datafusion/target/release/dfbench
clickbench --iterations 5 --path
/Users/zhuqi/arrow-datafusion/benchmarks/data/hits_0_sorted.parquet
--queries-path
/Users/zhuqi/arrow-datafusion/benchmarks/queries/clickbench/queries/sorted_data
--sorted-by EventTime --sort-order ASC -o
/Users/zhuqi/arrow-datafusion/benchmarks/results/issue_19059/data_sorted_clickbench.json`
Running benchmarks with the following options: RunOpt { query: None,
pushdown: false, common: CommonOpt { iterations: 5, partitions: None,
batch_size: None, mem_pool_type: "fair", memory_limit: None,
sort_spill_reservation_bytes: None, debug: false }, path:
"/Users/zhuqi/arrow-datafusion/benchmarks/data/hits_0_sorted.parquet",
queries_path:
"/Users/zhuqi/arrow-datafusion/benchmarks/queries/clickbench/queries/sorted_data",
output_path:
Some("/Users/zhuqi/arrow-datafusion/benchmarks/results/issue_19059/data_sorted_clickbench.json"),
sorted_by: Some("EventTime"), sort_order: "ASC" }
⚠️ Forcing target_partitions=1 to preserve sort order
⚠️ (Because we want to get the pure performance benefit of sorted data to
compare)
📊 Session config target_partitions: 1
Registering table with sort order: EventTime ASC
Executing: CREATE EXTERNAL TABLE hits STORED AS PARQUET LOCATION
'/Users/zhuqi/arrow-datafusion/benchmarks/data/hits_0_sorted.parquet' WITH
ORDER ("EventTime" ASC)
Q0: -- Must set for ClickBench hits_partitioned dataset. See
https://github.com/apache/datafusion/issues/16591
-- set datafusion.execution.parquet.binary_as_string = true
SELECT * FROM hits ORDER BY "EventTime" DESC limit 10;
Query 0 iteration 0 took 21.5 ms and returned 10 rows
Query 0 iteration 1 took 12.7 ms and returned 10 rows
Query 0 iteration 2 took 11.1 ms and returned 10 rows
Query 0 iteration 3 took 10.6 ms and returned 10 rows
Query 0 iteration 4 took 9.9 ms and returned 10 rows
Query 0 avg time: 13.17 ms
+ set +x
Done
```
So it is very close to the reverse parquet implementation.
Main branch: 300ms
Reverse parquet implementation (this PR): 9.8ms
Reverse row group with dynamic topk (https://github.com/apache/datafusion/pull/19064): 13ms
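
For reference, the speedups implied by these numbers can be worked out with a small sketch like the one below. It uses only the timings quoted in this comment; the helper names `speedup` and `mean` are made up for illustration and are not part of DataFusion or dfbench. The numbers come from a single local run with target_partitions=1, so treat the ratios as rough.

```rust
/// Speedup of `candidate_ms` relative to `baseline_ms` (higher means faster).
/// Hypothetical helper, not a DataFusion API.
fn speedup(baseline_ms: f64, candidate_ms: f64) -> f64 {
    baseline_ms / candidate_ms
}

/// Arithmetic mean of per-iteration timings in milliseconds.
/// Hypothetical helper, not a DataFusion API.
fn mean(samples: &[f64]) -> f64 {
    samples.iter().sum::<f64>() / samples.len() as f64
}

fn main() {
    // Timings quoted in this comment (approximate, one machine, one run).
    let main_ms = 300.0;
    let reverse_parquet_ms = 9.8;
    let reverse_row_group_topk_ms = 13.0;

    println!(
        "reverse parquet vs main: {:.1}x",
        speedup(main_ms, reverse_parquet_ms)
    );
    println!(
        "reverse row group + dynamic topk vs main: {:.1}x",
        speedup(main_ms, reverse_row_group_topk_ms)
    );

    // Per-iteration timings as printed in the log (already rounded to 0.1 ms,
    // so the mean here can differ slightly from the reported 13.17 ms).
    let iterations = [21.5, 12.7, 11.1, 10.6, 9.9];
    println!("mean of logged iterations: {:.2} ms", mean(&iterations));
}
```

Both approaches land in the same order of magnitude relative to main, and the first iteration (21.5 ms) pulls the average for the row group approach up noticeably.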
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]