yjshen edited a comment on pull request #1596:
URL:
https://github.com/apache/arrow-datafusion/pull/1596#issuecomment-1015165921
## 1. [bench] `sort_limit_query_sql`
```
cargo criterion --bench sort_limit_query_sql
```
No noticeable difference between this branch and the commit from which it
[originates](https://github.com/apache/arrow-datafusion/tree/438b41749c5cf9db68e431557b3bea01aec74af9):
W/ this PR:
```
sort_and_limit_by_int time: [3.3633 ms 3.4736 ms 3.6268 ms]
sort_and_limit_by_float time: [3.3644 ms 3.4726 ms 3.6342 ms]
sort_and_limit_lex_by_int
time: [3.7431 ms 4.1084 ms 4.6626 ms]
sort_and_limit_lex_by_string
time: [3.4665 ms 3.6071 ms 3.7919 ms]
```
W/o this PR:
```
sort_and_limit_by_int time: [3.3156 ms 3.3392 ms 3.3626 ms]
sort_and_limit_by_float time: [3.2272 ms 3.6257 ms 4.3373 ms]
sort_and_limit_lex_by_int
time: [3.4235 ms 3.4393 ms 3.4558 ms]
sort_and_limit_lex_by_string
time: [3.3962 ms 3.4127 ms 3.4298 ms]
```
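To put the two runs side by side, here is a minimal sketch that computes the relative change of the middle value of each criterion triple (the point estimate, between the lower and upper bounds). The numbers are copied from the output above; the names `sort_limit_with_pr`, `sort_limit_without_pr` and `relative_change` are just illustrative helpers, not part of the benchmark.

```python
# Point estimates in ms (middle value of each criterion triple),
# copied from the benchmark output in this comment.
sort_limit_with_pr = {
    "sort_and_limit_by_int": 3.4736,
    "sort_and_limit_by_float": 3.4726,
    "sort_and_limit_lex_by_int": 4.1084,
    "sort_and_limit_lex_by_string": 3.6071,
}
sort_limit_without_pr = {
    "sort_and_limit_by_int": 3.3392,
    "sort_and_limit_by_float": 3.6257,
    "sort_and_limit_lex_by_int": 3.4393,
    "sort_and_limit_lex_by_string": 3.4127,
}


def relative_change(new, old):
    """Return the change of `new` relative to `old`, in percent."""
    return (new - old) / old * 100.0


# Positive values mean the run with this PR was slower.
sort_limit_changes = {
    name: relative_change(sort_limit_with_pr[name], sort_limit_without_pr[name])
    for name in sort_limit_with_pr
}

for name, pct in sort_limit_changes.items():
    print(f"{name}: {pct:+.1f}%")
```

This only compares point estimates; the bounds reported with this PR are fairly wide, so the percentages should be read together with the intervals above.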
## 2. TPC-H sf=1 `sort_extendedprice_discount`
**About four times slower** with the external_sort than with the previous sort (average 33725.36 ms vs. 8184.96 ms).
```sh
./target/release/tpch benchmark datafusion --path ./data --format tbl \
  --query 1 --batch-size 10240 --partitions 1
```
I changed q1 locally to the following, so that it can be run directly from the tpch program:
```sql
select
    l_returnflag,
    l_linestatus,
    l_quantity,
    l_extendedprice,
    l_discount,
    l_tax
from
    lineitem
order by
    l_extendedprice,
    l_discount;
Query plan:
```
SortExec: [l_extendedprice@3 ASC NULLS LAST,l_discount@4 ASC NULLS LAST]
  ProjectionExec: expr=[l_returnflag@4 as l_returnflag, l_linestatus@5 as l_linestatus, l_quantity@0 as l_quantity, l_extendedprice@1 as l_extendedprice, l_discount@2 as l_discount, l_tax@3 as l_tax]
    CsvExec: files=[./data/lineitem.tbl], has_header=false, limit=None
```
W/ this PR:
```
Running benchmarks with the following options: DataFusionBenchmarkOpt { query: 1, debug: false, iterations: 3, partitions: 1, batch_size: 10240, path: "./data", file_format: "tbl", mem_table: false }
Query 1 iteration 0 took 35683.2 ms
Query 1 iteration 1 took 32783.6 ms
Query 1 iteration 2 took 32709.3 ms
Query 1 avg time: 33725.36 ms
```
W/o this PR:
```
Running benchmarks with the following options: DataFusionBenchmarkOpt { query: 1, debug: false, iterations: 3, partitions: 1, batch_size: 10240, path: "./data", file_format: "tbl", mem_table: false }
Query 1 iteration 0 took 8675.6 ms
Query 1 iteration 1 took 7833.2 ms
Query 1 iteration 2 took 8046.0 ms
Query 1 avg time: 8184.96 ms
```
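As a quick sanity check on the slowdown, the ratio can be recomputed from the per-iteration times above. This is only a sketch over the numbers printed in this comment; `tpch_with_pr`, `tpch_without_pr` and `mean` are illustrative names. The recomputed averages can differ from the reported ones in the last digits, presumably because the per-iteration times are printed rounded to one decimal.

```python
# Per-iteration wall times in ms, copied from the two benchmark runs above.
tpch_with_pr = [35683.2, 32783.6, 32709.3]
tpch_without_pr = [8675.6, 7833.2, 8046.0]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


avg_with = mean(tpch_with_pr)
avg_without = mean(tpch_without_pr)

# Ratio > 1 means the run with this PR is slower.
slowdown = avg_with / avg_without

print(f"avg with PR:    {avg_with:.2f} ms")
print(f"avg without PR: {avg_without:.2f} ms")
print(f"slowdown:       {slowdown:.2f}x")
```

Dropping iteration 0 from both runs (as a warm-up) changes the ratio only slightly, so the gap does not come from a single outlier iteration.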