alamb commented on pull request #1526:
URL:
https://github.com/apache/arrow-datafusion/pull/1526#issuecomment-1012442236
Here are the results of my comparison: the `simple_mm` branch appears to be
about 10% faster on q1 (490.36 ms vs. 539.86 ms average), for reasons I don't understand.
Setup:
1. 10G TPC-H data (Scale Factor 10)
2. 16 core / 64G mem machine in Google Cloud ("Cascade Lake" architecture)
3. Ran q1, which is a basic select / predicate / group by / order by (query below)
Benchmark command:
```shell
cd benchmarks
cargo run --release --bin tpch -- benchmark datafusion --partitions 16 -m \
  --iterations 10 --path /data/tpch_data_10G/ --format tbl --query 1
```
## `master`
Compared master at 14176ffb50307b1d550729c8334658293e057f87
(arrow-datafusion) (merge base of `simple_mm`)
```
Query 1 iteration 0 took 550.8 ms
Query 1 iteration 1 took 542.1 ms
Query 1 iteration 2 took 533.0 ms
Query 1 iteration 3 took 539.4 ms
Query 1 iteration 4 took 543.0 ms
Query 1 iteration 5 took 538.5 ms
Query 1 iteration 6 took 537.9 ms
Query 1 iteration 7 took 536.6 ms
Query 1 iteration 8 took 537.5 ms
Query 1 iteration 9 took 539.8 ms
Query 1 avg time: 539.86 ms
```
## `yjshen/simple_mm`
yjshen/simple_mm at 04dca98e8d2cdf433d1959a015d0569fd6e3d0c3
```
Query 1 iteration 0 took 500.2 ms
Query 1 iteration 1 took 492.2 ms
Query 1 iteration 2 took 489.1 ms
Query 1 iteration 3 took 488.4 ms
Query 1 iteration 4 took 489.2 ms
Query 1 iteration 5 took 485.6 ms
Query 1 iteration 6 took 488.2 ms
Query 1 iteration 7 took 489.5 ms
Query 1 iteration 8 took 491.4 ms
Query 1 iteration 9 took 489.7 ms
Query 1 avg time: 490.36 ms
```
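For reference, the size of the difference can be worked out from the two reported averages. This is only a minimal sketch: the helper name `relative_improvement` is made up for illustration, and the inputs are just the `avg time` values printed above, not a re-run of the benchmark.

```python
# Average times (ms) copied from the "avg time" lines of the two runs above.
master_avg_ms = 539.86
simple_mm_avg_ms = 490.36


def relative_improvement(baseline_ms: float, candidate_ms: float) -> float:
    """Fraction of the baseline runtime saved by the candidate (hypothetical helper)."""
    return (baseline_ms - candidate_ms) / baseline_ms


# Runtime reduction relative to master, and the equivalent speedup factor.
improvement = relative_improvement(master_avg_ms, simple_mm_avg_ms)
speedup = master_avg_ms / simple_mm_avg_ms

print(f"runtime reduction: {improvement:.1%}, speedup: {speedup:.2f}x")
```

So the runtime drops by roughly 9%, which corresponds to a speedup of about 1.10x. Both are consistent with "about 10% faster", depending on which ratio you prefer.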
## Query
Query 1
```sql
select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= date '1998-09-02'
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus;
```