[ 
https://issues.apache.org/jira/browse/ARROW-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247647#comment-17247647
 ] 

Andy Grove commented on ARROW-10453:
------------------------------------

I re-ran TPC-H query 1, starting from the same commit that I reported in this 
issue, just to be sure I am getting consistent results:
{code:java}
Running benchmarks with the following options: TpchOpt { query: 1, debug: 
false, iterations: 3, concurrency: 24, batch_size: 4096, path: 
"/mnt/tpch/parquet-100GB", file_format: "parquet", mem_table: false }
Query 1 iteration 0 took 18858 ms
Query 1 iteration 1 took 18508 ms
Query 1 iteration 2 took 18430 ms
{code}
Next, I ran with the commit in master just *before* #8842 was merged 
(a774ae7f3b83dd3127cf40709f9ac9d5c8d98e25)
{code:java}
Running benchmarks with the following options: BenchmarkOpt { query: 1, debug: 
false, iterations: 3, concurrency: 24, batch_size: 4096, path: 
"/mnt/tpch/parquet-100GB", file_format: "parquet", mem_table: false }
Query 1 iteration 0 took 13935 ms
Query 1 iteration 1 took 13926 ms
Query 1 iteration 2 took 13895 ms
{code}
There have been a lot of optimizations lately and they are clearly paying off!

Finally, I ran with the latest commit from master 
(2816f37ff01cfd31101b3ee6cc8574cc9246dd1b)
{code:java}
Running benchmarks with the following options: BenchmarkOpt { query: 1, debug: 
false, iterations: 3, concurrency: 24, batch_size: 4096, path: 
"/mnt/tpch/parquet-100GB", file_format: "parquet", mem_table: false }
Query 1 iteration 0 took 10722 ms
Query 1 iteration 1 took 10305 ms
Query 1 iteration 2 took 10146 ms
{code}
This is the best that performance has ever been :)
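
To put numbers on the three runs above, here is a minimal Rust sketch that averages the reported timings and computes the relative reduction in mean time. The helper names `mean_ms` and `reduction_pct` are made up for this illustration and are not part of the tpch benchmark binary.

```rust
/// Mean of a slice of per-iteration timings, in milliseconds.
fn mean_ms(timings: &[u64]) -> f64 {
    timings.iter().sum::<u64>() as f64 / timings.len() as f64
}

/// Percentage reduction in time going from `before` to `after`.
fn reduction_pct(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

fn main() {
    // Timings copied from the three runs reported in this comment.
    let regressed: [u64; 3] = [18858, 18508, 18430];
    let before_8842: [u64; 3] = [13935, 13926, 13895];
    let latest: [u64; 3] = [10722, 10305, 10146];

    let a = mean_ms(&regressed);
    let b = mean_ms(&before_8842);
    let c = mean_ms(&latest);

    println!("regressed commit:   {:.0} ms", a);
    println!("just before #8842:  {:.0} ms", b);
    println!("latest master:      {:.0} ms", c);
    println!("latest vs regressed:    {:.1}% less time", reduction_pct(a, c));
    println!("latest vs before #8842: {:.1}% less time", reduction_pct(b, c));
}
```

With the numbers above, the means come out at roughly 18599 ms, 13919 ms, and 10391 ms, so the latest master takes about 44% less time than the regressed commit and about 25% less than the commit just before #8842.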

This is really great to see. Thank you [~jorgecarleitao] and everyone else who 
contributed to this ([~Dandandan], [~jhorstmann], [~alamb])!

> [Rust] [DataFusion] Performance degredation after removing specialization
> -------------------------------------------------------------------------
>
>                 Key: ARROW-10453
>                 URL: https://issues.apache.org/jira/browse/ARROW-10453
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Rust, Rust - DataFusion
>    Affects Versions: 3.0.0
>            Reporter: Andy Grove
>            Priority: Major
>             Fix For: 3.0.0
>
>
> The following commit caused a pretty large drop in performance for the TPC-H 
> benchmark running against a SF=100 data set.
> {code:java}
>  29e9d13481ea6acc3f74cda108ed34ef8a411ba2 is the first bad commit
> commit 29e9d13481ea6acc3f74cda108ed34ef8a411ba2
> Author: Jorge C. Leitao <[email protected]>
> Date:   Sun Oct 18 21:05:48 2020 +0200
>     
>     ARROW-10002: [Rust] Remove trait specialization from arrow crate
>     
>     This PR removes trait specialization by leveraging the compiler to remove 
> trivial `if` statements.
>     
>     I verified that the assembly code was the same in a [simple 
> example](https://rust.godbolt.org/z/qrcW8W). I do not know if this 
> generalizes to our use-case, but I suspect so as LLVM is (hopefully) removing 
> trivial branches like `if a != a`.
>     
>     The change `get_data_type()` to `DATA_TYPE` is not necessary. I did it 
> before realizing this. IMO it makes it more explicit that this is not a 
> function, but a constant, but we can revert it.
>     
>     Closes #8485 from jorgecarleitao/simp_types
>     
>     Authored-by: Jorge C. Leitao <[email protected]>
>     Signed-off-by: Neville Dipale <[email protected]>
>     
> :040000 040000 cbdaf3c9e924ec0e51d178df73169956b2bf723f 87c79e17378196b61dce9c5373e008ee94620d58 M   rust
> {code}
> Benchmark command:
> {code:java}
> cargo run --release --bin tpch -- --iterations 3 --path /mnt/tpch/parquet-100GB --format parquet --query 1 --batch-size 4096 --concurrency 24
> {code}
> Before this commit:
> {code:java}
> Query 1 iteration 0 took 13629 ms
> Query 1 iteration 1 took 13450 ms
> Query 1 iteration 2 took 13465 ms {code}
> After this commit:
> {code:java}
> Query 1 iteration 0 took 18586 ms
> Query 1 iteration 1 took 18297 ms
> Query 1 iteration 2 took 18253 ms {code}
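
For readers unfamiliar with the pattern the quoted commit message describes, the following is a minimal sketch of branching on an associated constant instead of using trait specialization. All types here (`DataType`, `PrimitiveType`, `Int32Type`, `BooleanType`, `bit_width`) are simplified stand-ins invented for this example, not the arrow crate's actual definitions.

```rust
// Hypothetical miniature of the pattern; not the arrow crate's real types.
#[derive(Debug, PartialEq)]
enum DataType {
    Int32,
    Boolean,
}

trait PrimitiveType {
    // A constant rather than a `get_data_type()` function.
    const DATA_TYPE: DataType;
}

struct Int32Type;
struct BooleanType;

impl PrimitiveType for Int32Type {
    const DATA_TYPE: DataType = DataType::Int32;
}

impl PrimitiveType for BooleanType {
    const DATA_TYPE: DataType = DataType::Boolean;
}

/// Bits stored per value. The condition depends only on a constant of `T`,
/// so after monomorphization each instance has a branch with a known outcome
/// that the optimizer is free to remove.
fn bit_width<T: PrimitiveType>() -> usize {
    if T::DATA_TYPE == DataType::Boolean {
        1
    } else {
        32
    }
}

fn main() {
    println!("Int32Type:   {} bits", bit_width::<Int32Type>());
    println!("BooleanType: {} bits", bit_width::<BooleanType>());
}
```

This only illustrates the shape of the code; it says nothing about what caused the slowdown measured above.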



--
This message was sent by Atlassian Jira
(v8.3.4#803005)