logan-keede commented on issue #17261:
URL: https://github.com/apache/datafusion/issues/17261#issuecomment-3505310653
After some testing to narrow down the cause of the difference between the
benchmark and datafusion-cli, I found that the `UInt64` case somehow does not
get optimised by the `eliminate_nested_union` optimiser rule (which is used in
the benchmarks); after changing the type to `Int64`, it seems to work fine.
This reduces the time by about half or more for the 50-column and 100-column
scenarios (for `physical_sorted_union_order_by_50`, from about 254 ms with
`UInt64` to about 106 ms with `Int64`).
```
Int64
Benchmarking physical_sorted_union_order_by_50: Collecting 100 samples in estimated 11.081 s (physical_sorted_union_order_by_50
time: [105.39 ms 105.57 ms 105.78 ms]
change: [−1.4254% −0.8674% −0.3407%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe

UInt64
Benchmarking physical_sorted_union_order_by_50: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 25.7s, or reduce sample count to 10.
Benchmarking physical_sorted_union_order_by_50: Collecting 100 samples in estimated 25.740 s (physical_sorted_union_order_by_50
time: [252.76 ms 254.23 ms 255.73 ms]
change: [+139.23% +140.81% +142.27%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
```
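
As a quick sanity check on the numbers above, here is a minimal sketch that compares the two runs. It uses only the median times copied from the Criterion output (the middle value of each `time:` triple); the variable names are illustrative and not part of DataFusion or Criterion.

```python
# Median times (ms) taken from the Criterion output above.
int64_median_ms = 105.57   # physical_sorted_union_order_by_50 with Int64
uint64_median_ms = 254.23  # physical_sorted_union_order_by_50 with UInt64

# How many times slower the UInt64 run is than the Int64 run.
slowdown = uint64_median_ms / int64_median_ms

# Relative change going from Int64 to UInt64, comparable to Criterion's
# "change:" line in the UInt64 run (assuming its baseline was the Int64 run).
regression_pct = (slowdown - 1.0) * 100.0

# Share of the UInt64 time that is saved by switching to Int64.
reduction_pct = (1.0 - int64_median_ms / uint64_median_ms) * 100.0

print(f"UInt64 is {slowdown:.2f}x slower than Int64")
print(f"Int64 -> UInt64 change: +{regression_pct:.1f}%")
print(f"UInt64 -> Int64 saves {reduction_pct:.1f}% of the time")
```

The computed change falls inside the `[+139.23% +140.81% +142.27%]` interval that Criterion reported, so the two runs appear to be a direct before/after comparison of the same benchmark.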
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]