alamb opened a new pull request, #6904: URL: https://github.com/apache/arrow-datafusion/pull/6904
# Which issue does this PR close?

Part of https://github.com/apache/arrow-datafusion/issues/6889

Closes https://github.com/apache/arrow-datafusion/pull/6800

Closes https://github.com/apache/arrow-datafusion/issues/4973

# Rationale for this change

Much faster grouping performance and lower memory usage for large numbers of groups

TODO: regenerate these numbers

```
--------------------
Benchmark tpch.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ main_base ┃ alamb_hash_agg_spike ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  789.36ms │             768.82ms │     no change │
│ QQuery 2     │  292.62ms │             219.58ms │ +1.33x faster │
│ QQuery 3     │  408.23ms │             388.36ms │     no change │
│ QQuery 4     │  239.14ms │             236.48ms │     no change │
│ QQuery 5     │  512.51ms │             516.96ms │     no change │
│ QQuery 6     │  208.24ms │             211.47ms │     no change │
│ QQuery 7     │  869.70ms │             896.97ms │     no change │
│ QQuery 8     │  574.60ms │             591.00ms │     no change │
│ QQuery 9     │  893.77ms │             908.34ms │     no change │
│ QQuery 10    │  650.66ms │             621.45ms │     no change │
│ QQuery 11    │  204.09ms │             178.99ms │ +1.14x faster │
│ QQuery 12    │  334.17ms │             327.36ms │     no change │
│ QQuery 13    │  744.82ms │             634.29ms │ +1.17x faster │
│ QQuery 14    │  292.05ms │             281.81ms │     no change │
│ QQuery 15    │  247.06ms │             218.11ms │ +1.13x faster │
│ QQuery 16    │  247.45ms │             209.87ms │ +1.18x faster │
│ QQuery 17    │ 2534.68ms │            1135.75ms │ +2.23x faster │
│ QQuery 18    │ 2630.03ms │            1751.31ms │ +1.50x faster │
│ QQuery 19    │  521.75ms │             528.30ms │     no change │
│ QQuery 20    │  926.76ms │             440.71ms │ +2.10x faster │
│ QQuery 21    │ 1278.07ms │            1275.54ms │     no change │
│ QQuery 22    │  150.15ms │             150.67ms │     no change │
└──────────────┴───────────┴──────────────────────┴───────────────┘
--------------------
Benchmark tpch_mem.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ main_base ┃ alamb_hash_agg_spike ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  489.23ms │             455.08ms │ +1.08x faster │
│ QQuery 2     │  243.33ms │             134.34ms │ +1.81x faster │
│ QQuery 3     │  166.61ms │             158.30ms │     no change │
│ QQuery 4     │  112.69ms │             109.91ms │     no change │
│ QQuery 5     │  371.31ms │             367.26ms │     no change │
│ QQuery 6     │   38.85ms │              39.05ms │     no change │
│ QQuery 7     │  857.14ms │             848.70ms │     no change │
│ QQuery 8     │  228.76ms │             226.56ms │     no change │
│ QQuery 9     │  525.80ms │             507.89ms │     no change │
│ QQuery 10    │  322.86ms │             304.78ms │ +1.06x faster │
│ QQuery 11    │  185.13ms │             157.05ms │ +1.18x faster │
│ QQuery 12    │  158.53ms │             152.98ms │     no change │
│ QQuery 13    │  511.26ms │             254.26ms │ +2.01x faster │
│ QQuery 14    │   44.26ms │              43.50ms │     no change │
│ QQuery 15    │   75.39ms │              45.33ms │ +1.66x faster │
│ QQuery 16    │  196.56ms │             158.71ms │ +1.24x faster │
│ QQuery 17    │ 2260.88ms │             788.95ms │ +2.87x faster │
│ QQuery 18    │ 2375.63ms │            1416.96ms │ +1.68x faster │
│ QQuery 19    │  158.64ms │             150.11ms │ +1.06x faster │
│ QQuery 20    │  830.32ms │             305.56ms │ +2.72x faster │
│ QQuery 21    │  995.44ms │             978.06ms │     no change │
│ QQuery 22    │   84.62ms │              79.60ms │ +1.06x faster │
└──────────────┴───────────┴──────────────────────┴───────────────┘
```

TODO: also figure out how to run the clickbench suite entirely

# What changes are included in this PR?

- [x] Rewrite `GroupedHashAggregateStream` to use vectorized / multi-group updates
- [x] A new `GroupsAccumulator` trait with the new vectorized API for managing and updating group state
- [x] A generic implementation of `GroupsAccumulator` for all aggregators that have `RowAccumulator` variants
- [x] Fuzz testing of the new `accumulate` function
- [x] An adapter that implements `GroupsAccumulator` in terms of `Accumulator` (for slower, but simpler accumulators)

Here is the list of `RowAccumulator`s (aka accumulators that have specialized implementations).

- [x] `CountRowAccumulator`
- [x] `MaxRowAccumulator`
- [x] `MinRowAccumulator`
- [x] `AvgRowAccumulator`
- [x] `SumRowAccumulator`
- [x] `BitAndRowAccumulator`
- [x] `BitOrRowAccumulator`
- [x] `BitXorRowAccumulator`
- [x] `BoolAndRowAccumulator`
- [x] `BoolOrRowAccumulator`

# Are these changes tested?

Yes -- both new and existing tests

# Are there any user-facing changes?

Much faster performance -- see above

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
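To illustrate what "vectorized / multi-group updates" can look like, here is a minimal sketch in plain Rust. It is not the `GroupsAccumulator` trait from this PR: the names `GroupsAccumulatorSketch` and `SumSketch`, the `i64`-only values, and the method signatures are all assumptions made for illustration, and the real trait operates on Arrow arrays rather than slices. The idea shown is that one accumulator holds the state for every group and is updated once per batch, with each input row carrying the index of the group it belongs to.

```rust
/// Hypothetical, simplified stand-in for a vectorized accumulator API.
/// One accumulator instance manages the state of *all* groups.
trait GroupsAccumulatorSketch {
    /// Update many groups at once: `values[i]` belongs to group
    /// `group_indices[i]`. `total_num_groups` is the number of groups
    /// seen so far, so state can be grown before the update loop.
    fn update_batch(&mut self, values: &[i64], group_indices: &[usize], total_num_groups: usize);

    /// Produce one output value per group, in group-index order.
    fn evaluate(&self) -> Vec<i64>;
}

/// Illustrative SUM accumulator: one running sum per group, stored contiguously.
#[derive(Default)]
struct SumSketch {
    sums: Vec<i64>,
}

impl GroupsAccumulatorSketch for SumSketch {
    fn update_batch(&mut self, values: &[i64], group_indices: &[usize], total_num_groups: usize) {
        assert_eq!(values.len(), group_indices.len());
        // Make room for any groups that appeared for the first time in this batch.
        if self.sums.len() < total_num_groups {
            self.sums.resize(total_num_groups, 0);
        }
        // A single tight loop over the batch, with no per-group accumulator objects.
        for (&value, &group) in values.iter().zip(group_indices) {
            self.sums[group] += value;
        }
    }

    fn evaluate(&self) -> Vec<i64> {
        self.sums.clone()
    }
}
```

Under these assumptions, a caller would compute `group_indices` for each input batch (for example from a hash table over the grouping keys), call `update_batch` once per batch, and call `evaluate` at the end to get one sum per group.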
