alamb opened a new pull request, #6904:
URL: https://github.com/apache/arrow-datafusion/pull/6904

   # Which issue does this PR close?
   
   Part of https://github.com/apache/arrow-datafusion/issues/6889
   Closes https://github.com/apache/arrow-datafusion/pull/6800 
   Closes https://github.com/apache/arrow-datafusion/issues/4973
   
   
   # Rationale for this change
   
   This PR makes grouping much faster and lowers memory usage when there are 
large numbers of groups.
   
   TODO: regenerate these numbers
   
   ```
   --------------------
   Benchmark tpch.json
   --------------------
   ┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query        ┃ main_base ┃ alamb_hash_agg_spike ┃        Change ┃
   ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ QQuery 1     │  789.36ms │             768.82ms │     no change │
   │ QQuery 2     │  292.62ms │             219.58ms │ +1.33x faster │
   │ QQuery 3     │  408.23ms │             388.36ms │     no change │
   │ QQuery 4     │  239.14ms │             236.48ms │     no change │
   │ QQuery 5     │  512.51ms │             516.96ms │     no change │
   │ QQuery 6     │  208.24ms │             211.47ms │     no change │
   │ QQuery 7     │  869.70ms │             896.97ms │     no change │
   │ QQuery 8     │  574.60ms │             591.00ms │     no change │
   │ QQuery 9     │  893.77ms │             908.34ms │     no change │
   │ QQuery 10    │  650.66ms │             621.45ms │     no change │
   │ QQuery 11    │  204.09ms │             178.99ms │ +1.14x faster │
   │ QQuery 12    │  334.17ms │             327.36ms │     no change │
   │ QQuery 13    │  744.82ms │             634.29ms │ +1.17x faster │
   │ QQuery 14    │  292.05ms │             281.81ms │     no change │
   │ QQuery 15    │  247.06ms │             218.11ms │ +1.13x faster │
   │ QQuery 16    │  247.45ms │             209.87ms │ +1.18x faster │
   │ QQuery 17    │ 2534.68ms │            1135.75ms │ +2.23x faster │
   │ QQuery 18    │ 2630.03ms │            1751.31ms │ +1.50x faster │
   │ QQuery 19    │  521.75ms │             528.30ms │     no change │
   │ QQuery 20    │  926.76ms │             440.71ms │ +2.10x faster │
   │ QQuery 21    │ 1278.07ms │            1275.54ms │     no change │
   │ QQuery 22    │  150.15ms │             150.67ms │     no change │
   └──────────────┴───────────┴──────────────────────┴───────────────┘
   --------------------
   Benchmark tpch_mem.json
   --------------------
   ┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query        ┃ main_base ┃ alamb_hash_agg_spike ┃        Change ┃
   ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ QQuery 1     │  489.23ms │             455.08ms │ +1.08x faster │
   │ QQuery 2     │  243.33ms │             134.34ms │ +1.81x faster │
   │ QQuery 3     │  166.61ms │             158.30ms │     no change │
   │ QQuery 4     │  112.69ms │             109.91ms │     no change │
   │ QQuery 5     │  371.31ms │             367.26ms │     no change │
   │ QQuery 6     │   38.85ms │              39.05ms │     no change │
   │ QQuery 7     │  857.14ms │             848.70ms │     no change │
   │ QQuery 8     │  228.76ms │             226.56ms │     no change │
   │ QQuery 9     │  525.80ms │             507.89ms │     no change │
   │ QQuery 10    │  322.86ms │             304.78ms │ +1.06x faster │
   │ QQuery 11    │  185.13ms │             157.05ms │ +1.18x faster │
   │ QQuery 12    │  158.53ms │             152.98ms │     no change │
   │ QQuery 13    │  511.26ms │             254.26ms │ +2.01x faster │
   │ QQuery 14    │   44.26ms │              43.50ms │     no change │
   │ QQuery 15    │   75.39ms │              45.33ms │ +1.66x faster │
   │ QQuery 16    │  196.56ms │             158.71ms │ +1.24x faster │
   │ QQuery 17    │ 2260.88ms │             788.95ms │ +2.87x faster │
   │ QQuery 18    │ 2375.63ms │            1416.96ms │ +1.68x faster │
   │ QQuery 19    │  158.64ms │             150.11ms │ +1.06x faster │
   │ QQuery 20    │  830.32ms │             305.56ms │ +2.72x faster │
   │ QQuery 21    │  995.44ms │             978.06ms │     no change │
   │ QQuery 22    │   84.62ms │              79.60ms │ +1.06x faster │
   └──────────────┴───────────┴──────────────────────┴───────────────┘
   ```
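
For readers checking the `Change` column by hand: every row above is consistent with a simple rule, sketched below in Rust. The rule is inferred from the numbers in the tables, not taken from the benchmark comparison script. The 5% noise threshold, the function name `describe_change`, and the label used for a slowdown (no slower row appears above) are all assumptions.

```rust
/// Noise threshold on the relative change in run time.
/// Assumption: 5%, inferred from the rows in the tables above.
const NOISE_THRESHOLD: f64 = 0.05;

/// Describe how `new_ms` compares to `base_ms` (both in milliseconds).
fn describe_change(base_ms: f64, new_ms: f64) -> String {
    // Relative change in run time; negative means the new run was faster.
    let relative_change = (new_ms - base_ms) / base_ms;
    if relative_change.abs() <= NOISE_THRESHOLD {
        "no change".to_string()
    } else if relative_change < 0.0 {
        // The reported factor is baseline time divided by new time.
        format!("+{:.2}x faster", base_ms / new_ms)
    } else {
        // Hypothetical label: the tables contain no slower row to copy from.
        format!("{:.2}x slower", new_ms / base_ms)
    }
}

fn main() {
    // tpch Query 17, tpch_mem Query 3 and tpch_mem Query 19 from the tables.
    for (base, new) in [(2534.68, 1135.75), (166.61, 158.30), (158.64, 150.11)] {
        println!("{}ms -> {}ms: {}", base, new, describe_change(base, new));
    }
}
```

Note that the threshold applies to the relative change in time rather than to the speedup factor. That is why tpch_mem Query 3 (166.61ms to 158.30ms, a 4.99% reduction) reads "no change" while tpch_mem Query 19 (a 5.4% reduction) reads "+1.06x faster".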
   
   TODO: also figure out how to run the entire ClickBench suite
   
   # What changes are included in this PR?
   - [x] Rewrite `GroupedHashAggregateStream` to use vectorized / multi-group 
updates
   - [x] A new `GroupsAccumulator` trait with the new vectorized API for 
managing and updating group state
   - [x] A generic implementation of `GroupsAccumulator` for all aggregators 
that have `RowAccumulator` variants
   - [x] Fuzz testing of the new `accumulate` function
   - [x] An adapter that implements `GroupsAccumulator` in terms of 
`Accumulator` (for slower, but simpler accumulators)
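
To illustrate the idea behind the list above, here is a minimal, std-only sketch of a vectorized, multi-group accumulator next to an adapter built on per-group accumulators. It is not the API added by this PR: the names `SimpleGroupsAccumulator`, `SimpleAccumulator`, `SumGroups`, `MaxAccumulator`, `Adapter` and `run` are hypothetical, and plain slices of `Option<i64>` stand in for Arrow arrays.

```rust
/// Hypothetical, simplified vectorized accumulator: one object holds the
/// state for every group and is updated with a whole batch at a time.
trait SimpleGroupsAccumulator {
    /// `group_indices[i]` is the group that `values[i]` belongs to;
    /// `total_num_groups` is the number of distinct groups seen so far.
    fn update_batch(&mut self, values: &[Option<i64>], group_indices: &[usize], total_num_groups: usize);
    /// One result per group; `None` if the group saw only nulls.
    fn evaluate(&self) -> Vec<Option<i64>>;
}

/// Specialized SUM: the state is two flat vectors indexed by group.
#[derive(Default)]
struct SumGroups {
    sums: Vec<i64>,
    seen: Vec<bool>,
}

impl SimpleGroupsAccumulator for SumGroups {
    fn update_batch(&mut self, values: &[Option<i64>], group_indices: &[usize], total_num_groups: usize) {
        assert_eq!(values.len(), group_indices.len());
        // Grow (never shrink) the state so every group index is valid.
        if total_num_groups > self.sums.len() {
            self.sums.resize(total_num_groups, 0);
            self.seen.resize(total_num_groups, false);
        }
        for (value, &group) in values.iter().zip(group_indices) {
            if let Some(v) = *value {
                self.sums[group] += v;
                self.seen[group] = true;
            }
        }
    }

    fn evaluate(&self) -> Vec<Option<i64>> {
        self.sums
            .iter()
            .zip(&self.seen)
            .map(|(&sum, &seen)| if seen { Some(sum) } else { None })
            .collect()
    }
}

/// Hypothetical single-group accumulator: one instance per group.
trait SimpleAccumulator: Default {
    fn update(&mut self, value: i64);
    fn evaluate(&self) -> Option<i64>;
}

#[derive(Default)]
struct MaxAccumulator {
    max: Option<i64>,
}

impl SimpleAccumulator for MaxAccumulator {
    fn update(&mut self, value: i64) {
        self.max = Some(self.max.map_or(value, |m| m.max(value)));
    }
    fn evaluate(&self) -> Option<i64> {
        self.max
    }
}

/// Adapter: simpler but slower, because each group owns its own accumulator.
struct Adapter<A> {
    per_group: Vec<A>,
}

impl<A: SimpleAccumulator> SimpleGroupsAccumulator for Adapter<A> {
    fn update_batch(&mut self, values: &[Option<i64>], group_indices: &[usize], total_num_groups: usize) {
        assert_eq!(values.len(), group_indices.len());
        if total_num_groups > self.per_group.len() {
            self.per_group.resize_with(total_num_groups, A::default);
        }
        for (value, &group) in values.iter().zip(group_indices) {
            if let Some(v) = *value {
                self.per_group[group].update(v);
            }
        }
    }

    fn evaluate(&self) -> Vec<Option<i64>> {
        self.per_group.iter().map(|acc| acc.evaluate()).collect()
    }
}

/// Feed two small batches; group 3 only ever receives a null.
fn run(acc: &mut dyn SimpleGroupsAccumulator) -> Vec<Option<i64>> {
    acc.update_batch(&[Some(1), Some(2), None, Some(4)], &[0, 1, 0, 1], 2);
    acc.update_batch(&[Some(10), None], &[2, 3], 4);
    acc.evaluate()
}

fn main() {
    println!("sum: {:?}", run(&mut SumGroups::default()));
    println!("max: {:?}", run(&mut Adapter::<MaxAccumulator> { per_group: Vec::new() }));
}
```

In this sketch the specialized version touches only flat vectors in its inner loop, while the adapter keeps a separate object per group. That mirrors the trade-off described above between faster specialized accumulators and slower but simpler ones.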
   
   Here is the list of `RowAccumulator`s (that is, accumulators that have 
specialized implementations):
   
   - [x] `CountRowAccumulator`
   - [x] `MaxRowAccumulator`
   - [x] `MinRowAccumulator`
   - [x] `AvgRowAccumulator`
   - [x] `SumRowAccumulator`
   - [x] `BitAndRowAccumulator`
   - [x] `BitOrRowAccumulator`
   - [x] `BitXorRowAccumulator`
   - [x] `BoolAndRowAccumulator`
   - [x] `BoolOrRowAccumulator`
   
   
   # Are these changes tested?
   Yes -- both new and existing tests 
   
   # Are there any user-facing changes?
   Much faster performance -- see above
   


