mustafasrepo commented on PR #6034:
URL: 
https://github.com/apache/arrow-datafusion/pull/6034#issuecomment-1521511462

   Regarding the performance downgrade. I have examined the code to see where 
performance downgrade occurs. In new implementation we extend `GroupState`, to 
be able to determine whether a group is prunable or not, and to prune it safely.
   Specifically, we were adding `ordered_columns: Vec<ScalarValue>`, to store 
ordered section of the group(When this change group is finalized, and can be 
pruned). This API required us to create empty Vector for each group (even if 
they are not prunable in theory. This may downgrade performance in high 
cardinality cases.) I have changed the API of this member from 
`ordered_columns: Vec<ScalarValue>` to  `ordered_columns: 
Option<Vec<ScalarValue>>` to prevent unnecessary empty vector creation. Below 
are my test results.
   
   | Query | main | branch | Change|
   |--------  |--------  | -------- |-------- |
   | QQuery 1     |    1887.95ms |    1896.02ms | no change |
   | QQuery 2     |     347.76ms |     349.14ms | no change |
   | QQuery 3     |    1473.75ms |    1475.31ms | no change |
   | QQuery 4     |    1442.71ms |    1436.79ms | no change |
   | QQuery 5     |    1497.76ms |    1501.52ms | no change |
   | QQuery 6     |    1171.16ms |    1174.17ms | no change |
   | QQuery 7     |    1801.61ms |    1801.15ms | no change |
   | QQuery 8     |    1500.70ms |    1503.70ms | no change |
   | QQuery 9     |    1771.67ms |    1771.04ms | no change |
   | QQuery 10    |    1464.79ms |    1467.14ms | no change |
   | QQuery 11    |     348.81ms |     349.58ms | no change |
   | QQuery 12    |    1623.71ms |    1613.43ms | no change |
   | QQuery 13    |     640.64ms |     643.67ms | no change |
   | QQuery 14    |    1225.25ms |    1228.61ms | no change |
   | QQuery 15    |    2350.93ms |    2362.08ms | no change |
   | QQuery 16    |     224.96ms |     230.69ms | no change |
   | QQuery 17    |    3580.82ms |    3584.29ms | no change |
   | QQuery 18    |    3396.31ms |    3425.83ms | no change |
   | QQuery 19    |    1401.42ms |    1405.69ms | no change |
   | QQuery 20    |    1543.74ms |    1576.21ms | no change |
   | QQuery 21    |    4267.18ms |    4258.72ms | no change |
   | QQuery 22    |     341.99ms |     343.75ms | no change |
   
   However, I cannot say definitely this has fixed the problem. 
   I think we have 3 options
   1. Extend `GroupedHashAggregateStream` to support streaming aggregation 
(current approach).
   2. Create a new kind of `Stream` for streaming aggregation (use it from 
`AggregateExec`).
   3. Create a new kind of `Executor` such `StreamingAggregateExec` for 
streaming use cases.
   
   My judgement is as follows for each case
   Pros of first approach:
   - Introduces least amount of change, utilizes a lot of common code.
   
   Cons of first approach
   - Possibly downgrade performance
   - Extends state with members that are not used in all cases
   
   Pros of 2nd approach
   - No performance penalty
   - Clear state
   - We can still produce non-pipeline breaking results given that there is an 
existing ordering for group by expressions
   
   Pros of 3rd approach
   - No performance penalty
   - Clear state
   
   Cons of 3rd approach
   - We can only produce non-pipeline breaking results when source is marked as 
unbounded. (For bounded cases even if there is an existing ordering that can 
produce non-pipeline breaking result; without an additional rule to change 
executor, we cannot produce non-pipeline breaking results.)
   
   We can pursue any one of the above approaches. If community has a preference 
we can pursue that approach.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to