mustafasrepo commented on PR #6034: URL: https://github.com/apache/arrow-datafusion/pull/6034#issuecomment-1521511462
Regarding the performance downgrade. I have examined the code to see where performance downgrade occurs. In new implementation we extend `GroupState`, to be able to determine whether a group is prunable or not, and to prune it safely. Specifically, we were adding `ordered_columns: Vec<ScalarValue>`, to store ordered section of the group(When this change group is finalized, and can be pruned). This API required us to create empty Vector for each group (even if they are not prunable in theory. This may downgrade performance in high cardinality cases.) I have changed the API of this member from `ordered_columns: Vec<ScalarValue>` to `ordered_columns: Option<Vec<ScalarValue>>` to prevent unnecessary empty vector creation. Below are my test results. | Query | main | branch | Change| |-------- |-------- | -------- |-------- | | QQuery 1 | 1887.95ms | 1896.02ms | no change | | QQuery 2 | 347.76ms | 349.14ms | no change | | QQuery 3 | 1473.75ms | 1475.31ms | no change | | QQuery 4 | 1442.71ms | 1436.79ms | no change | | QQuery 5 | 1497.76ms | 1501.52ms | no change | | QQuery 6 | 1171.16ms | 1174.17ms | no change | | QQuery 7 | 1801.61ms | 1801.15ms | no change | | QQuery 8 | 1500.70ms | 1503.70ms | no change | | QQuery 9 | 1771.67ms | 1771.04ms | no change | | QQuery 10 | 1464.79ms | 1467.14ms | no change | | QQuery 11 | 348.81ms | 349.58ms | no change | | QQuery 12 | 1623.71ms | 1613.43ms | no change | | QQuery 13 | 640.64ms | 643.67ms | no change | | QQuery 14 | 1225.25ms | 1228.61ms | no change | | QQuery 15 | 2350.93ms | 2362.08ms | no change | | QQuery 16 | 224.96ms | 230.69ms | no change | | QQuery 17 | 3580.82ms | 3584.29ms | no change | | QQuery 18 | 3396.31ms | 3425.83ms | no change | | QQuery 19 | 1401.42ms | 1405.69ms | no change | | QQuery 20 | 1543.74ms | 1576.21ms | no change | | QQuery 21 | 4267.18ms | 4258.72ms | no change | | QQuery 22 | 341.99ms | 343.75ms | no change | However, I cannot say definitely this has fixed the problem. I think we have 3 options 1. Extend `GroupedHashAggregateStream` to support streaming aggregation (current approach). 2. Create a new kind of `Stream` for streaming aggregation (use it from `AggregateExec`). 3. Create a new kind of `Executor` such `StreamingAggregateExec` for streaming use cases. My judgement is as follows for each case Pros of first approach: - Introduces least amount of change, utilizes a lot of common code. Cons of first approach - Possibly downgrade performance - Extends state with members that are not used in all cases Pros of 2nd approach - No performance penalty - Clear state - We can still produce non-pipeline breaking results given that there is an existing ordering for group by expressions Pros of 3rd approach - No performance penalty - Clear state Cons of 3rd approach - We can only produce non-pipeline breaking results when source is marked as unbounded. (For bounded cases even if there is an existing ordering that can produce non-pipeline breaking result; without an additional rule to change executor, we cannot produce non-pipeline breaking results.) We can pursue any one of the above approaches. If community has a preference we can pursue that approach. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
