akurmustafa commented on issue #23197: URL: https://github.com/apache/datafusion/issues/23197#issuecomment-4836070068
Yes, as @alamb and @2010YOUY01 said, the original idea was to support a window operator that, when its input is already ordered by the PARTITION BY expressions, the ORDER BY expressions, or both, does not buffer all of its input batches, which saves memory and supports streaming.

For a window function with the clause `PARTITION BY <expr1> ORDER BY <expr2>`, the operator supports inputs with the following orderings:

- case 1: `<expr1>`
- case 2: `<expr2>`
- case 3: `<expr1> + <expr2>`

I think most of the complexity in the implementation comes from supporting these different cases. However, I think case 1 and case 2 are mostly for streaming use cases; I don't see a benefit in keeping them for non-streaming cases. If we assume the input ordering always satisfies case 3 (both the PARTITION BY and ORDER BY expressions), I think the implementation can be simplified, and in the end DataFusion can have a single window operator that does not need all of its input to be buffered.

As far as I remember, `WindowAggExec` always assumes case 3 holds and buffers all of the data at its input. I agree with @alamb and @2010YOUY01 that focusing on `WindowAggExec` for improvement is the better course. In the future, perhaps the planning phase could generate only plans that contain `WindowAggExec`, and we could then discontinue `BoundedWindowAggExec`.
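
To illustrate why case 3 makes streaming straightforward, here is a minimal sketch in plain Rust. It is not DataFusion code: the `Row` layout, the `running_sum_sorted` name, and the choice of a running `SUM` frame (unbounded preceding to current row) are all assumptions made for illustration. Because the input is assumed to be sorted by the partition key and then by the order key, the operator only needs per-partition state for the partition it is currently reading.

```rust
/// Hypothetical row layout for illustration: (partition key, order key, value).
type Row = (u32, i64, i64);

/// Sketch of SUM(value) OVER (PARTITION BY key ORDER BY ord
/// ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).
///
/// Assumption: `rows` is already sorted by (key, ord), i.e. "case 3".
/// Each output row is produced as soon as its input row arrives, so the
/// input is never buffered as a whole.
fn running_sum_sorted<I>(rows: I) -> impl Iterator<Item = (u32, i64, i64)>
where
    I: IntoIterator<Item = Row>,
{
    // State for the partition currently being read; nothing else is kept.
    let mut current_key: Option<u32> = None;
    let mut acc: i64 = 0;

    rows.into_iter().map(move |(key, ord, value)| {
        // A change in the partition key means the previous partition is
        // finished (the input is sorted), so its state can be discarded.
        if current_key != Some(key) {
            current_key = Some(key);
            acc = 0;
        }
        acc += value;
        (key, ord, acc)
    })
}
```

With only case 1 or case 2 satisfied, a sketch like this would additionally need per-partition sorting or a map of open partitions, which is roughly where the extra complexity mentioned above comes from.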
