mustafasrepo commented on PR #6869: URL: https://github.com/apache/arrow-datafusion/pull/6869#issuecomment-1624970043
Regarding

> I still don't understand enough of what makes a PartitionEvaluator "stateful" to document what functions need to be implemented under what circumstances. Maybe @mustafasrepo could help clarify this

If we ignore the `uses_window_frame` flag, the implementation table is as follows:

|`supports_bounded_execution`|`include_rank`| function_to_implement |
|---|---|---|
|false|false| evaluate_all |
|false|true| evaluate_all_with_rank |
|true|false| evaluate |
|true|true| evaluate |

In this table, if the `supports_bounded_execution` flag is `true`, the `evaluate` method should be implemented. For some window functions, such as `ROW_NUMBER`, `evaluate` can be implemented trivially (it can keep an internal counter and increment it each time it is called for a row).

However, if we were to implement the `evaluate` method for the `RANK` function, the information in the `evaluate` arguments (values received and ) is not enough to calculate the result. We need to know the separation boundaries of the order by columns to be able to produce correct results. For this reason, we have `update_state`, which lets us encode useful information in the state. By implementing `update_state` for the `RANK` evaluator, we can store the information needed to produce the correct result during the `evaluate` call.

However, it is a bit confusing. I will try to simplify it.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
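To illustrate the difference between `ROW_NUMBER` and `RANK` described in the comment, here is a minimal, self-contained sketch. It does not use DataFusion's actual `PartitionEvaluator` trait or its real method signatures. The types `ToyRowNumberEvaluator` and `ToyRankEvaluator`, the helpers `row_number_streaming` and `rank_streaming`, and the single `i64` ORDER BY column are all hypothetical simplifications chosen for illustration. The only idea taken from the comment is that a row counter is enough for `ROW_NUMBER`, while `RANK` needs an `update_state`-like step that records where the ORDER BY value changes.

```rust
/// Hypothetical toy evaluator for ROW_NUMBER (not DataFusion's trait).
/// A plain counter is enough: no knowledge of ORDER BY values is needed.
#[derive(Default)]
struct ToyRowNumberEvaluator {
    count: u64,
}

impl ToyRowNumberEvaluator {
    /// Called once per row; returns the 1-based row number.
    fn evaluate(&mut self) -> u64 {
        self.count += 1;
        self.count
    }
}

/// Hypothetical toy evaluator for RANK (not DataFusion's trait).
/// It must remember where the ORDER BY value last changed.
#[derive(Default)]
struct ToyRankEvaluator {
    /// Number of rows seen so far in the partition.
    rows_seen: u64,
    /// Rank shared by all rows in the current group of equal ORDER BY values.
    current_rank: u64,
    /// ORDER BY value of the most recently seen row, if any.
    last_value: Option<i64>,
}

impl ToyRankEvaluator {
    /// Assumed analogue of `update_state`: records the boundary information
    /// that `evaluate` alone would not have.
    fn update_state(&mut self, order_by_value: i64) {
        self.rows_seen += 1;
        if self.last_value != Some(order_by_value) {
            // A new group starts here, so the rank jumps to the row position.
            self.current_rank = self.rows_seen;
            self.last_value = Some(order_by_value);
        }
    }

    /// Returns the rank of the row most recently passed to `update_state`.
    fn evaluate(&self) -> u64 {
        self.current_rank
    }
}

/// Drives the toy ROW_NUMBER evaluator over `n` rows.
fn row_number_streaming(n: usize) -> Vec<u64> {
    let mut evaluator = ToyRowNumberEvaluator::default();
    (0..n).map(|_| evaluator.evaluate()).collect()
}

/// Drives the toy RANK evaluator over rows already sorted by the ORDER BY column.
fn rank_streaming(sorted_values: &[i64]) -> Vec<u64> {
    let mut evaluator = ToyRankEvaluator::default();
    sorted_values
        .iter()
        .map(|&value| {
            evaluator.update_state(value);
            evaluator.evaluate()
        })
        .collect()
}
```

For example, under these assumptions `rank_streaming(&[10, 10, 20])` yields ranks `1, 1, 3`, while `row_number_streaming(3)` yields `1, 2, 3`. The gap after the tied rows is exactly the information that a bare per-row counter cannot reproduce.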
