[GitHub] [arrow-datafusion] mustafasrepo commented on pull request #6869: Docs: try and clarify what `PartitionEvaluator` functions are called

via GitHub Fri, 07 Jul 2023 01:04:04 -0700


mustafasrepo commented on PR #6869:
URL: https://github.com/apache/arrow-datafusion/pull/6869#issuecomment-1624970043


   Regarding 
   > I still don't understand enough of what makes a PartitionEvaluator 
"stateful" to document what functions need to be implemented under what 
circumstances. Maybe @mustafasrepo could help clarify this
   
   If we ignore the `uses_window_frame` flag, the implementation table is as follows:
   |`supports_bounded_execution`|`include_rank`| function_to_implement |
   |---|---|---|
   |false|false| evaluate_all |
   |false|true| evaluate_all_with_rank |
   |true|false| evaluate |
   |true|true| evaluate |
   In this table, if the `supports_bounded_execution` flag is `true`, the 
`evaluate` method should be implemented. For some window functions, such as 
`ROW_NUMBER`, `evaluate` can be implemented trivially: the evaluator keeps an 
internal counter and increments it each time it is called for a row. However, 
if we were to implement `evaluate` for the `RANK` function, the information in 
the `evaluate` arguments (values received and ) is not enough to calculate the 
result. We need to know the separation boundaries of the ORDER BY columns, that 
is, where the ORDER BY values change, to be able to produce correct results. 
For this reason, we have `update_state`, which lets us encode useful 
information in the state. By implementing `update_state` for the `RANK` 
evaluator, we can store the information needed to produce the correct result 
during the `evaluate` call. 
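
To make the difference concrete, here is a minimal, self-contained sketch of the idea. It does not use DataFusion's actual `PartitionEvaluator` trait or signatures: `StreamingEvaluator`, its simplified `update_state` and `evaluate` methods, the `i64` ORDER BY key, and the `run` helper are all hypothetical names invented for illustration. The point is only that `ROW_NUMBER` needs nothing beyond its own counter, while `RANK` must remember, via `update_state`, where the ORDER BY value last changed.

```rust
/// Hypothetical, simplified stand-in for a bounded-execution evaluator.
/// This is NOT DataFusion's `PartitionEvaluator`; signatures are assumptions.
trait StreamingEvaluator {
    /// Called once per row before `evaluate`, with that row's ORDER BY key.
    /// The default does nothing, which is enough for stateless-style functions.
    fn update_state(&mut self, _order_by_key: i64) {}

    /// Returns the window function result for the current row.
    fn evaluate(&mut self) -> u64;
}

/// ROW_NUMBER: an internal counter is sufficient.
#[derive(Default)]
struct RowNumber {
    count: u64,
}

impl StreamingEvaluator for RowNumber {
    fn evaluate(&mut self) -> u64 {
        self.count += 1;
        self.count
    }
}

/// RANK: needs to know where the ORDER BY value changes (peer-group boundaries).
#[derive(Default)]
struct Rank {
    rows_seen: u64,
    current_rank: u64,
    last_key: Option<i64>,
}

impl StreamingEvaluator for Rank {
    fn update_state(&mut self, order_by_key: i64) {
        self.rows_seen += 1;
        // A new ORDER BY value starts a new peer group; its rank is the
        // 1-based position of the first row of that group.
        if self.last_key != Some(order_by_key) {
            self.current_rank = self.rows_seen;
            self.last_key = Some(order_by_key);
        }
    }

    fn evaluate(&mut self) -> u64 {
        self.current_rank
    }
}

/// Hypothetical driver: feeds rows (already sorted by the ORDER BY key)
/// one at a time, calling `update_state` and then `evaluate` for each row.
fn run<E: StreamingEvaluator>(mut evaluator: E, sorted_keys: &[i64]) -> Vec<u64> {
    sorted_keys
        .iter()
        .map(|&key| {
            evaluator.update_state(key);
            evaluator.evaluate()
        })
        .collect()
}
```

With the sorted keys `[10, 10, 20, 30, 30, 30]`, `run(RowNumber::default(), ...)` yields `[1, 2, 3, 4, 5, 6]` and `run(Rank::default(), ...)` yields `[1, 1, 3, 4, 4, 4]`. The `Rank` result depends entirely on the state recorded in `update_state`.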
   
   However, it is a bit confusing. I will try to simplify it. 


