alamb opened a new pull request, #6617:
URL: https://github.com/apache/arrow-datafusion/pull/6617

   # Which issue does this PR close?
   https://github.com/apache/arrow-datafusion/issues/5781
   
   
   # Rationale for this change
   We would like to allow users to take full advantage of the power of 
DataFusion's window functions (largely contributed by @ozankabak and 
@mustafasrepo 👏).
   
   This PR contains a potential implementation of User Defined Window 
Functions (the "Use existing APIs" approach described in 
https://github.com/apache/arrow-datafusion/issues/5781#issuecomment-1583105449).
   
   I don't intend to merge this specific PR. Instead, if the community likes 
this basic approach, I will break this PR up into pieces and merge them 
incrementally.
   
   # What changes are included in this PR?
   
   The new example in this PR shows how this works. Run 
   
   ```shell
   cargo run --example simple_udwf
   ```
   
   Which produces the following output (where `my_average`'s implementation is 
defined in `simple_udwf.rs` as a user defined window function):
   
   ```
   +-------+-------+--------------------------+------------------------+---------------------+
   | car   | speed | LAG(cars.speed,Int64(1)) | my_average(cars.speed) | time                |
   +-------+-------+--------------------------+------------------------+---------------------+
   | red   | 20.0  |                          | 20.0                   | 1996-04-12T12:05:03 |
   | red   | 20.3  | 20.0                     | 20.15                  | 1996-04-12T12:05:04 |
   | red   | 21.4  | 20.3                     | 20.85                  | 1996-04-12T12:05:05 |
   | red   | 21.5  | 21.4                     | 21.45                  | 1996-04-12T12:05:06 |
   | red   | 19.0  | 21.5                     | 20.25                  | 1996-04-12T12:05:07 |
   | red   | 18.0  | 19.0                     | 18.5                   | 1996-04-12T12:05:08 |
   | red   | 17.0  | 18.0                     | 17.5                   | 1996-04-12T12:05:09 |
   | red   | 7.0   | 17.0                     | 12.0                   | 1996-04-12T12:05:10 |
   | red   | 7.1   | 7.0                      | 7.05                   | 1996-04-12T12:05:11 |
   | red   | 7.2   | 7.1                      | 7.15                   | 1996-04-12T12:05:12 |
   | red   | 3.0   | 7.2                      | 5.1                    | 1996-04-12T12:05:13 |
   | red   | 1.0   | 3.0                      | 2.0                    | 1996-04-12T12:05:14 |
   | red   | 0.0   | 1.0                      | 0.5                    | 1996-04-12T12:05:15 |
   | green | 10.0  |                          | 10.0                   | 1996-04-12T12:05:03 |
   | green | 10.3  | 10.0                     | 10.15                  | 1996-04-12T12:05:04 |
   | green | 10.4  | 10.3                     | 10.350000000000001     | 1996-04-12T12:05:05 |
   | green | 10.5  | 10.4                     | 10.45                  | 1996-04-12T12:05:06 |
   | green | 11.0  | 10.5                     | 10.75                  | 1996-04-12T12:05:07 |
   | green | 12.0  | 11.0                     | 11.5                   | 1996-04-12T12:05:08 |
   | green | 14.0  | 12.0                     | 13.0                   | 1996-04-12T12:05:09 |
   | green | 15.0  | 14.0                     | 14.5                   | 1996-04-12T12:05:10 |
   | green | 15.1  | 15.0                     | 15.05                  | 1996-04-12T12:05:11 |
   | green | 15.2  | 15.1                     | 15.149999999999999     | 1996-04-12T12:05:12 |
   | green | 8.0   | 15.2                     | 11.6                   | 1996-04-12T12:05:13 |
   | green | 2.0   | 8.0                      | 5.0                    | 1996-04-12T12:05:14 |
   +-------+-------+--------------------------+------------------------+---------------------+
   ```
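
For readers who want to check the `my_average` column by hand: the values in the output above are consistent with averaging each row's `speed` with the previous row's `speed` for the same car (the first row of each car has no predecessor, so it is just its own value). The window frame actually used in `simple_udwf.rs` is not shown in this description, so the "1 preceding row" frame below is an assumption inferred from the output, and `moving_average` is a hypothetical helper written in plain Rust, not a DataFusion API.

```rust
/// Hypothetical helper (not part of DataFusion): for every row, average the
/// current value with up to `preceding` values before it. This mimics a
/// `ROWS BETWEEN <preceding> PRECEDING AND CURRENT ROW` frame.
///
/// `values` is assumed to hold a single partition (for example, all rows of
/// one car) already sorted in window order (for example, by time).
fn moving_average(values: &[f64], preceding: usize) -> Vec<f64> {
    (0..values.len())
        .map(|i| {
            // Clamp the start of the frame at the beginning of the partition.
            let start = i.saturating_sub(preceding);
            let window = &values[start..=i];
            window.iter().sum::<f64>() / window.len() as f64
        })
        .collect()
}
```

Calling `moving_average(&[20.0, 20.3, 21.4, 21.5], 1)` on the first four `red` speeds yields values that match the first four `red` rows of the `my_average(cars.speed)` column, up to floating point rounding.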
   
   Here are the major changes in this PR:
   
   1. Move the `PartitionEvaluator` definition into `datafusion_expr` (much like the 
`Accumulator` trait for AggregateUDFs)
   2. Move `WindowAggState`, `WindowFrameContext` and some related structures 
to `datafusion_expr` (so that UDWFs do not depend on `datafusion-physical-expr`)
   3. "Trait-ify" the built-in state so that `WindowUDF` does not depend on 
`datafusion-physical-expr`
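
To illustrate the dependency direction these changes aim for, here is a minimal sketch in plain Rust. Every name in it (`PartitionEvaluatorSketch`, `FrameAverage`, `run_partition`) is hypothetical and chosen for illustration; it is not the real `PartitionEvaluator` signature or any other DataFusion API. The idea is only that the trait lives in a low-level crate, user code implements it, and the execution side depends on the trait rather than on a concrete type.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a trait that would live in the low-level
/// "expr" crate. NOT the real DataFusion `PartitionEvaluator`.
trait PartitionEvaluatorSketch {
    /// Compute one output value from the rows of `values` inside `range`.
    fn evaluate(&self, values: &[f64], range: Range<usize>) -> Option<f64>;
}

/// Hypothetical user-defined window function: average of the frame.
struct FrameAverage;

impl PartitionEvaluatorSketch for FrameAverage {
    fn evaluate(&self, values: &[f64], range: Range<usize>) -> Option<f64> {
        // `get` returns None for an out-of-bounds range instead of panicking.
        let window = values.get(range)?;
        if window.is_empty() {
            return None;
        }
        Some(window.iter().sum::<f64>() / window.len() as f64)
    }
}

/// Hypothetical stand-in for the execution side. It only knows the trait,
/// so it needs no dependency on the crate that defines `FrameAverage`.
fn run_partition(
    evaluator: &dyn PartitionEvaluatorSketch,
    values: &[f64],
    preceding: usize,
) -> Vec<Option<f64>> {
    (0..values.len())
        .map(|i| evaluator.evaluate(values, i.saturating_sub(preceding)..i + 1))
        .collect()
}
```

With this shape, `run_partition(&FrameAverage, &[10.0, 10.3, 10.4], 1)` evaluates the user-supplied logic once per row without the caller knowing the concrete type.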
   
   
   # Open questions:
   I think it may be possible to simplify `PartitionEvaluator` by removing 
the state management, which would reduce the amount of code that needs to be 
moved to `datafusion_expr`. I will try to do this as a separate PR.
   
   # Outstanding cleanups
   
   I found a place where the optimizer special-cases a particular window 
function, which I think I can remove (I will try to do so in a separate PR):
   
   
https://github.com/apache/arrow-datafusion/blob/1af846bd8de387ce7a6e61a2008917a7610b9a7b/datafusion/core/src/physical_plan/windows/mod.rs#L254-L257
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
