alamb commented on PR #6617:
URL: 
https://github.com/apache/arrow-datafusion/pull/6617#issuecomment-1587657307

   > Hence I think, we should either support for user to create custom 
BuiltInWindowFunctionExpr for each new WindowUDF (I think this is hard and more 
cumbersome on the user side) or we should move all the parameters that effects 
the decision (such as supports_bounded_execution, uses_window_frame, 
include_rank) to the evaluator and/or aggregator side. By this way user will 
have full control on the custom implementation.
   
   thank you for the response @mustafasrepo  -- this makes sense to me. I agree 
it would be better if the type system could help enforce functions that needed 
to be implemented
   
   Are you thinking something like this?
   
   ```
   pub trait Accumulator: Send + Sync + Debug {
       // Required methods
       fn state(&self) -> Result<Vec<ScalarValue, Global>, DataFusionError>;
       fn update_batch(
           &mut self,
           values: &[Arc<dyn Array + 'static>]
       ) -> Result<(), DataFusionError>;
       fn merge_batch(
           &mut self,
           states: &[Arc<dyn Array + 'static>]
       ) -> Result<(), DataFusionError>;
       fn evaluate(&self) -> Result<ScalarValue, DataFusionError>;
       fn size(&self) -> usize;
   }
   
   /// A trait for accumualtors that also provide the ability to remove rows 
from a window incrementally
   pub trait RetractableAccumulator: Accumulator {.  // <---- add this trait
       // Provided method
       fn retract_batch(
           &mut self,
           _values: &[Arc<dyn Array + 'static>]
       ) -> Result<(), DataFusionError> { ... }
   }
   
   ```
   
   And then adding a separate function to AggregateUDF:
   
   
   
   ```rust
   enum AccumulatorImpl {.     // <---- add this enum
     NonRetractable(Box<dyn Accumulator + 'static>
     /// returned if the accumulator supports retractable windows
     Retractable(Box<dyn RetractableAccumulator + 'static>),
   }
   
   pub struct AggregateUDF {
       pub name: ...,
       pub signature: ...,
       pub return_type: ...
       // change this signature ----+v
       pub accumulator: Arc<dyn Fn(&DataType) -> Result<AccumulatorImpl>, 
DataFusionError> + Sync + Send + 'static>,
       pub state_type: ...
   }
   ```
   ?
   
   Also related is https://github.com/apache/arrow-datafusion/issues/6611 we 
found (which is one of the features of the `Accumulator` that is not yet 
exposed by AggregateUDF)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to