alamb commented on PR #6617:
URL:
https://github.com/apache/arrow-datafusion/pull/6617#issuecomment-1587657307
> Hence I think, we should either support for user to create custom
BuiltInWindowFunctionExpr for each new WindowUDF (I think this is hard and more
cumbersome on the user side) or we should move all the parameters that effects
the decision (such as supports_bounded_execution, uses_window_frame,
include_rank) to the evaluator and/or aggregator side. By this way user will
have full control on the custom implementation.
thank you for the response @mustafasrepo -- this makes sense to me. I agree
it would be better if the type system could help enforce functions that needed
to be implemented
Are you thinking something like this?
```
pub trait Accumulator: Send + Sync + Debug {
// Required methods
fn state(&self) -> Result<Vec<ScalarValue, Global>, DataFusionError>;
fn update_batch(
&mut self,
values: &[Arc<dyn Array + 'static>]
) -> Result<(), DataFusionError>;
fn merge_batch(
&mut self,
states: &[Arc<dyn Array + 'static>]
) -> Result<(), DataFusionError>;
fn evaluate(&self) -> Result<ScalarValue, DataFusionError>;
fn size(&self) -> usize;
}
/// A trait for accumualtors that also provide the ability to remove rows
from a window incrementally
pub trait RetractableAccumulator: Accumulator {. // <---- add this trait
// Provided method
fn retract_batch(
&mut self,
_values: &[Arc<dyn Array + 'static>]
) -> Result<(), DataFusionError> { ... }
}
```
And then adding a separate function to AggregateUDF:
```rust
enum AccumulatorImpl {. // <---- add this enum
NonRetractable(Box<dyn Accumulator + 'static>
/// returned if the accumulator supports retractable windows
Retractable(Box<dyn RetractableAccumulator + 'static>),
}
pub struct AggregateUDF {
pub name: ...,
pub signature: ...,
pub return_type: ...
// change this signature ----+v
pub accumulator: Arc<dyn Fn(&DataType) -> Result<AccumulatorImpl>,
DataFusionError> + Sync + Send + 'static>,
pub state_type: ...
}
```
?
Also related is https://github.com/apache/arrow-datafusion/issues/6611 we
found (which is one of the features of the `Accumulator` that is not yet
exposed by AggregateUDF)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]