alamb opened a new issue, #8934:

URL: https://github.com/apache/arrow-datafusion/issues/8934
### Is your feature request related to a problem or challenge?

As part of https://github.com/apache/arrow-datafusion/pull/8849, I wanted to build up the output state (the distinct string values) during accumulation and then generate the output directly from it, without a copy.

However, I couldn't implement a zero-copy algorithm because `evaluate` and `state` take `&self`, not `&mut self` (well, I worked around it with a `Mutex` 🤮).

The need to copy intermediate state likely doesn't matter for most `Accumulator`s, because they emit only a single scalar value, which is cheap to copy. However, for accumulators that emit significant internal state (like `COUNT DISTINCT`), reusing the internal state directly can save a lot of copying (again, see https://github.com/apache/arrow-datafusion/pull/8849 for an example).

Also, `Accumulator` instances are never used after a call to `evaluate`/`state`, so it does not matter what state the accumulator is left in after that call.

### Describe the solution you'd like

I would like to change `Accumulator::evaluate` and `Accumulator::state` to take `&mut self`.

This would also be consistent with [`GroupsAccumulator`](https://docs.rs/datafusion/latest/datafusion/physical_expr/trait.GroupsAccumulator.html), whose `state` and `evaluate` functions already take `&mut self`:

```rust
pub trait GroupsAccumulator: Send {
    ...
    fn evaluate(
        &mut self,
        emit_to: EmitTo
    ) -> Result<Arc<dyn Array>, DataFusionError>;

    fn state(
        &mut self,
        emit_to: EmitTo
    ) -> Result<Vec<Arc<dyn Array>>, DataFusionError>;
    ...
}
```

### Describe alternatives you've considered

_No response_

### Additional context

_No response_
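A minimal, self-contained sketch of why the receiver type matters. It does not use DataFusion: `EvaluateByRef`, `EvaluateByMut`, and `DistinctStrings` are hypothetical, simplified stand-ins, not the real `Accumulator` trait (whose methods return `Result` values wrapping `ScalarValue`s). With `&self`, producing the output means cloning every stored value, or reaching for interior mutability such as the `Mutex` workaround mentioned above; with `&mut self`, the collected state can be moved out with `std::mem::take`.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for an accumulator whose `evaluate` takes `&self`
/// (the current shape described in the issue).
trait EvaluateByRef {
    fn evaluate(&self) -> Vec<String>;
}

/// Hypothetical stand-in for the proposed shape, where `evaluate` takes `&mut self`.
trait EvaluateByMut {
    fn evaluate(&mut self) -> Vec<String>;
}

/// Toy accumulator that collects distinct string values,
/// loosely modeled on a COUNT DISTINCT-style internal state.
#[derive(Default)]
struct DistinctStrings {
    values: HashSet<String>,
}

impl DistinctStrings {
    /// Adds every value in the batch; duplicates are ignored by the set.
    fn update(&mut self, batch: &[&str]) {
        for v in batch {
            self.values.insert((*v).to_string());
        }
    }
}

impl EvaluateByRef for DistinctStrings {
    // With only `&self`, the output must be built by cloning every stored string.
    fn evaluate(&self) -> Vec<String> {
        self.values.iter().cloned().collect()
    }
}

impl EvaluateByMut for DistinctStrings {
    // With `&mut self`, the set is moved out with `std::mem::take`, so the
    // `String`s are handed over without copying their bytes; the accumulator
    // is left holding an empty set.
    fn evaluate(&mut self) -> Vec<String> {
        std::mem::take(&mut self.values).into_iter().collect()
    }
}

fn main() {
    let mut acc = DistinctStrings::default();
    acc.update(&["a", "b", "a", "c"]);

    // Copying path: the accumulator keeps its state.
    let mut copied = EvaluateByRef::evaluate(&acc);
    copied.sort();
    println!("copied: {:?}, still stored: {}", copied, acc.values.len());

    // Moving path: the state is consumed.
    let mut moved = EvaluateByMut::evaluate(&mut acc);
    moved.sort();
    println!("moved: {:?}, left behind: {}", moved, acc.values.len());
}
```

The trade-off is that the `&mut self` version leaves the accumulator empty after `evaluate`. That is only acceptable under the property the issue points out: an accumulator is not used again after `evaluate`/`state` has been called.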
