alamb opened a new issue, #8934:
URL: https://github.com/apache/arrow-datafusion/issues/8934

   ### Is your feature request related to a problem or challenge?
   
   As part of https://github.com/apache/arrow-datafusion/pull/8849, I wanted to 
build up the output state (the distinct string values) during accumulation and 
then generate the output directly from that state, without a copy.
   
   However, I couldn't implement a zero-copy algorithm because `evaluate` and 
`state` take `&self`, not `&mut self` (well, I worked around it with a 
`Mutex` 🤮 ). 
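
To illustrate the kind of workaround meant here, the following is a minimal sketch, not the code from the PR. The `Accumulator` trait below is a hypothetical, heavily simplified stand-in (the real trait works on Arrow arrays and returns `Result`), and `DistinctStrings` is an invented name. It shows how a `Mutex` gives `evaluate(&self)` enough interior mutability to move the state out:

```rust
use std::sync::Mutex;

// Hypothetical, simplified stand-in for the real `Accumulator` trait:
// `evaluate` only gets `&self`, as in the signature discussed above.
trait Accumulator {
    fn update(&mut self, value: &str);
    fn evaluate(&self) -> Vec<String>;
}

// Hypothetical accumulator that collects distinct string values.
#[derive(Default)]
struct DistinctStrings {
    // The Mutex exists only so that `evaluate(&self)` can mutate the state.
    values: Mutex<Vec<String>>,
}

impl Accumulator for DistinctStrings {
    fn update(&mut self, value: &str) {
        // With `&mut self` no locking is needed: `get_mut` borrows directly.
        let values = self.values.get_mut().unwrap();
        if !values.iter().any(|v| v == value) {
            values.push(value.to_string());
        }
    }

    fn evaluate(&self) -> Vec<String> {
        // Lock, then move the Vec out, leaving an empty Vec behind.
        std::mem::take(&mut *self.values.lock().unwrap())
    }
}
```

The lock is never contended in this sketch; it is there purely to satisfy the borrow checker, which is what makes the workaround unappealing.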
   
   The need to copy intermediate state likely doesn't matter for most 
`Accumulator`s, as they emit only a single scalar value that is cheap to 
copy. However, for ones that emit significant internal state (like 
`COUNT DISTINCT`), reusing the internal state as the output can save a lot of 
copying (again, see https://github.com/apache/arrow-datafusion/pull/8849 for an 
example).
   
   Also, an `Accumulator` instance is never used again after a call to 
`evaluate` or `state`, so nothing depends on its internal state being 
preserved by that call.
   
   
   
   
   ### Describe the solution you'd like
   
   I would like to change `Accumulator::evaluate` and `Accumulator::state` to 
take `&mut self`
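
As a rough sketch of what this change would allow (the type and method names below are hypothetical and are not the DataFusion API), compare an `evaluate` that only has `&self` and must clone its state with one that has `&mut self` and can move the state out:

```rust
// Hypothetical accumulator holding distinct string values; not a DataFusion type.
#[derive(Default)]
struct DistinctStrings {
    values: Vec<String>,
}

impl DistinctStrings {
    fn update(&mut self, value: &str) {
        if !self.values.iter().any(|v| v == value) {
            self.values.push(value.to_string());
        }
    }

    // With only `&self`, the output has to be a copy of the internal state.
    fn evaluate_copying(&self) -> Vec<String> {
        self.values.clone()
    }

    // With `&mut self`, the internal buffer can be handed over as the output,
    // leaving an empty Vec behind.
    fn evaluate_zero_copy(&mut self) -> Vec<String> {
        std::mem::take(&mut self.values)
    }
}
```

In this sketch `std::mem::take` moves the `Vec` rather than copying its elements, so the returned vector reuses the allocation built during accumulation. The accumulator is left empty afterwards, which fits the observation that it is not used again after this call.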
   
   
   This is also consistent with 
[`GroupsAccumulator`](https://docs.rs/datafusion/latest/datafusion/physical_expr/trait.GroupsAccumulator.html),
 whose `state` and `evaluate` functions take `&mut self`:
   
   ```rust
   pub trait GroupsAccumulator: Send {
       // ... (other methods elided)
       fn evaluate(
           &mut self,
           emit_to: EmitTo,
       ) -> Result<Arc<dyn Array>, DataFusionError>;
       fn state(
           &mut self,
           emit_to: EmitTo,
       ) -> Result<Vec<Arc<dyn Array>>, DataFusionError>;
       // ... (other methods elided)
   }
   ```
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_

