alamb commented on code in PR #8925:
URL: https://github.com/apache/arrow-datafusion/pull/8925#discussion_r1460830841


##########
datafusion/physical-expr/src/aggregate/median.rs:
##########
@@ -171,9 +171,8 @@ impl<T: ArrowNumericType> Accumulator for 
MedianAccumulator<T> {
         Ok(())
     }
 
-    fn evaluate(&self) -> Result<ScalarValue> {
-        // TODO: evaluate could pass &mut self
-        let mut d = self.all_values.clone();
+    fn evaluate(&mut self) -> Result<ScalarValue> {
+        let mut d = std::mem::take(&mut self.all_values);

Review Comment:
   🎉 here is an example showing why this API change is good -- you can avoid 
this copy
   
   This change this might actually improve the performance of median for large 
inputs measurably (though I have not measured it)



##########
datafusion-examples/examples/advanced_udaf.rs:
##########
@@ -104,7 +104,7 @@ impl Accumulator for GeometricMean {
     // This function serializes our state to `ScalarValue`, which DataFusion 
uses
     // to pass this state between execution stages.
     // Note that this can be arbitrary data.
-    fn state(&self) -> Result<Vec<ScalarValue>> {
+    fn state(&mut self) -> Result<Vec<ScalarValue>> {

Review Comment:
   This is an example of what DataFusion users have to do if this PR is merged 
-- they have to change the signatures to take `mut`



##########
datafusion/expr/src/accumulator.rs:
##########
@@ -56,11 +56,17 @@ pub trait Accumulator: Send + Sync + Debug {
     /// running sum.
     fn update_batch(&mut self, values: &[ArrayRef]) -> Result<()>;
 
-    /// Returns the final aggregate value.
+    /// Returns the final aggregate value and resets internal state.
     ///
     /// For example, the `SUM` accumulator maintains a running sum,
     /// and `evaluate` will produce that running sum as its output.
-    fn evaluate(&self) -> Result<ScalarValue>;
+    ///
+    /// After this call, the accumulator's internal state should be
+    /// equivalent to when it was first created.
+    ///
+    /// This function gets a `mut` accumulator to allow for the accumulator to
+    /// use an arrow compatible internal state when possible.
+    fn evaluate(&mut self) -> Result<ScalarValue>;

Review Comment:
   This is the core change in this PR -- make `evaluate` and `state` get a 
`&mut` reference so the Accumulator can avoid copies
   
   This is technically possible today (with "inner mutability", for example a 
`Mutex`) but that is both not documented and quite unobvious



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to