alamb opened a new issue, #13516:
URL: https://github.com/apache/datafusion/issues/13516

   ### Is your feature request related to a problem or challenge?
   
   Arrow Arrays are designed to be immutable and use shared references 
extensively, but it is possible to reuse the underlying buffer in some cases 
when there are no other references (see the arrow 
[unary_mut](https://docs.rs/arrow/latest/arrow/compute/fn.unary_mut.html) 
kernel for example)
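   
   The reuse condition can be illustrated without arrow at all. The block below is a minimal sketch using only the standard library: `SharedBuf` and `map_reusing` are hypothetical names standing in for an Arrow buffer and a `unary_mut`-style kernel, and are not part of the arrow API. It mutates in place only when the reference count is one.
   
   ```rust
   use std::sync::Arc;
   
   /// Hypothetical stand-in for an Arrow buffer: shared, immutable values.
   type SharedBuf = Arc<Vec<i64>>;
   
   /// Apply `op` to every element. If `buf` is the only reference, the
   /// existing heap buffer is mutated in place; otherwise a new buffer
   /// is allocated and the shared one is left untouched.
   fn map_reusing<F: Fn(i64) -> i64>(buf: SharedBuf, op: F) -> SharedBuf {
       match Arc::try_unwrap(buf) {
           // Sole owner: reuse the underlying allocation.
           Ok(mut values) => {
               for v in values.iter_mut() {
                   *v = op(*v);
               }
               Arc::new(values)
           }
           // Other references exist: fall back to copying.
           Err(shared) => Arc::new(shared.iter().map(|v| op(*v)).collect()),
       }
   }
   ```
   
   Calling `map_reusing(buf, |v| v + 1)` on a uniquely owned buffer keeps the same data pointer, while calling it on a buffer that has been cloned elsewhere produces a fresh one.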
   
   At the time of writing, DataFusion scalar functions 
([`ScalarUDFImpl`](https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.ScalarUDFImpl.html))
 must *always* allocate a new array when generating output. They cannot reuse 
the existing underlying memory, even if the source array will never be used 
again.
   
   This is because the invoke signature receives its arguments by reference (a 
slice of `ColumnarValue`) rather than by ownership:
   
   ```rust
   fn invoke_batch(
       &self,
       args: &[ColumnarValue],
       number_rows: usize,
   ) -> Result<ColumnarValue, DataFusionError>
   ```
   
   For example, an expression like `(a + b) + c`  will be evaluated like
   - `a + b` --> `temp_array`
   - `temp_array + c` --> `result_array`
   
   This results in two new allocations, one for `temp_array` and one for `result_array`.
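   
   As a rough model of why the borrowed signature forces this, here is a standard-library-only sketch. `Column`, `add_borrowed`, and `eval_borrowed` are hypothetical names for illustration, not DataFusion APIs. Because the function only sees `&[Column]`, it cannot know whether an input is otherwise unused, so every call builds a new buffer.
   
   ```rust
   use std::sync::Arc;
   
   /// Hypothetical stand-in for an array argument.
   type Column = Arc<Vec<i64>>;
   
   /// Borrowed arguments, mirroring `args: &[ColumnarValue]`.
   /// The inputs cannot be consumed, so the output is always a new buffer.
   fn add_borrowed(args: &[Column]) -> Column {
       let (lhs, rhs) = (&args[0], &args[1]);
       Arc::new(lhs.iter().zip(rhs.iter()).map(|(l, r)| l + r).collect())
   }
   
   /// Evaluate `(a + b) + c`: one allocation for the temporary and
   /// one for the final result.
   fn eval_borrowed(a: &Column, b: &Column, c: &Column) -> Column {
       let temp = add_borrowed(&[Arc::clone(a), Arc::clone(b)]);
       add_borrowed(&[temp, Arc::clone(c)])
   }
   ```
   
   Note that `temp` is dropped right after the second call, so its buffer is freed without ever being reused.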
   
   ### Describe the solution you'd like
   
   
   
   It would be really nice if it were possible to evaluate `(a + b) + c`  like 
this (with no new allocations)
   - `a + b` --> `a` (write output to `a`, reusing allocation)
   - `a + c` --> `a` (now add c, also reusing allocation)
   
   And the result would be a new array that re-used the original allocation of 
the `a` array
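   
   A sketch of what ownership-based evaluation could look like, again using only the standard library. `Column`, `add_owned`, and `eval_owned` are hypothetical names, and this is an assumption about the general shape of the idea, not the design in the draft PR. The left operand's buffer is written in place when nothing else references it.
   
   ```rust
   use std::sync::Arc;
   
   /// Hypothetical stand-in for an array argument.
   type Column = Arc<Vec<i64>>;
   
   /// Owned arguments: the function may consume its inputs, so it can
   /// write the sum into the left operand's buffer when that buffer is
   /// not shared. Otherwise it falls back to a new allocation.
   fn add_owned(mut args: Vec<Column>) -> Column {
       let rhs = args.pop().expect("expected two arguments");
       let lhs = args.pop().expect("expected two arguments");
       match Arc::try_unwrap(lhs) {
           Ok(mut values) => {
               for (l, r) in values.iter_mut().zip(rhs.iter()) {
                   *l += *r;
               }
               Arc::new(values)
           }
           Err(shared) => {
               Arc::new(shared.iter().zip(rhs.iter()).map(|(l, r)| l + r).collect())
           }
       }
   }
   
   /// Evaluate `(a + b) + c`, reusing `a`'s data buffer for both steps
   /// when `a` has no other references.
   fn eval_owned(a: Column, b: Column, c: Column) -> Column {
       let temp = add_owned(vec![a, b]);
       add_owned(vec![temp, c])
   }
   ```
   
   If the caller still holds another reference to `a`, the first step copies and only the second step reuses memory, so the optimization degrades gracefully rather than changing results.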
   
   
   
   
   
   ### Describe alternatives you've considered
   
   Now that this is merged
   - https://github.com/apache/datafusion/pull/13290 (thanks @joseph-isaacs)
   
   I think we can make it possible in the future to reuse allocations by 
changing what is passed into `ScalarFunctionArgs`.
   
   Since we haven't yet released a version with `ScalarFunctionArgs`, we can 
change its signature without breaking APIs until DataFusion 44 is released.
   
   ### Additional context
   
   I have a draft of the basic idea here: 
   - https://github.com/apache/datafusion/pull/13507

