alamb commented on issue #8938:
URL: https://github.com/apache/arrow-rs/issues/8938#issuecomment-3739770880

   > The main advantage is that callers creating Arrays in multiple locations
   > of their project won't need to remember to claim data manually at each
   > location. This also helps track intermediate allocations. For instance, if a
   > kernel merges two arrays, it will probably need to allocate intermediate
   > buffers (like null buffers or offset buffers). Without automatic claiming, the
   > pool will only report that memory is exhausted after the kernel returns and
   > intermediate allocations are complete, rather than detecting exhaustion while
   > attempting the intermediate allocation itself.
   
   I agree with this.
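   
   As a rough illustration of the timing difference described in the quote, here is a std-only sketch. `ToyPool`, `merge_claim_after`, and `merge_claim_eagerly` are hypothetical stand-ins, not arrow-rs or DataFusion APIs, and the "kernel" only concatenates two slices and builds a validity buffer as its intermediate allocation. Each function returns how many bytes were allocated before it finished or failed.
   
   ```rust
   /// Hypothetical byte-counting pool with a hard limit (not the arrow-rs pool).
   struct ToyPool {
       limit: usize,
       used: usize,
   }
   
   impl ToyPool {
       fn new(limit: usize) -> Self {
           Self { limit, used: 0 }
       }
   
       /// Record `bytes` against the pool, failing if the limit would be exceeded.
       fn claim(&mut self, bytes: usize) -> Result<(), String> {
           if self.used + bytes > self.limit {
               return Err(format!("pool exhausted: {} + {} > {}", self.used, bytes, self.limit));
           }
           self.used += bytes;
           Ok(())
       }
   }
   
   /// Manual claiming: every buffer is allocated first and the pool is asked afterwards.
   fn merge_claim_after(pool: &mut ToyPool, a: &[i64], b: &[i64]) -> (usize, Result<Vec<i64>, String>) {
       let validity = vec![true; a.len() + b.len()]; // intermediate buffer, untracked so far
       let values: Vec<i64> = a.iter().chain(b).copied().collect();
       let allocated = std::mem::size_of_val(&validity[..]) + std::mem::size_of_val(&values[..]);
       let result = pool.claim(allocated).map(|_| values);
       (allocated, result)
   }
   
   /// Automatic claiming: each buffer is claimed before it is allocated.
   fn merge_claim_eagerly(pool: &mut ToyPool, a: &[i64], b: &[i64]) -> (usize, Result<Vec<i64>, String>) {
       let n = a.len() + b.len();
       let mut allocated = 0;
   
       if let Err(e) = pool.claim(n * std::mem::size_of::<bool>()) {
           return (allocated, Err(e));
       }
       let validity = vec![true; n];
       allocated += std::mem::size_of_val(&validity[..]);
   
       if let Err(e) = pool.claim(n * std::mem::size_of::<i64>()) {
           return (allocated, Err(e)); // fails here, before the values buffer exists
       }
       let values: Vec<i64> = a.iter().chain(b).copied().collect();
       allocated += std::mem::size_of_val(&values[..]);
   
       (allocated, Ok(values))
   }
   ```
   
   With a pool that is too small, both variants return an error, but the eager one stops before allocating the values buffer, while the manual one has already allocated everything by the time the pool is consulted.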
   
   Another potential API we could try is to implement wrapper APIs in DataFusion. For example, we could implement:
   ```rust
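   fn cast(context: &MemoryContext, array: &dyn Array, target_type: &DataType) -> Result<ArrayRef> {
     // call the arrow cast kernel
     let arr = arrow::compute::cast(array, target_type)?;
     // then claim the result's memory with the pool in `context` (API still to be decided)
     Ok(arr)
   }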
   ```
   
   Then we could audit the DataFusion code and ensure that it makes no direct calls to the arrow kernels, routing all of them through the wrappers instead.
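   
   One way such wrappers might be structured, sketched with std-only stand-ins: a single helper runs a kernel and claims its output, and each per-kernel wrapper goes through that helper. Everything here is hypothetical (`MemoryContext`, `claim`, `tracked`, `cast_to_f64`), and a `Vec` stands in for an Arrow array.
   
   ```rust
   use std::cell::Cell;
   
   /// Hypothetical stand-in for the `MemoryContext` in the signature above.
   struct MemoryContext {
       limit: usize,
       used: Cell<usize>,
   }
   
   impl MemoryContext {
       fn new(limit: usize) -> Self {
           Self { limit, used: Cell::new(0) }
       }
   
       /// Record `bytes` against the limit, or fail without changing the total.
       fn claim(&self, bytes: usize) -> Result<(), String> {
           let total = self.used.get() + bytes;
           if total > self.limit {
               return Err(format!("memory limit exceeded: {} > {}", total, self.limit));
           }
           self.used.set(total);
           Ok(())
       }
   
       fn used(&self) -> usize {
           self.used.get()
       }
   }
   
   /// Run `kernel`, then claim the memory of its output before handing it back.
   /// Only element bytes are counted; spare capacity is ignored in this sketch.
   fn tracked<T>(
       ctx: &MemoryContext,
       kernel: impl FnOnce() -> Result<Vec<T>, String>,
   ) -> Result<Vec<T>, String> {
       let out = kernel()?;
       ctx.claim(out.len() * std::mem::size_of::<T>())?;
       Ok(out)
   }
   
   /// Wrapper in the shape of the `cast` signature above; the "kernel" is a plain conversion.
   fn cast_to_f64(ctx: &MemoryContext, array: &[i32]) -> Result<Vec<f64>, String> {
       tracked(ctx, || Ok(array.iter().map(|&v| f64::from(v)).collect()))
   }
   ```
   
   Note that a wrapper of this shape claims memory only after the kernel has returned, so allocations made inside the kernel are still not visible to the pool while they happen.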
   
   Depending on how that works out, we can then contemplate porting such APIs upstream into Arrow.
   
   There is already some version of these wrappers here:
   - https://github.com/apache/datafusion/blob/7716cae50403c68c176afbb3987bd38abbbaeac0/datafusion/physical-expr/src/expressions/binary/kernels.rs#L18-L19

