lidavidm commented on PR #13248:
URL: https://github.com/apache/arrow/pull/13248#issuecomment-1209801083

   The semantics of this method in the interface are rather confusing, but I 
think this operation does not make sense to implement with the current design. 
The way it works is that allocators form a hierarchy, with a RootAllocator at 
the…root, and child allocators deriving from that. An ArrowBuf is a slice of an 
underlying buffer whose reference count is managed by a BufferLedger (so when 
you manipulate an ArrowBuf's refcnt, it's actually updating the BufferLedger). 
Now, a buffer/BufferLedger can be referenced by multiple allocators, but is 
only "owned" by a single allocator. This method is just meant to swap which 
allocator owns the buffer.
   
   However, as implemented, it only works for allocators with the same root:
   
https://github.com/apache/arrow/blob/cc9b89a04143446482d66d4c35bfdf2312d90a05/java/memory/memory-core/src/main/java/org/apache/arrow/memory/AllocationManager.java#L96-L97
   
   So I take this to mean that the intent of this method is to shuffle the 
accounting of who is ultimately responsible for the reference count. Since C 
Data Interface buffers aren't owned by any allocator, they can't be transferred 
(or it doesn't make sense for them to be transferred).
   
   As an example: I think the intent of this method is that you could open a 
child allocator, then open a file with the child allocator, and read a batch. 
The batch's buffers would belong to the child allocator. Now you want to close 
the file, but keep the memory around, so then you would transfer ownership to 
the root allocator. That way, the child allocator can be closed without it 
complaining about unclosed buffers. (Or maybe instead of files, think about 
passing batches between nodes in a computation graph. Each graph could have its 
own child allocator so that at the end of the query, you can pinpoint any 
memory leaks by individual nodes; then you need to transfer ownership so that 
you don't get spurious errors.)
   
   I think what needs to happen is that C Data Interface allocations need to 
get associated with an actual allocator, even if the allocator isn't actually 
doing the allocation. The allocator should include the memory of these native 
buffers in its own accounting (I think this is a desirable property, so that 
you can properly track and limit memory consumption as before). Then it would 
make sense to have transferOwnership.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to