lidavidm commented on PR #13248: URL: https://github.com/apache/arrow/pull/13248#issuecomment-1209801083
The semantics of this method in the interface are rather confusing, but I think this operation does not make sense to implement with the current design. The way it works is that allocators form a hierarchy, with a RootAllocator at the…root, and child allocators deriving from that. An ArrowBuf is a slice of an underlying buffer whose reference count is managed by a BufferLedger (so when you manipulate an ArrowBuf's refcnt, it's actually updating the BufferLedger). Now, a buffer/BufferLedger can be referenced by multiple allocators, but is only "owned" by a single allocator. This method is just meant to swap which allocator owns the buffer. However, as implemented, it only works for allocators with the same root: https://github.com/apache/arrow/blob/cc9b89a04143446482d66d4c35bfdf2312d90a05/java/memory/memory-core/src/main/java/org/apache/arrow/memory/AllocationManager.java#L96-L97 So I take this to mean that the intent of this method is to shuffle the accounting of who is ultimately responsible for the reference count. Since C Data Interface buffers aren't owned by any allocator, they can't be transferred (or it doesn't make sense for them to be transferred). As an example: I think the intent of this method is that you could open a child allocator, then open a file with the child allocator, and read a batch. The batch's buffers would belong to the child allocator. Now you want to close the file, but keep the memory around, so then you would transfer ownership to the root allocator. That way, the child allocator can be closed without it complaining about unclosed buffers. (Or maybe instead of files, think about passing batches between nodes in a computation graph. Each graph could have its own child allocator so that at the end of the query, you can pinpoint any memory leaks by individual nodes; then you need to transfer ownership so that you don't get spurious errors.) I think what needs to happen is that C Data Interface allocations need to get associated with an actual allocator, even if the allocator isn't actually doing the allocation. The allocator should include the memory of these native buffers in its own accounting (I think this is a desirable property, so that you can properly track and limit memory consumption as before). Then it would make sense to have transferOwnership. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
