[GitHub] [arrow] lidavidm commented on pull request #13248: ARROW-16673: [Java] C data interface: Allow ownership transferring for imported buffer

GitBox Tue, 09 Aug 2022 12:42:42 -0700


lidavidm commented on PR #13248:
URL: https://github.com/apache/arrow/pull/13248#issuecomment-1209801083

The semantics of this method in the interface are rather confusing, but I
think this operation does not make sense to implement with the current design.
The way it works is that allocators form a hierarchy, with a RootAllocator at
the…root, and child allocators deriving from that. An ArrowBuf is a slice of an
underlying buffer whose reference count is managed by a BufferLedger (so when
you manipulate an ArrowBuf's refcnt, it's actually updating the BufferLedger).
Now, a buffer/BufferLedger can be referenced by multiple allocators, but is
only "owned" by a single allocator. This method is just meant to swap which
allocator owns the buffer.

However, as implemented, it only works for allocators with the same root:

https://github.com/apache/arrow/blob/cc9b89a04143446482d66d4c35bfdf2312d90a05/java/memory/memory-core/src/main/java/org/apache/arrow/memory/AllocationManager.java#L96-L97

So I take this to mean that the intent of this method is to shuffle the
accounting of who is ultimately responsible for the reference count. Since C
Data Interface buffers aren't owned by any allocator, they can't be transferred
(or it doesn't make sense for them to be transferred).

As an example: I think the intent of this method is that you could open a
child allocator, then open a file with the child allocator, and read a batch.
The batch's buffers would belong to the child allocator. Now you want to close
the file, but keep the memory around, so then you would transfer ownership to
the root allocator. That way, the child allocator can be closed without it
complaining about unclosed buffers. (Or maybe instead of files, think about
passing batches between nodes in a computation graph. Each graph could have its
own child allocator so that at the end of the query, you can pinpoint any
memory leaks by individual nodes; then you need to transfer ownership so that
you don't get spurious errors.)

I think what needs to happen is that C Data Interface allocations need to
get associated with an actual allocator, even if the allocator isn't actually
doing the allocation. The allocator should include the memory of these native
buffers in its own accounting (I think this is a desirable property, so that
you can properly track and limit memory consumption as before). Then it would
make sense to have transferOwnership.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] lidavidm commented on pull request #13248: ARROW-16673: [Java] C data interface: Allow ownership transferring for imported buffer

Reply via email to