lidavidm commented on PR #13248:
URL: https://github.com/apache/arrow/pull/13248#issuecomment-1215899098

   > > C Data Interface allocations need to get associated with an actual 
allocator, even if the allocator isn't actually doing the allocation.
   > 
   > I guess this is reasonable given where we are, but associating with an 
allocator that isn't doing the allocation seems likely to confuse future 
developers if the memory appears to be managed in the Java code but isn't. Is 
the idea that all users of a given bit of memory keep track of their own use, 
but only the originator (C++ presumably in this case) actually frees it?
   
   @zhztheplayer correct me if I'm wrong, but what I see as the issue is this: 
data comes in via the C Data Interface. You then want to unload the data into 
an ArrowRecordBatch and load it into another root (that happens to have its own 
allocator) to perform some computation on it. But since transfer is not 
implemented, that currently fails.
   
   The problem is that this transfer, _semantically_, is to change the 
ownership of the buffer. So yes, an allocator would end up with memory it 
didn't technically allocate. But that's already the case  - the whole point of 
these APIs is to make an allocator responsible for something it didn't 
allocate! The main implementation problem I see is that currently, the transfer 
assumes the memory was allocated by an allocator in the same _allocator tree_, 
i.e. that the allocator implementation is the same. That's the main assumption 
that would have to change.
   
   Perhaps 'allocator' is the wrong way to think about the API if/when we make 
the change; BufferAllocator is primarily an arena or nursery used for 
accounting of buffers, that can also allocate new buffers that are associated 
with itself. In this model, it makes perfect sense to transfer buffers between 
allocators: it's just a conscious choice to hand over memory ownership. (My 
memory of the details is fuzzy, but the underlying refcount doesn't really 
change; it's just changes whether the allocator will complain about a buffer 
when closed or not.)
   
   The Java library works by refcounting a buffer, and freeing it when the 
reference count reaches 0. From this perspective, memory from the C Data 
Interface is no different, it's just another kind of buffer to refcount. When 
the Java-side refcount reaches 0, the C-side release callback is invoked (this 
is already how it works).
   
   > > Ok, thanks. In that case, it seems like we need the ability to associate 
the C Data reference manager with a Java allocator, and update APIs to require 
a BufferAllocator to associate with. I don't think that'll require breaking 
changes in the core APIs, but it would probably mean updating/deprecating APIs 
in the C Data interface.
   > 
   > Since the API already has a bufferAllocator argument (that is ignored), 
I'm not sure what change you're suggesting unless it's simply to annotate the 
argument as non-nullable. If there was no bufferAllocator argument it would be 
simple enough to deprecate the current method and add a new one.
   
   Yes, this argument would become required.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to