lwhite1 commented on PR #13248:
URL: https://github.com/apache/arrow/pull/13248#issuecomment-1260033269

   Hoping to restart this thread. 
   
   I'm beginning to wonder if we (including and maybe especially, me) haven't 
made this all too complicated. 
   
   The proposed simple change makes the code work the same as other 
ReferenceManager implementations (or nearly so), and allows the original 
Vector's ArrowBuf to hand off its memory to another ArrowBuf. 
   
   The code here around the management of imported data is pretty complicated 
but I think this change is pretty safe.  From my review, I believe this is what 
is happening:
    
   In the ArrayImporter code, the CDataReferenceManager releases the memory as 
soon as the import is complete: 
   
   ```
       // This keeps the array alive as long as there are any buffers that need 
it
       referenceManager = new CDataReferenceManager(ownedArray);
       try {
         referenceManager.increment();
         doImport(snapshot);
       } finally {
         referenceManager.release();
       }
   ```
   
   So the ref count is already 0 when the transfer attempt occurs later.
   
   The Data class that provides the high-level interface for getting C-Data has 
methods like `importIntoVector()` and `importVectorSchemaRoot()` that take 
allocators as arguments (and objects to receive the data) that become 
associated with the imported data at that step in the process, although the 
ArrowBufs they hold still point to the CDataReferenceManager. That last detail 
is what currently causes subsequent attempts to transfer the data to fail. 
   
   In spite of the naming/documentation of the methods involved, there is 
nothing that requires a TransferPair to actually change what Allocator the 
newly created ArrowBuf is associated with. There are numerous use-cases for 
"transferring" the memory to the same Allocator it started with. 
   
   Maybe the most important existing, but currently unsupported, use cases are: 
   - constructing a new VectorSchemaRoot from one that is managed by native 
code. 
   - slicing a vector originating from native code
   
   Right now, you can use data from C in the format it was received in, or not 
at all.  It's not clear to me that the original change makes anything worse 
than it currently is, and it would provide support for these two important 
use-cases.  
   
   Of course, I could be wrong again. What am I missing?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to