jhrotko commented on PR #949:
URL: https://github.com/apache/arrow-java/pull/949#issuecomment-3723478988

   Thanks for the feedback. After looking at this more carefully, let me 
state my current understanding of these two options:
   
   The main concern is that when a holder stores an `ArrowBuf` reference, it 
keeps the entire buffer alive even after the vector is closed. This is 
especially problematic if you save holders for later use: you end up 
keeping large buffers in memory when you only need a few values.
   
   Did you mean something like this?
   ```mermaid
   sequenceDiagram
       participant App as Application
       participant Vector as UuidVector
       participant Buffer as ArrowBuf (1000 UUIDs)
       participant Holder as UuidHolder
       participant Allocator as BufferAllocator
       
       Note over App,Allocator: Step 1: Create vector with 1000 UUIDs
       App->>Vector: allocateNew(1000)
       Vector->>Allocator: allocate 16,000 bytes
       Allocator->>Buffer: Create buffer (refCount=1)
       Buffer-->>Vector: buffer reference
       
       Note over App,Allocator: Step 2: Read ONE UUID into holder
       App->>Vector: get(0, holder)
       Vector->>Holder: holder.buffer = vector.buffer
       Vector->>Holder: holder.start = 0
       Note over Buffer: refCount=2 (vector + holder)
       
       Note over App,Allocator: Step 3: Close vector (done with it)
       App->>Vector: close()
       Vector->>Buffer: release() - refCount=2→1
       Note over Buffer: ❌ Buffer NOT freed!<br/>Holder still references it
       
       Note over App,Allocator: Step 4: Application keeps holder
       Note over Holder: Holder only needs 16 bytes<br/>but keeps 16,000 bytes alive!
       Note over Buffer: ❌ 15,984 bytes wasted!
       
       Note over App,Allocator: Step 5: Eventually holder goes out of scope
       App->>Holder: (garbage collected)
       Holder->>Buffer: release() - refCount=1→0
       Buffer->>Allocator: free memory
       Note over Buffer: ✅ Finally freed
   ```
   
   I initially modeled UUID after VarChar and Decimal. If this is the 
lifetime risk, then for `VarCharHolder` we accept it because copying can be 
very expensive, whereas for a `UuidHolder` copying is trivial (16 bytes), so 
there's no reason to take on the buffer lifetime risk.
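
To make the copy-based option concrete, here is a minimal sketch of what I have in mind. It is not the code in this PR: `CopiedUuidHolder`, `read`, and `write` are hypothetical names, and a plain `java.nio.ByteBuffer` stands in for the vector's data buffer so the example runs without Arrow on the classpath. The holder stores the 16 bytes as two `long` fields, so it keeps no reference to the backing storage.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.UUID;

public class UuidHolderCopySketch {

    static final int UUID_BYTES = 16;

    /** Hypothetical copy-based holder: owns its value, holds no buffer reference. */
    static final class CopiedUuidHolder {
        int isSet;
        long mostSigBits;
        long leastSigBits;

        UUID toUuid() {
            return new UUID(mostSigBits, leastSigBits);
        }
    }

    /** Writes a UUID into slot {@code index} of the stand-in data buffer. */
    static void write(ByteBuffer values, int index, UUID uuid) {
        int start = index * UUID_BYTES;
        values.putLong(start, uuid.getMostSignificantBits());
        values.putLong(start + 8, uuid.getLeastSignificantBits());
    }

    /** Copies the 16 bytes of slot {@code index} into the holder. */
    static void read(ByteBuffer values, int index, CopiedUuidHolder holder) {
        int start = index * UUID_BYTES;
        holder.mostSigBits = values.getLong(start);      // absolute read, big-endian by default
        holder.leastSigBits = values.getLong(start + 8);
        holder.isSet = 1;
    }

    public static void main(String[] args) {
        // Stand-in for a vector's data buffer holding 1000 UUIDs (16,000 bytes).
        ByteBuffer values = ByteBuffer.allocate(1000 * UUID_BYTES);
        UUID original = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
        write(values, 0, original);

        CopiedUuidHolder holder = new CopiedUuidHolder();
        read(values, 0, holder);

        // Simulate the storage being released or reused after the vector is closed.
        Arrays.fill(values.array(), (byte) 0);

        System.out.println(holder.toUuid().equals(original)); // prints "true"
    }
}
```

The trade-off is one 16-byte copy per read in exchange for a holder whose lifetime is independent of the buffer. For a variable-width type such as VarChar the copy could be arbitrarily large, which is why the same choice is less clear-cut there.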
   
   
   

