jhrotko commented on PR #949:
URL: https://github.com/apache/arrow-java/pull/949#issuecomment-3723478988
Thanks for the feedback. After looking at this more carefully, let me lay
out my current understanding of the two options:
The main concern is that when a holder stores an `ArrowBuf` reference, it
keeps the entire buffer alive even after the vector is closed. This is
especially problematic if you're saving holders for later use: you end up
keeping large buffers in memory when you only need a few values.
Did you mean something like this?
```mermaid
sequenceDiagram
participant App as Application
participant Vector as UuidVector
participant Buffer as ArrowBuf (1000 UUIDs)
participant Holder as UuidHolder
participant Allocator as BufferAllocator
Note over App,Allocator: Step 1: Create vector with 1000 UUIDs
App->>Vector: allocateNew(1000)
Vector->>Allocator: allocate 16,000 bytes
Allocator->>Buffer: Create buffer (refCount=1)
Buffer-->>Vector: buffer reference
Note over App,Allocator: Step 2: Read ONE UUID into holder
App->>Vector: get(0, holder)
Vector->>Holder: holder.buffer = vector.buffer
Vector->>Holder: holder.start = 0
Note over Buffer: refCount=2 (vector + holder)
Note over App,Allocator: Step 3: Close vector (done with it)
App->>Vector: close()
Vector->>Buffer: release() - refCount=2→1
Note over Buffer: ❌ Buffer NOT freed!<br/>Holder still references it
Note over App,Allocator: Step 4: Application keeps holder
Note over Holder: Holder only needs 16 bytes<br/>but keeps 16,000 bytes alive!
Note over Buffer: ❌ 15,984 bytes wasted!
Note over App,Allocator: Step 5: Eventually holder goes out of scope
App->>Holder: (garbage collected)
Holder->>Buffer: release() - refCount=1→0
Buffer->>Allocator: free memory
Note over Buffer: ✅ Finally freed
```
I initially modeled UUID after VarChar and Decimal. If this is the lifetime
risk, I understand why we accept it for `VarCharHolder`, where copying can be
very expensive. For a `UuidHolder`, copying 16 bytes is trivial, so there's no
reason to take on the buffer lifetime risk.
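
To make the difference concrete, here is a minimal sketch in plain Java. It does not use the Arrow API: `RefHolder`, `CopyHolder`, `fill`, `getByReference`, and `getByCopy` are hypothetical stand-ins, and a `java.nio.ByteBuffer` plays the role of the `ArrowBuf`. It only illustrates that a reference-style holder keeps the whole 16,000-byte buffer reachable, while a copy-style holder owns just its 16 bytes.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical sketch, NOT the Arrow API: a ByteBuffer stands in for ArrowBuf.
final class HolderSketch {
    static final int UUID_WIDTH = 16; // bytes per UUID value

    /** Reference-style holder: points into the shared buffer. */
    static final class RefHolder {
        ByteBuffer buffer; // keeps the entire backing buffer reachable
        int start;         // byte offset of this value inside the buffer
    }

    /** Copy-style holder: owns its 16 bytes as two longs. */
    static final class CopyHolder {
        long mostSigBits;
        long leastSigBits;

        UUID toUuid() {
            return new UUID(mostSigBits, leastSigBits);
        }
    }

    /** Builds a buffer of `count` UUIDs; value i is (0, i). */
    static ByteBuffer fill(int count) {
        ByteBuffer buf = ByteBuffer.allocate(count * UUID_WIDTH);
        for (int i = 0; i < count; i++) {
            buf.putLong(i * UUID_WIDTH, 0L);
            buf.putLong(i * UUID_WIDTH + 8, i);
        }
        return buf;
    }

    /** Stores a reference to the whole buffer plus an offset. */
    static void getByReference(ByteBuffer data, int index, RefHolder holder) {
        holder.buffer = data;
        holder.start = index * UUID_WIDTH;
    }

    /** Copies only the 16 bytes of the requested value. */
    static void getByCopy(ByteBuffer data, int index, CopyHolder holder) {
        int offset = index * UUID_WIDTH;
        holder.mostSigBits = data.getLong(offset);
        holder.leastSigBits = data.getLong(offset + 8);
    }

    public static void main(String[] args) {
        ByteBuffer data = fill(1000);

        RefHolder ref = new RefHolder();
        getByReference(data, 7, ref);
        System.out.println("bytes kept reachable by RefHolder: " + ref.buffer.capacity());

        CopyHolder copy = new CopyHolder();
        getByCopy(data, 7, copy);
        System.out.println("value owned by CopyHolder: " + copy.toUuid());
    }
}
```

With the copy-style holder, nothing in the holder refers back to the buffer, so the holder's lifetime has no effect on when the buffer can be released.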