telemenar commented on issue #1194:
URL: https://github.com/apache/arrow-java/issues/1194#issuecomment-4858336810

   There is some nuance here because two things are in tension:
   
   * Arrow layout/spec compliance: a list array with `valueCount == 0` still 
has one offset entry, so the exported offset buffer should contain `offset[0] 
== 0`.
   * Current `ListVector` / `LargeListVector` lifecycle behavior: freshly 
constructed, cleared, or otherwise empty vectors may legitimately have no 
allocated buffers yet.
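
   To illustrate the first point, here is a minimal sketch of how list offsets 
follow from list lengths. `ListOffsets` and `fromLengths` are hypothetical names 
used only for this example; they are not part of Arrow Java.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not an Arrow Java class.
final class ListOffsets {
    /** Builds the offsets array for lists with the given lengths. */
    static int[] fromLengths(int[] lengths) {
        // One more entry than there are lists; offsets[0] is 0 by default.
        int[] offsets = new int[lengths.length + 1];
        for (int i = 0; i < lengths.length; i++) {
            offsets[i + 1] = offsets[i] + lengths[i];
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Three lists of lengths 2, 0, and 3.
        System.out.println(Arrays.toString(fromLengths(new int[] {2, 0, 3}))); // [0, 2, 2, 5]
        // Zero lists: the offsets array still holds the single entry 0.
        System.out.println(Arrays.toString(fromLengths(new int[] {})));        // [0]
    }
}
```

   The empty case is the one under discussion: with no lists at all, the 
offsets still contain one entry.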
   
   Either way, the current `setReaderAndWriterIndex()` behavior for a 
`valueCount == 0` `ListVector` is guaranteed to produce an invalid buffer state 
if the vector is still in the same state returned by `ListVector.empty()`.
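
   To make "invalid" concrete, here is a minimal sketch of the size check a 
consumer could apply, assuming 4-byte offsets for `ListVector` and 8-byte 
offsets for `LargeListVector`. `OffsetBufferCheck` and its methods are 
hypothetical names for illustration, not Arrow Java APIs.

```java
// Hypothetical check for illustration only; not an Arrow Java class.
final class OffsetBufferCheck {
    static final int LIST_OFFSET_WIDTH = 4;       // int32 offsets
    static final int LARGE_LIST_OFFSET_WIDTH = 8; // int64 offsets

    /** Bytes the layout requires: one offset per value, plus one. */
    static long requiredOffsetBytes(int valueCount, int offsetWidth) {
        return ((long) valueCount + 1) * offsetWidth;
    }

    /** True when the readable bytes can hold all required offsets. */
    static boolean hasValidOffsets(long readableBytes, int valueCount, int offsetWidth) {
        return readableBytes >= requiredOffsetBytes(valueCount, offsetWidth);
    }
}
```

   Under this check, an offset buffer with zero readable bytes fails even for 
`valueCount == 0`, because the layout still requires room for one offset.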
   
   There are also a few ways to get into that state somewhat unexpectedly. For 
example, `ListVector.TransferImpl.splitAndTransfer()` calls 
`ListVector.clear()` on the destination vector. When the split length is `0`, 
that can release the destination buffers and leave the destination with 
`allocator.getEmpty()` as its offset buffer.
   
   Given the existing clear/reset behavior and the surrounding class hierarchy, 
I suspect persistent early enforcement would be tricky to implement cleanly. It 
would mean preserving or recreating the one-entry offset buffer across 
construction, `clear()`, and other empty-vector transitions, which may conflict 
with existing assumptions that an empty vector can hold no buffers.
   
   So boundary enforcement may be the more practical direction: allow the 
internal empty/unallocated state, but materialize the required zero offset at 
API boundaries where Arrow physical layout matters.
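
   As a rough sketch of what boundary enforcement could look like, the toy 
class below keeps the unallocated internal state but materializes the zero 
offset when the buffer is exported. It models the buffer with 
`java.nio.ByteBuffer` rather than `ArrowBuf`, only covers the empty case, and 
`ToyListVector`, `isAllocated()`, and `exportOffsets()` are hypothetical names, 
not a proposal for the actual method signatures.

```java
import java.nio.ByteBuffer;

// Toy model for illustration only; not how ListVector is implemented.
final class ToyListVector {
    private static final int OFFSET_WIDTH = 4; // int32 offsets
    private static final ByteBuffer EMPTY = ByteBuffer.allocate(0);

    private ByteBuffer offsets = EMPTY; // fresh vectors hold no real buffer
    private int valueCount = 0;

    /** Releases the offset buffer, returning to the unallocated state. */
    void clear() {
        offsets = EMPTY;
        valueCount = 0;
    }

    boolean isAllocated() {
        return offsets.capacity() > 0;
    }

    /** Boundary method: always returns (valueCount + 1) offsets. */
    ByteBuffer exportOffsets() {
        int required = (valueCount + 1) * OFFSET_WIDTH;
        if (offsets.capacity() < required) {
            // ByteBuffer.allocate zero-fills, so offset[0] == 0 holds.
            offsets = ByteBuffer.allocate(required);
        }
        ByteBuffer view = offsets.duplicate();
        view.position(0);
        view.limit(required);
        return view.slice();
    }
}
```

   The internal state stays cheap for fresh or cleared vectors, and the 
allocation cost is paid only when a consumer actually needs the physical 
layout. The trade-off is that an export-style call now has an allocating side 
effect.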
   
   The design questions I’m not fully sure about are:
   
   * Besides the paths that call `setReaderAndWriterIndex()`, are there other 
boundaries that need the same protection?
   * Is it acceptable for `getFieldBuffers()` to allocate the required 
one-entry offset buffer as part of preparing the vector for 
export/serialization?
   * Should call sites like zero-length `splitAndTransfer()` also enforce this 
before returning a schema-visible destination vector, or should that be left 
entirely to export-time handling?

