prashanthbdremio opened a new issue, #1194:
URL: https://github.com/apache/arrow-java/issues/1194

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   `ListVector` and `LargeListVector` can expose an invalid offset buffer state 
when `valueCount == 0`.
   
   For an empty list vector, the logical offset buffer should still contain the 
leading offset entry:
   
   - `ListVector`: `(valueCount + 1) * 4 == 4` bytes
   - `LargeListVector`: `(valueCount + 1) * 8 == 8` bytes
   
   However, in the empty-vector path, the offset buffer can have:
   
   ```text
   readerIndex: 0
   writerIndex: 4
   capacity: 0
   ```
   
   or the equivalent `writerIndex: 8, capacity: 0` for `LargeListVector`.
   
   This violates the normal buffer invariant:
   
   ```text
   0 <= readerIndex <= writerIndex <= capacity
   ```
   
   Downstream consumers that unwrap or serialize the Arrow buffer through Netty 
can then fail with:
   
   ```text
   IndexOutOfBoundsException: readerIndex: 0, writerIndex: 4
   (expected: 0 <= readerIndex <= writerIndex <= capacity(0))
   ```
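
The failing check can be illustrated in isolation. The sketch below is a stand-in model only, not Netty's or Arrow's code: `BufferIndexCheck`, `check`, and `isValid` are hypothetical names, and the exception message merely imitates the format of the error shown above.

```java
final class BufferIndexCheck {
    /**
     * Hypothetical stand-in for the bounds validation a buffer wrapper performs.
     * Throws unless 0 <= readerIndex <= writerIndex <= capacity.
     */
    static void check(long readerIndex, long writerIndex, long capacity) {
        if (readerIndex < 0 || readerIndex > writerIndex || writerIndex > capacity) {
            throw new IndexOutOfBoundsException(String.format(
                "readerIndex: %d, writerIndex: %d "
                    + "(expected: 0 <= readerIndex <= writerIndex <= capacity(%d))",
                readerIndex, writerIndex, capacity));
        }
    }

    /** Returns true when the index triple satisfies the invariant. */
    static boolean isValid(long readerIndex, long writerIndex, long capacity) {
        try {
            check(readerIndex, writerIndex, capacity);
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // State reported for the empty ListVector offset buffer: invariant broken.
        System.out.println("capacity 0: valid = " + isValid(0, 4, 0));
        // Same indexes once the buffer has room for one 4-byte offset: invariant holds.
        System.out.println("capacity 4: valid = " + isValid(0, 4, 4));
    }
}
```

The only difference between the two calls is the capacity, which is why the report focuses on the writer-index/capacity relationship rather than on the writer index alone.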
   
   The issue is that `setReaderAndWriterIndex()` sets the offset buffer writer 
index based on `valueCount * OFFSET_WIDTH`, which is `0` for empty vectors. But 
list vectors still require one offset slot even when there are no values.
   
   The same issue applies to both:
   
   - `org.apache.arrow.vector.complex.ListVector`
   - `org.apache.arrow.vector.complex.LargeListVector`
   
   ### Expected behavior
   
   For `valueCount == 0`, the offset buffer should still have enough capacity 
and readable bytes for the leading zero offset:
   
   ```text
   (valueCount + 1) * OFFSET_WIDTH
   ```
   
   So:
   
   - empty `ListVector` should expose at least 4 bytes for offset `[0]`
   - empty `LargeListVector` should expose at least 8 bytes for offset `[0]`
   
   The first offset value should be zero.
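
As a rough illustration of the expected state, the sketch below computes the required size and builds a buffer holding only the leading zero offset. It uses `java.nio.ByteBuffer` as a stand-in for Arrow's buffer type, and `EmptyListOffsets`, `requiredOffsetBytes`, and `emptyOffsets` are hypothetical names introduced for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class EmptyListOffsets {
    static final int LIST_OFFSET_WIDTH = 4;       // 32-bit offsets (ListVector)
    static final int LARGE_LIST_OFFSET_WIDTH = 8; // 64-bit offsets (LargeListVector)

    /** Bytes the offset buffer must expose: one offset per value plus the leading one. */
    static long requiredOffsetBytes(int valueCount, int offsetWidth) {
        return ((long) valueCount + 1) * offsetWidth;
    }

    /** Builds a stand-in for the offset buffer of an empty list vector: a single zero offset. */
    static ByteBuffer emptyOffsets(int offsetWidth) {
        ByteBuffer buf = ByteBuffer.allocate((int) requiredOffsetBytes(0, offsetWidth));
        // The Arrow columnar format is little-endian by default.
        buf.order(ByteOrder.LITTLE_ENDIAN);
        // ByteBuffer.allocate zero-fills, so offset [0] is already 0.
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer list = emptyOffsets(LIST_OFFSET_WIDTH);
        ByteBuffer largeList = emptyOffsets(LARGE_LIST_OFFSET_WIDTH);
        System.out.println(list.capacity() + " bytes, offset[0] = " + list.getInt(0));
        System.out.println(largeList.capacity() + " bytes, offset[0] = " + largeList.getLong(0));
    }
}
```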
   
   ### Actual behavior
   
   An empty list vector can expose an offset buffer with a non-zero writer 
index but zero capacity, causing Netty buffer validation to fail when the 
buffer is unwrapped or consumed.
   
   ### Suggested fix
   
   Update `ListVector.setReaderAndWriterIndex()` and 
`LargeListVector.setReaderAndWriterIndex()` so the offset buffer writer index 
is based on:
   
   ```java
   (valueCount + 1) * OFFSET_WIDTH
   ```
   
   For the `valueCount == 0` case, ensure the offset buffer has enough capacity 
for the leading zero offset before setting the writer index.
   
   Care should be taken not to shrink the vector's future offset allocation 
size when allocating this empty sentinel offset buffer.
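
The shape of this fix can be sketched with a small bookkeeping model. This is not the Arrow implementation: `OffsetBufferModel`, its fields, and `setWriterIndexFor` are hypothetical, and the model tracks only sizes rather than real memory. It assumes the vector keeps a separate size hint for future offset allocations, as the paragraph above implies.

```java
final class OffsetBufferModel {
    static final int OFFSET_WIDTH = 4; // would be 8 in the LargeListVector analogue

    long capacity;             // bytes currently backing the offset buffer
    long writerIndex;          // readable bytes exposed to consumers
    long offsetAllocationSize; // size hint used for future offset allocations

    OffsetBufferModel(long capacity, long offsetAllocationSize) {
        this.capacity = capacity;
        this.offsetAllocationSize = offsetAllocationSize;
    }

    /** Sets the writer index to (valueCount + 1) * OFFSET_WIDTH, growing capacity first if needed. */
    void setWriterIndexFor(int valueCount) {
        long required = ((long) valueCount + 1) * OFFSET_WIDTH;
        if (capacity < required) {
            // Grow just enough for the required offsets (a real buffer would also
            // need to be zero-filled). offsetAllocationSize is deliberately left
            // untouched so later allocations keep their original hint.
            capacity = required;
        }
        writerIndex = required;
    }

    /** Checks writerIndex <= capacity; readerIndex is taken to be 0 in this model. */
    boolean invariantHolds() {
        return 0 <= writerIndex && writerIndex <= capacity;
    }

    public static void main(String[] args) {
        OffsetBufferModel model = new OffsetBufferModel(0, 1024);
        model.setWriterIndexFor(0);
        System.out.println("writerIndex=" + model.writerIndex
            + ", capacity=" + model.capacity
            + ", allocationHint=" + model.offsetAllocationSize
            + ", invariantHolds=" + model.invariantHolds());
    }
}
```

Keeping the allocation hint separate from the sentinel allocation is the point of the final paragraph above: the 4-byte (or 8-byte) buffer exists only so the empty vector is well-formed, and should not influence how much space is reserved once values are written.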
   
   ### Additional context
   
   This was observed downstream in Dremio after upgrading Arrow Java. The 
failure occurred while sending a record batch containing an empty list vector, 
where the send path unwraps Arrow buffers through Netty.
   
   The downstream error was:
   
   ```text
   SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 4
   (expected: 0 <= readerIndex <= writerIndex <= capacity(0))
   ```
   
   This issue is distinct from #1125. That issue involves 
`UnionListReader.setPosition` on a post-IPC empty list. This issue is about the 
offset buffer exported by empty `ListVector` / `LargeListVector` instances 
having an invalid writer-index/capacity relationship.
   

