prashanthbdremio opened a new issue, #1194: URL: https://github.com/apache/arrow-java/issues/1194
### Describe the bug, including details regarding any error messages, version, and platform.

`ListVector` and `LargeListVector` can expose an invalid offset buffer state when `valueCount == 0`.

For an empty list vector, the logical offset buffer should still contain the leading offset entry:

- `ListVector`: `(valueCount + 1) * 4 == 4` bytes
- `LargeListVector`: `(valueCount + 1) * 8 == 8` bytes

However, in the empty-vector path, the offset buffer can have:

```text
readerIndex: 0
writerIndex: 4
capacity: 0
```

or the equivalent `writerIndex: 8, capacity: 0` for `LargeListVector`.

This violates the normal buffer invariant:

```text
0 <= readerIndex <= writerIndex <= capacity
```

Downstream consumers that unwrap or serialize the Arrow buffer through Netty can then fail with:

```text
IndexOutOfBoundsException: readerIndex: 0, writerIndex: 4 (expected: 0 <= readerIndex <= writerIndex <= capacity(0))
```

The issue is that `setReaderAndWriterIndex()` sets the offset buffer writer index based on `valueCount * OFFSET_WIDTH`, which is `0` for empty vectors. But list vectors still require one offset slot even when there are no values.

The same issue applies to both:

- `org.apache.arrow.vector.complex.ListVector`
- `org.apache.arrow.vector.complex.LargeListVector`

### Expected behavior

For `valueCount == 0`, the offset buffer should still have enough capacity and readable bytes for the leading zero offset:

```text
(valueCount + 1) * OFFSET_WIDTH
```

So:

- empty `ListVector` should expose at least 4 bytes for offset `[0]`
- empty `LargeListVector` should expose at least 8 bytes for offset `[0]`

The first offset value should be zero.

### Actual behavior

An empty list vector can expose an offset buffer with a non-zero writer index but zero capacity, causing Netty buffer validation to fail when the buffer is unwrapped or consumed.
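The size arithmetic and the buffer invariant described above can be checked with a small stand-alone sketch. It uses no Arrow or Netty classes; `EmptyListOffsets`, `requiredOffsetBytes`, and `indicesValid` are hypothetical names introduced only for illustration.

```java
/** Stand-alone model of the offset-buffer arithmetic; not Arrow code. */
final class EmptyListOffsets {
    static final int LIST_OFFSET_WIDTH = 4;       // 32-bit offsets, as for ListVector
    static final int LARGE_LIST_OFFSET_WIDTH = 8; // 64-bit offsets, as for LargeListVector

    /** Bytes the offset buffer should expose: one offset per value plus the leading one. */
    static long requiredOffsetBytes(int valueCount, int offsetWidth) {
        return (valueCount + 1L) * offsetWidth;
    }

    /** The invariant quoted in the report: 0 <= readerIndex <= writerIndex <= capacity. */
    static boolean indicesValid(long readerIndex, long writerIndex, long capacity) {
        return 0 <= readerIndex && readerIndex <= writerIndex && writerIndex <= capacity;
    }

    public static void main(String[] args) {
        System.out.println(requiredOffsetBytes(0, LIST_OFFSET_WIDTH));       // 4
        System.out.println(requiredOffsetBytes(0, LARGE_LIST_OFFSET_WIDTH)); // 8
        System.out.println(indicesValid(0, 4, 0)); // false: the reported state
        System.out.println(indicesValid(0, 4, 4)); // true: the expected state
    }
}
```

The reported state (`writerIndex: 4`, `capacity: 0`) fails the check, while a buffer with at least 4 bytes of capacity passes it.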
### Suggested fix

Update `ListVector.setReaderAndWriterIndex()` and `LargeListVector.setReaderAndWriterIndex()` so the offset buffer writer index is based on:

```java
(valueCount + 1) * OFFSET_WIDTH
```

For the `valueCount == 0` case, ensure the offset buffer has enough capacity for the leading zero offset before setting the writer index.

Care should be taken not to shrink the vector's future offset allocation size when allocating this empty sentinel offset buffer.

### Additional context

This was observed downstream in Dremio after upgrading Arrow Java. The failure occurred while sending a record batch containing an empty list vector, where the send path unwraps Arrow buffers through Netty.

The downstream error was:

```text
SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 4 (expected: 0 <= readerIndex <= writerIndex <= capacity(0))
```

This issue is distinct from #1125. That issue involves `UnionListReader.setPosition` on a post-IPC empty list. This issue is about the offset buffer exported by empty `ListVector` / `LargeListVector` instances having an invalid writer-index/capacity relationship.
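The suggested fix can be modelled without Arrow as a rough sketch. Everything below is an assumption made for illustration: `OffsetBufferModel`, its fields, and its methods are hypothetical and do not mirror the real `ListVector` internals. The point is only that the writer index follows `(valueCount + 1) * OFFSET_WIDTH`, that capacity is grown first, and that the allocation-size hint is left alone.

```java
/** Hypothetical stand-alone model of the suggested fix; not Arrow code. */
final class OffsetBufferModel {
    private final int offsetWidth;         // 4 models ListVector, 8 models LargeListVector
    private final long allocationSizeHint; // size intended for future regular allocations
    private long capacity;                 // bytes currently backing the offsets
    private long writerIndex;              // readable bytes

    OffsetBufferModel(int offsetWidth, long allocationSizeHint) {
        this.offsetWidth = offsetWidth;
        this.allocationSizeHint = allocationSizeHint;
    }

    /** Sets the writer index so the leading offset is always readable. */
    void setReaderAndWriterIndex(int valueCount) {
        long required = (valueCount + 1L) * offsetWidth;
        if (capacity < required) {
            // Grow only the backing storage. The hint is deliberately not touched,
            // so the small sentinel allocation cannot shrink later allocations.
            capacity = required;
        }
        writerIndex = required;
    }

    long capacity() { return capacity; }
    long writerIndex() { return writerIndex; }
    long allocationSizeHint() { return allocationSizeHint; }

    public static void main(String[] args) {
        OffsetBufferModel offsets = new OffsetBufferModel(4, 1024);
        offsets.setReaderAndWriterIndex(0); // empty vector
        System.out.println(offsets.writerIndex() <= offsets.capacity()); // true
    }
}
```

In a real vector the capacity branch would presumably matter only for the empty case, since a non-empty vector already has its offsets allocated.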
