bodduv opened a new pull request, #1136:
URL: https://github.com/apache/arrow-java/pull/1136
Arrow IPC represents variable-length vectors with an offset buffer
containing `valueCount + 1` offsets. For an empty `ListVector`, that still
means the serialized and deserialized vector can have a non-empty offset buffer
containing the leading zero offset. This is correct according to the Arrow
layout, but it exposes a bug in `UnionListReader.setPosition` and similar
code paths. `UnionListReader.setPosition(0)` used offset-buffer capacity as the
empty-vector check, which worked only when the offset buffer had zero capacity.
After an IPC round trip, the empty vector has non-zero offset-buffer capacity, so
the reader could throw `IndexOutOfBoundsException`. `UnionLargeListReader` has the same
logical issue and also lacked the empty-buffer guard.
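
To illustrate why a capacity-based emptiness check breaks down, here is a minimal plain-Java sketch that does not use the Arrow API. The class `OffsetLayoutSketch` and its helpers are hypothetical names, and a `java.nio.ByteBuffer` stands in for the 32-bit offset buffer of a `List` layout.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for an Arrow offset buffer; not the real Arrow classes.
final class OffsetLayoutSketch {
    static final int OFFSET_WIDTH = 4; // bytes per 32-bit offset

    // Builds an offset buffer holding (number of rows + 1) offsets.
    static ByteBuffer buildOffsets(int[] listLengths) {
        ByteBuffer buf = ByteBuffer.allocate((listLengths.length + 1) * OFFSET_WIDTH);
        int running = 0;
        buf.putInt(0, running); // the leading zero offset is always present
        for (int i = 0; i < listLengths.length; i++) {
            running += listLengths[i];
            buf.putInt((i + 1) * OFFSET_WIDTH, running);
        }
        return buf;
    }

    // Capacity-based emptiness check: true only when no offsets were allocated.
    static boolean looksEmptyByCapacity(ByteBuffer offsets) {
        return offsets.capacity() == 0;
    }

    public static void main(String[] args) {
        ByteBuffer empty = buildOffsets(new int[0]); // zero rows
        System.out.println(empty.capacity());            // 4: one leading offset
        System.out.println(looksEmptyByCapacity(empty)); // false, although there are no rows
    }
}
```

With zero rows the buffer still holds one offset, so a check on capacity alone misclassifies the vector as non-empty.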
## What's Changed
**This contains breaking changes.** <!-- Remove this line if there are no
breaking changes. -->
Closes #1125.
`UnionListReader.setPosition(0)` can throw after IPC deserialization when
reading a zero-row `ListVector`. The IPC path writes a leading offset for
empty list vectors, so the offset buffer has capacity but there is no valid
logical row at index 0. `SingleStructReaderImpl.reader(String)` still positions
new child readers at the parent reader's current index, which defaults to 0,
so looking up an empty list child can attempt to read offsets for a row that
does not exist.
Update `UnionListReader` and `UnionLargeListReader` to treat
`valueCount == 0 && index == 0` as a valid empty reader position. Other
out-of-range indexes still throw, and valid non-empty positions defensively
check that the offset buffer has enough capacity for both `index` and
`index + 1` before reading offsets. `UnionMapReader` gets the same behavior
through its `UnionListReader` base class.
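
The position rules described above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the code in this PR: `PositionGuardSketch` and `resolvePosition` are hypothetical names, a `ByteBuffer` stands in for the offset buffer, and throwing `IndexOutOfBoundsException` for rejected positions is an assumption.

```java
import java.nio.ByteBuffer;

// Hypothetical model of the position check; not the actual reader classes.
final class PositionGuardSketch {
    static final int OFFSET_WIDTH = 4; // bytes per 32-bit offset

    // Returns {start, end} of the child range for the row at index.
    static int[] resolvePosition(int index, int valueCount, ByteBuffer offsets) {
        // An empty vector accepts position 0 without touching the offsets.
        if (valueCount == 0 && index == 0) {
            return new int[] {0, 0};
        }
        // Any other out-of-range index is still rejected.
        if (index < 0 || index >= valueCount) {
            throw new IndexOutOfBoundsException(
                "index " + index + " out of range for valueCount " + valueCount);
        }
        // Defensive check: offsets at index and index + 1 must both be readable.
        long bytesNeeded = ((long) index + 2) * OFFSET_WIDTH;
        if (offsets.capacity() < bytesNeeded) {
            throw new IndexOutOfBoundsException(
                "offset buffer too small for index " + index);
        }
        int start = offsets.getInt(index * OFFSET_WIDTH);
        int end = offsets.getInt((index + 1) * OFFSET_WIDTH);
        return new int[] {start, end};
    }
}
```

In this sketch the empty case is decided from `valueCount` rather than from buffer capacity, so it behaves the same whether or not the buffer holds the leading zero offset.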
Add an IPC round-trip regression test covering empty `List`, `LargeList`, and
`Map` children under a zero-row struct. The test verifies that child reader
lookup does not throw, the readers report empty iteration, and invalid
positions are still rejected.