bodduv opened a new pull request, #1136:
URL: https://github.com/apache/arrow-java/pull/1136

   Arrow IPC represents variable-length vectors with an offset buffer 
containing `valueCount + 1` offsets. For an empty `ListVector`, this means 
the serialized and deserialized vector can still have a non-empty offset buffer 
holding the leading zero offset. This is correct according to the Arrow 
layout, but it exposes a bug in `UnionListReader.setPosition` and similar 
places. `UnionListReader.setPosition(0)` used the offset buffer's capacity as 
the empty-vector check, which worked only when the offset buffer had zero 
capacity. After IPC, the empty vector's offset buffer has non-zero capacity, so 
the reader could throw `IndexOutOfBoundsException`. `UnionLargeListReader` has 
the same logical issue and also lacked the empty-buffer guard.
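
To make the failure mode concrete, here is a plain-Java model of the pre-fix check; it is not Arrow's actual code. It assumes 32-bit little-endian offsets, as in the `List` layout, and uses `java.nio.ByteBuffer` in place of Arrow's `ArrowBuf`. The class `CapacityCheckSketch` and method `oldSetPosition` are hypothetical. Real Arrow buffers may be padded beyond the bytes the offsets need, so whether a given read actually fails depends on the allocated capacity.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical model of the pre-fix check: "empty" is inferred from the
 * offset buffer's capacity instead of from valueCount.
 */
public class CapacityCheckSketch {
    static final int OFFSET_WIDTH = 4; // bytes per 32-bit offset in a List vector

    /** Mimics the old logic: zero capacity means empty, otherwise read two offsets. */
    static int[] oldSetPosition(ByteBuffer offsets, int index) {
        if (offsets.capacity() == 0) {
            return new int[] {0, 0}; // empty reader: nothing to iterate
        }
        int start = offsets.getInt(index * OFFSET_WIDTH);
        int end = offsets.getInt((index + 1) * OFFSET_WIDTH); // needs entry index + 1
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        // Freshly created empty vector: no offset buffer allocated at all.
        ByteBuffer inMemory = ByteBuffer.allocate(0);
        System.out.println(oldSetPosition(inMemory, 0)[1]); // prints 0

        // After an IPC round trip: valueCount + 1 = 1 offset (the leading 0) is present.
        ByteBuffer afterIpc = ByteBuffer.allocate(OFFSET_WIDTH).order(ByteOrder.LITTLE_ENDIAN);
        afterIpc.putInt(0, 0);
        try {
            oldSetPosition(afterIpc, 0);
        } catch (IndexOutOfBoundsException e) {
            // Reading offset 1 runs past the single stored offset.
            System.out.println("IndexOutOfBoundsException reading offset 1");
        }
    }
}
```

The model reads `offsets[index + 1]` whenever the buffer is non-empty, so a buffer holding only the single leading offset is enough to trip it.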
   
   ## What's Changed
   
   
   
   **This contains breaking changes.**
   
   Closes #1125.
   
     `UnionListReader.setPosition(0)` can throw after IPC deserialization when
     reading a zero-row `ListVector`. The IPC path writes a leading offset for
     empty list vectors, so the offset buffer has capacity but there is no valid
     logical row at index 0. `SingleStructReaderImpl.reader(String)` still
     positions new child readers at the parent reader's current index, which
     defaults to 0, so looking up an empty list child can attempt to read
     offsets for a row that does not exist.
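
The call path can be modeled the same way. In this hypothetical sketch, `ParentReader`, `ChildReader`, and `reader(String)` are illustrative stand-ins, not the real `SingleStructReaderImpl` code: a child reader created on demand is immediately positioned at the parent's current index, which starts at 0 even when the parent has no rows.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of a struct reader handing its current position to a
 * child reader it creates lazily. Names are illustrative only.
 */
public class ChildReaderPositionSketch {
    /** Child reader that records the last position it was asked to move to. */
    static class ChildReader {
        final int valueCount;
        int lastPosition = -1;

        ChildReader(int valueCount) {
            this.valueCount = valueCount;
        }

        void setPosition(int index) {
            lastPosition = index;
        }
    }

    /** Parent reader whose current index defaults to 0, even with zero rows. */
    static class ParentReader {
        private int idx = 0; // default position before any row has been visited
        private final int rowCount;
        private final Map<String, ChildReader> children = new HashMap<>();

        ParentReader(int rowCount) {
            this.rowCount = rowCount;
        }

        ChildReader reader(String name) {
            return children.computeIfAbsent(name, n -> {
                ChildReader child = new ChildReader(rowCount);
                child.setPosition(idx); // new children inherit the parent's index
                return child;
            });
        }
    }

    public static void main(String[] args) {
        ParentReader emptyStruct = new ParentReader(0);
        ChildReader list = emptyStruct.reader("items");
        // Prints "0 of 0": position 0 is requested although no row 0 exists.
        System.out.println(list.lastPosition + " of " + list.valueCount);
    }
}
```

Merely looking up the child is enough to request position 0, which is why the reader must accept that position on an empty vector rather than relying on callers to avoid it.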
   
     Update `UnionListReader` and `UnionLargeListReader` to treat
     `valueCount == 0 && index == 0` as a valid empty reader position. Other
     out-of-range indexes still throw, and valid non-empty positions defensively
     check that the offset buffer has enough capacity for both `index` and
     `index + 1` before reading offsets. `UnionMapReader` gets the same behavior
     through its `UnionListReader` base class.
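
A minimal sketch of the guarded position logic described above, again modeled with `java.nio.ByteBuffer` and 32-bit little-endian offsets rather than Arrow's `ArrowBuf`. `GuardedPositionSketch`, `setPosition`, and `size` are hypothetical names chosen for illustration; the real readers track their state differently.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical model of a list reader's guarded setPosition.
 * Not Arrow's actual implementation.
 */
public class GuardedPositionSketch {
    static final int OFFSET_WIDTH = 4; // bytes per 32-bit offset

    private final ByteBuffer offsets;
    private final int valueCount;
    private int currentOffset; // first child element of the current row
    private int maxOffset;     // one past the last child element of the current row

    GuardedPositionSketch(ByteBuffer offsets, int valueCount) {
        this.offsets = offsets.order(ByteOrder.LITTLE_ENDIAN);
        this.valueCount = valueCount;
    }

    void setPosition(int index) {
        if (valueCount == 0 && index == 0) {
            // Valid empty position: iterate nothing and read no offsets,
            // regardless of how large the offset buffer is.
            currentOffset = 0;
            maxOffset = 0;
            return;
        }
        if (index < 0 || index >= valueCount) {
            throw new IndexOutOfBoundsException(
                "index " + index + " out of range for valueCount " + valueCount);
        }
        // Offsets for both `index` and `index + 1` must fit in the buffer.
        long requiredBytes = ((long) index + 2) * OFFSET_WIDTH;
        if (offsets.capacity() < requiredBytes) {
            throw new IndexOutOfBoundsException("offset buffer too small for index " + index);
        }
        currentOffset = offsets.getInt(index * OFFSET_WIDTH);
        maxOffset = offsets.getInt((index + 1) * OFFSET_WIDTH);
    }

    /** Number of child elements in the list at the current position. */
    int size() {
        return maxOffset - currentOffset;
    }

    public static void main(String[] args) {
        // Zero-row vector after an IPC round trip: only the leading 0 offset.
        GuardedPositionSketch empty =
            new GuardedPositionSketch(ByteBuffer.allocate(OFFSET_WIDTH), 0);
        empty.setPosition(0);
        System.out.println(empty.size()); // prints 0

        // Out-of-range positions are still rejected.
        try {
            empty.setPosition(1);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        // Two rows, [a, b] and [c]: offsets 0, 2, 3.
        ByteBuffer offsets = ByteBuffer.allocate(3 * OFFSET_WIDTH).order(ByteOrder.LITTLE_ENDIAN);
        offsets.putInt(0, 0).putInt(4, 2).putInt(8, 3);
        GuardedPositionSketch twoRows = new GuardedPositionSketch(offsets, 2);
        twoRows.setPosition(1);
        System.out.println(twoRows.size()); // prints 1
    }
}
```

Deciding emptiness from `valueCount` rather than buffer capacity makes the result independent of how the buffer was allocated or padded, and checking room for entry `index + 1` before reading turns a truncated buffer into an error that names the offending index.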
   
     Add an IPC round-trip regression test covering empty `List`, `LargeList`,
     and `Map` children under a zero-row struct. The test verifies that child
     reader lookup does not throw, the readers report empty iteration, and
     invalid positions are still rejected.
   


-- 
This is an automated message from the Apache Git Service.