ariel-miculas opened a new issue, #9655:
URL: https://github.com/apache/arrow-rs/issues/9655
**Describe the bug**
There are two separate issues:
### Bug 1 – Reader nullable union wrapping breaks decoding of plain writer fields
When a writer produces Avro records with plain (non-nullable) field types,
but the reader schema wraps those same fields in `["null", T]` unions, the
decoder misreads the data. The writer never emits a union branch index for
these fields, but the decoder expects one, so it falls out of sync with the
byte stream. The result is garbage field values in every record from the
first affected one onward.
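The misalignment can be reproduced with a minimal sketch that uses only the Avro zigzag varint encoding and no arrow-rs code. The function names (`write_long`, `read_long`, `read_plain`, `read_as_union`) are hypothetical and exist only for illustration; they are not the decoder's actual API.

```rust
/// Encode `v` as an Avro `long`: zigzag, then base-128 varint.
fn write_long(out: &mut Vec<u8>, v: i64) {
    let mut z = ((v << 1) ^ (v >> 63)) as u64;
    while z >= 0x80 {
        out.push((z as u8 & 0x7f) | 0x80);
        z >>= 7;
    }
    out.push(z as u8);
}

/// Decode one Avro `long` at `*pos`; returns `None` on EOF or overflow.
fn read_long(buf: &[u8], pos: &mut usize) -> Option<i64> {
    let mut value: u64 = 0;
    let mut shift = 0u32;
    loop {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            break;
        }
        shift += 7;
        if shift > 63 {
            return None;
        }
    }
    Some(((value >> 1) as i64) ^ -((value & 1) as i64))
}

/// Expected behavior: the writer field is plain, so no branch index is
/// consumed and the value is simply wrapped as non-null.
fn read_plain(buf: &[u8], pos: &mut usize) -> Option<Option<i64>> {
    read_long(buf, pos).map(Some)
}

/// Behavior described in Bug 1 (hypothetical model): the reader's
/// ["null", "long"] union makes the decoder consume a branch index that
/// the writer never wrote.
fn read_as_union(buf: &[u8], pos: &mut usize) -> Option<Option<i64>> {
    match read_long(buf, pos)? {
        0 => Some(None),                    // branch 0: null
        1 => read_long(buf, pos).map(Some), // branch 1: the value
        _ => None,                          // invalid branch index
    }
}
```

With a record of two plain fields `a = 1` and `b = 7`, the writer emits the bytes `0x02 0x0E`. `read_plain` recovers 1 and 7. `read_as_union` interprets `0x02` as branch index 1, returns 7 for field `a`, and then hits EOF on field `b`, which matches the kind of error reported in this issue.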
### Bug 2 – Skipper omits writer-only fields when the writer schema uses named type references
When the writer schema uses Avro named type references (e.g.,
`"type": "Timestamp"` after `Timestamp` has been defined once), and the reader
schema requests fewer fields than the writer wrote (either by narrowing a
nested record or omitting a field entirely), the Skipper uses the wrong field
list. It builds its skip plan from the reader's narrowed view of the type
rather than the writer's full definition. As a result, it does not consume all
the bytes the writer emitted for those fields, leaving the buffer out of sync.
Every subsequent record is then decoded from the wrong byte offset, producing
corrupted values.
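The effect of resolving a named reference against the wrong definition can be sketched with a toy schema model. Everything here is an assumption made for illustration: the `Schema` enum, the `Names` table, `skip_value`, and the two-field `Timestamp` layout are hypothetical and are not taken from the arrow-rs Skipper.

```rust
use std::collections::HashMap;

/// Toy schema model: varint leaves, records, and references to named types.
enum Schema {
    Long,
    Record(Vec<Schema>),
    Ref(&'static str),
}

/// Named type definitions, keyed by type name.
type Names = HashMap<&'static str, Schema>;

/// Skip one varint-encoded value; returns `None` on EOF.
fn skip_long(buf: &[u8], pos: &mut usize) -> Option<()> {
    loop {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        if byte & 0x80 == 0 {
            return Some(());
        }
    }
}

/// Skip a value of `schema`, resolving references through `names`.
/// Which table is passed in decides how many bytes are consumed.
fn skip_value(schema: &Schema, names: &Names, buf: &[u8], pos: &mut usize) -> Option<()> {
    match schema {
        Schema::Long => skip_long(buf, pos),
        Schema::Record(fields) => {
            for field in fields {
                skip_value(field, names, buf, pos)?;
            }
            Some(())
        }
        Schema::Ref(name) => skip_value(names.get(name)?, names, buf, pos),
    }
}

/// Build a table in which `Timestamp` is a record with `fields` varint fields.
fn timestamp_names(fields: usize) -> Names {
    let mut names = Names::new();
    names.insert(
        "Timestamp",
        Schema::Record((0..fields).map(|_| Schema::Long).collect()),
    );
    names
}
```

Suppose the writer's `Timestamp` has two fields and the bytes `0xC8 0x01 0x0A 0x06` hold one `Timestamp` (100, 5) followed by the next field (3). Skipping `Schema::Ref("Timestamp")` with `timestamp_names(2)`, the writer's definition, advances the position to 3. Skipping with `timestamp_names(1)`, a narrowed reader view, stops at 2, so the next field is decoded from the leftover `0x0A` instead of `0x06`.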
Errors reported:
```
READ ERROR after 0 rows: Avro error: Parser error: offset overflow reading avro bytes
READ ERROR after 0 rows: Avro error: EOF: Unexpected EOF reading bytes
```
**To Reproduce**
See the unit tests from https://github.com/apache/arrow-rs/pull/9605
**Expected behavior**
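Records decode correctly in both cases: a plain writer field read through a
`["null", T]` reader union yields the writer's value, and writer-only fields
are skipped in full using the writer's definition of the type.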