ariel-miculas opened a new issue, #9655:
URL: https://github.com/apache/arrow-rs/issues/9655

   **Describe the bug**
   There are two separate issues:
   ### Bug 1 – Reader nullable union wrapping breaks decoding of plain writer fields
   
   When a writer produces Avro records with plain (non-nullable) field types,
but the reader schema wraps those same fields in ["null", T] unions, the decoder
misreads the data. The writer never emits a union branch index for these
fields, but the decoder expects one, so it falls out of sync with the byte
stream. The result is garbage field values from the first affected record
onward.
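
The desynchronization can be sketched with a few lines of standalone Rust. This is an illustration only, not arrow-avro code: `encode_long`, `decode_long`, `write_plain`, `read_plain`, and `read_as_union` are hypothetical helpers, and the field type is assumed to be a single Avro `long` (zigzag varint) per record.

```rust
// Illustrative only: hand-rolled Avro `long` encoding, not arrow-avro code.

/// Encode an Avro `long`: zigzag, then base-128 varint.
fn encode_long(value: i64, out: &mut Vec<u8>) {
    let mut n = ((value << 1) ^ (value >> 63)) as u64;
    while n >= 0x80 {
        out.push(((n & 0x7f) as u8) | 0x80);
        n >>= 7;
    }
    out.push(n as u8);
}

/// Decode an Avro `long` at `*pos`, advancing the cursor. `None` on EOF.
fn decode_long(buf: &[u8], pos: &mut usize) -> Option<i64> {
    let mut n: u64 = 0;
    let mut shift: u32 = 0;
    loop {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        n |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            break;
        }
        shift += 7;
        if shift >= 64 {
            return None; // malformed varint
        }
    }
    Some(((n >> 1) as i64) ^ -((n & 1) as i64))
}

/// Writer side: the field is a plain `long`, so only the value is written.
fn write_plain(values: &[i64]) -> Vec<u8> {
    let mut out = Vec::new();
    for &v in values {
        encode_long(v, &mut out);
    }
    out
}

/// Expected behavior: the writer type is plain, so no branch index is read.
fn read_plain(buf: &[u8], count: usize) -> Option<Vec<i64>> {
    let mut pos = 0;
    (0..count).map(|_| decode_long(buf, &mut pos)).collect()
}

/// Behavior described in Bug 1: the data is read as if the writer had
/// written a ["null", "long"] union (branch index first, then the value).
fn read_as_union(buf: &[u8], count: usize) -> Option<Vec<Option<i64>>> {
    let mut pos = 0;
    let mut rows = Vec::with_capacity(count);
    for _ in 0..count {
        let row = match decode_long(buf, &mut pos)? {
            0 => None,
            1 => Some(decode_long(buf, &mut pos)?),
            _ => return None, // not a valid branch of a two-branch union
        };
        rows.push(row);
    }
    Some(rows)
}
```

With the values `[1, 7]`, the writer emits the bytes `0x02 0x0e`. The union-style reader takes `0x02` as branch index 1, reports 7 for the first record, and then runs out of input on the second, which is the kind of unexpected EOF reported in this issue.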
   
   ### Bug 2 – Skipper omits writer-only fields when the writer schema uses named type references
   
   When the writer schema uses Avro named type references (e.g., "type": 
"Timestamp" after Timestamp has been defined once), and the reader schema 
requests fewer fields than the writer wrote (either by narrowing a nested 
record or omitting a field entirely), the Skipper uses the wrong field list. It 
builds its skip plan from the reader's narrowed view of the type rather than 
the writer's full definition. As a result, it does not consume all the bytes 
the writer emitted for those fields, leaving the buffer out of sync. Every 
subsequent record is then decoded from the wrong byte offset, producing 
corrupted values.
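
The effect of building the skip plan from the wrong field list can be sketched as follows. This is a simplified model, not the actual arrow-avro `Skipper`: `SkipPlan`, `skip_value`, `bytes_skipped`, and the two plan constructors are hypothetical, and `Timestamp` is assumed to be a record of two `long` fields that the reader narrows to one.

```rust
// Illustrative only: a toy skip plan, not the arrow-avro Skipper type.

/// Hypothetical skip plan: what to consume for a value that is not read.
enum SkipPlan {
    Long,
    Record(Vec<SkipPlan>),
}

/// Advance past one varint-encoded Avro `long`. `None` on EOF.
fn skip_long(buf: &[u8], pos: &mut usize) -> Option<()> {
    loop {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        if byte & 0x80 == 0 {
            return Some(());
        }
    }
}

/// Consume the bytes of one value according to `plan`.
fn skip_value(plan: &SkipPlan, buf: &[u8], pos: &mut usize) -> Option<()> {
    match plan {
        SkipPlan::Long => skip_long(buf, pos),
        SkipPlan::Record(fields) => {
            for field in fields {
                skip_value(field, buf, pos)?;
            }
            Some(())
        }
    }
}

/// Number of bytes consumed when skipping one value from the start of `buf`.
fn bytes_skipped(plan: &SkipPlan, buf: &[u8]) -> Option<usize> {
    let mut pos = 0;
    skip_value(plan, buf, &mut pos)?;
    Some(pos)
}

/// Plan built from the writer's full definition (assumed: two longs).
fn writer_timestamp_plan() -> SkipPlan {
    SkipPlan::Record(vec![SkipPlan::Long, SkipPlan::Long])
}

/// Plan built from the reader's narrowed view (assumed: first long only).
fn reader_timestamp_plan() -> SkipPlan {
    SkipPlan::Record(vec![SkipPlan::Long])
}
```

For the bytes `0x0a 0x12 0x2a` (a `Timestamp` of 5 and 9 followed by a `long` field of 21), the writer-based plan skips two bytes and leaves the cursor on `0x2a`. The reader-based plan skips only one, so the next field would be decoded from `0x12` as 9 instead of 21.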
   
   Errors reported:
   ```
   READ ERROR after 0 rows: Avro error: Parser error: offset overflow reading avro bytes
   READ ERROR after 0 rows: Avro error: EOF: Unexpected EOF reading bytes
   ```
   
   
   **To Reproduce**
   See the unit tests from https://github.com/apache/arrow-rs/pull/9605
   
   **Expected behavior**
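   Both cases decode correctly: reader fields declared as ["null", T] receive
the writer's plain values, and writer-only fields are skipped in full so that
subsequent records are read from the correct byte offset.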


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

