nsivabalan opened a new pull request, #18892:
URL: https://github.com/apache/hudi/pull/18892

   ### Change Logs
   
   `KafkaAvroSchemaDeserializer` previously overrode only `deserialize(String, 
Boolean, byte[], Schema)` to inject the configured `sourceSchema`. The Kafka 
consumer / Connect framework can also invoke three other overloads 
(`deserialize(String, byte[])`, `deserialize(String, byte[], Schema)`, and 
`deserialize(String, Headers, byte[])`), and those bypassed the `sourceSchema` 
injection. As a result, consuming records serialized with an older schema 
(fewer fields in a nested record) while the deserializer was configured with an 
evolved schema threw `ArrayIndexOutOfBoundsException`, because Avro used the 
writer's old schema instead of the configured reader schema.
   
   This change overrides all three additional `deserialize` methods to 
consistently inject `sourceSchema`, ensuring Avro schema resolution handles old 
→ new field evolution correctly (defaulting new nullable fields to null).
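
A minimal sketch of the overload problem, for illustration only. It does not use the real Confluent or Hudi classes: `BaseDeserializer`, `PartialOverride`, `FullOverride`, the `String` stand-in for `Schema`, and the `Map` stand-in for `Headers` are all hypothetical, and the base class is assumed to pass each overload's reader schema straight to the decode step.

```java
import java.util.Map;

/** Simplified model; these are NOT the real Confluent or Hudi classes. */
class OverloadInjectionSketch {

    /** Hypothetical base class: each overload forwards whatever reader schema it was given. */
    static class BaseDeserializer {
        String deserialize(String topic, byte[] data) {
            return decode(data, null);
        }

        String deserialize(String topic, byte[] data, String readerSchema) {
            return decode(data, readerSchema);
        }

        String deserialize(String topic, Map<String, String> headers, byte[] data) {
            return decode(data, null);
        }

        String deserialize(String topic, Boolean isKey, byte[] data, String readerSchema) {
            return decode(data, readerSchema);
        }

        /** Reports which schema the decode step would use. */
        final String decode(byte[] data, String readerSchema) {
            return readerSchema == null ? "writer-schema-only" : "resolved-to:" + readerSchema;
        }
    }

    /** Models the earlier state: only the four-argument overload injects the schema. */
    static class PartialOverride extends BaseDeserializer {
        final String sourceSchema;

        PartialOverride(String sourceSchema) {
            this.sourceSchema = sourceSchema;
        }

        @Override
        String deserialize(String topic, Boolean isKey, byte[] data, String readerSchema) {
            return super.deserialize(topic, isKey, data, sourceSchema);
        }
    }

    /** Models the change: the three remaining overloads inject the schema as well. */
    static class FullOverride extends PartialOverride {
        FullOverride(String sourceSchema) {
            super(sourceSchema);
        }

        @Override
        String deserialize(String topic, byte[] data) {
            return decode(data, sourceSchema);
        }

        @Override
        String deserialize(String topic, byte[] data, String readerSchema) {
            return decode(data, sourceSchema);
        }

        @Override
        String deserialize(String topic, Map<String, String> headers, byte[] data) {
            return decode(data, sourceSchema);
        }
    }

    public static void main(String[] args) {
        byte[] payload = new byte[0];
        // The two-argument overload bypasses the injection in the partial model.
        System.out.println(new PartialOverride("evolved").deserialize("t", payload));
        // With every overload overridden, the configured schema is always used.
        System.out.println(new FullOverride("evolved").deserialize("t", payload));
    }
}
```

In this model, which schema gets used depends on which overload the caller happens to pick unless every overload is overridden, which is the inconsistency this change removes.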
   
   ### Impact
   
   Bug fix for `KafkaAvroSchemaDeserializer`. No public API change. Behavior 
for the already-overridden `deserialize(String, Boolean, byte[], Schema)` is 
unchanged. The three newly overridden methods now behave consistently with the 
existing override.
   
   ### Risk level
   
   low
   
   ### Documentation Update
   
   No user-facing config or API change.
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [x] Adequate tests were added
   
   ### Test Plan
   
   - New tests in `TestKafkaAvroSchemaDeserializer` use a Debezium CDC envelope 
schema with a nested `Value` record that gains four nullable fields (`notes`, 
`search_engine_id`, `locale_id`, `language_id`). All four `deserialize` 
overloads are exercised against old-schema records read with the evolved 
schema. Each test accesses the nested `before` record by position (indices 
20-23), the access pattern that reproduces the 
`ArrayIndexOutOfBoundsException`.
   - `mvn -pl hudi-utilities test -Dtest='TestKafkaAvroSchemaDeserializer'` — 
2/2 pass.
   - `mvn -pl hudi-utilities checkstyle:check` — 0 violations.
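
The positional-access failure described in the first bullet can be modeled without Avro. The sketch below is a simplification, not Avro's actual resolution logic: records are plain `Object[]` arrays, the helper names are hypothetical, and the old nested record is assumed to have 20 fields (`f0` to `f19`, invented names) so that the four new fields land at indices 20-23.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified model of positional records; real Avro schema resolution does much more. */
class SchemaResolutionSketch {

    /** Hypothetical old schema: 20 placeholder fields. */
    static List<String> oldFields() {
        List<String> fields = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            fields.add("f" + i);
        }
        return fields;
    }

    /** Evolved schema: the old fields plus the four nullable fields from the test plan. */
    static List<String> evolvedFields() {
        List<String> fields = oldFields();
        fields.addAll(Arrays.asList("notes", "search_engine_id", "locale_id", "language_id"));
        return fields;
    }

    /** Decoding with the writer schema alone yields one slot per writer field. */
    static Object[] decodeWithWriterOnly(List<String> writerFields, Object[] encoded) {
        return Arrays.copyOf(encoded, writerFields.size());
    }

    /** Resolving against the reader schema yields one slot per reader field. */
    static Object[] decodeResolved(List<String> writerFields, List<String> readerFields, Object[] encoded) {
        Object[] out = new Object[readerFields.size()];
        for (int i = 0; i < readerFields.size(); i++) {
            int writerPos = writerFields.indexOf(readerFields.get(i));
            // Fields missing from the writer schema default to null (nullable fields).
            out[i] = writerPos >= 0 ? encoded[writerPos] : null;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> writer = oldFields();
        List<String> reader = evolvedFields();
        Object[] encoded = new Object[writer.size()];
        Arrays.fill(encoded, "v");

        Object[] resolved = decodeResolved(writer, reader, encoded);
        System.out.println(resolved.length + " fields, index 20 = " + resolved[20]);

        try {
            Object unused = decodeWithWriterOnly(writer, encoded)[20];
            System.out.println("unexpected value: " + unused);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("writer-only decode failed: " + e.getClass().getSimpleName());
        }
    }
}
```

Under these assumptions, a record resolved against the evolved schema has slots 20-23 set to null, while a record decoded with the old schema alone has no such slots, so positional access past index 19 throws.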
   
   Closes #18891


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
