sathyaprakashg opened a new pull request #2012:
URL: https://github.com/apache/hudi/pull/2012


   ## What is the purpose of the pull request
   
   When the schema has evolved but a producer is still sending events with an 
older version of the schema, the Hudi delta streamer fails. This fix makes the 
delta streamer work correctly with schema evolution.
   
   Related issues #1845 #1971 #1972 
   
   ## Brief change log
   
     - Update the Avro-to-Spark conversion method 
`AvroConversionHelper.createConverterToRow` to handle the case where the 
provided schema has more fields than the data (the producer is still sending 
events with the old schema).
     - Introduce a new payload class, `BaseAvroPayloadWithSchema`, which 
stores the writer schema as part of the payload. Currently, 
`HoodieAvroUtils.avroToBytes` uses the schema of the data to convert to bytes, 
but `HoodieAvroUtils.bytesToAvro` uses the provided schema. Because the two do 
not always match, the conversion can fail. Keeping the data's schema in the 
payload ensures that the same schema is used to convert Avro to bytes and 
bytes back to Avro.
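
   To illustrate the mismatch described above, here is a minimal, 
self-contained sketch. It is a toy model in Python, not Hudi or Avro code: 
the positional encoding, the `to_bytes`/`from_bytes` helpers, and the 
`PayloadWithSchema` class are all hypothetical stand-ins. It only shows why 
bytes written with one schema cannot safely be decoded with another, and how 
carrying the writer schema alongside the bytes avoids that.

```python
import struct
from dataclasses import dataclass

# Toy schemas: ordered (field name, type) pairs. This is NOT the Avro API.
OLD_SCHEMA = [("id", "long"), ("name", "string")]
NEW_SCHEMA = OLD_SCHEMA + [("email", "string")]  # evolved: one extra field


def to_bytes(record, schema):
    """Positional encoding: values only, no field names in the output."""
    out = bytearray()
    for field, ftype in schema:
        if ftype == "long":
            out += struct.pack(">q", record[field])
        else:  # "string": 4-byte length prefix, then UTF-8 bytes
            raw = record[field].encode("utf-8")
            out += struct.pack(">i", len(raw)) + raw
    return bytes(out)


def from_bytes(data, schema):
    """Decode positionally; only correct if `schema` is the one used to write."""
    record, pos = {}, 0
    for field, ftype in schema:
        if ftype == "long":
            (record[field],) = struct.unpack_from(">q", data, pos)
            pos += 8
        else:
            (size,) = struct.unpack_from(">i", data, pos)
            pos += 4
            record[field] = data[pos:pos + size].decode("utf-8")
            pos += size
    return record


@dataclass
class PayloadWithSchema:
    """Hypothetical stand-in for the idea of a payload that keeps its writer schema."""
    writer_schema: list
    data: bytes

    @classmethod
    def of(cls, record, writer_schema):
        return cls(writer_schema, to_bytes(record, writer_schema))

    def read(self, reader_schema):
        # Always decode with the schema the bytes were written with ...
        decoded = from_bytes(self.data, self.writer_schema)
        # ... then project onto the reader schema; missing fields become None.
        return {field: decoded.get(field) for field, _ in reader_schema}


# A producer that still uses the old schema.
old_event = {"id": 7, "name": "ada"}
payload = PayloadWithSchema.of(old_event, OLD_SCHEMA)

# Decoding the old bytes directly with the evolved schema runs out of input.
try:
    from_bytes(payload.data, NEW_SCHEMA)
    mismatch_failed = False
except struct.error:
    mismatch_failed = True

# Decoding via the stored writer schema succeeds and fills the new field.
row = payload.read(NEW_SCHEMA)
```

   In this sketch the direct decode with the evolved schema fails, while 
`payload.read(NEW_SCHEMA)` returns the old fields plus `email` set to `None`. 
The real classes in this PR work on Avro records and may differ in detail.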
   
   
   ## Verify this pull request
   
   This change added tests and can be verified as follows:
   
     - *Added a unit test to verify schema evolution.* Thanks to @sbernauer 
for the unit test.
   
   ## Committer checklist
   
    - [x] Has a corresponding JIRA in PR title & commit
    
    - [x] Commit message is descriptive of the change
    
    - [x] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



