sathyaprakashg opened a new pull request #2012:
URL: https://github.com/apache/hudi/pull/2012


   ## What is the purpose of the pull request
   
   When the schema has evolved but a producer is still emitting events with an 
older version of the schema, the Hudi delta streamer fails. This fix makes the 
delta streamer work correctly with schema evolution.
   
   Related issues #1845 #1971 #1972 
   
   ## Brief change log
   
     - Update the Avro-to-Spark conversion method 
`AvroConversionHelper.createConverterToRow` to handle the case where the 
provided schema has more fields than the data (i.e. the producer is still 
sending events with the old schema).
     - Introduce a new schema provider class called `SchemaBasedSchemaProvider`, 
which sets the schema from the schema of the data. Currently, 
`HoodieAvroUtils.avroToBytes` uses the data's schema to convert to bytes, 
but `HoodieAvroUtils.bytesToAvro` uses the provided schema. Because the two do 
not always match, this results in an error. Taking the data's schema through 
the new schema provider ensures that the same schema is used for converting 
Avro to bytes and bytes back to Avro.
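
The two failure modes described above can be illustrated with a small, library-free toy model. This is only a sketch under stated assumptions: it uses neither Avro nor Hudi, schemas are plain lists of field names, and the helpers `to_bytes`, `from_bytes` and `to_row` are hypothetical stand-ins for `HoodieAvroUtils.avroToBytes`, `HoodieAvroUtils.bytesToAvro` and the row converter, not their real implementations.

```python
# Toy model only (assumption): a schema is an ordered list of field names and a
# record is encoded positionally, without field names, loosely like Avro binary.
OLD_SCHEMA = ["id", "name"]            # schema the producer still uses
NEW_SCHEMA = ["id", "name", "email"]   # evolved (provided) schema

SEP = "\x1f"  # hypothetical field separator for this toy encoding


def to_bytes(record, writer_schema):
    """Encode values in writer-schema order; field names are not stored."""
    return SEP.join(str(record[f]) for f in writer_schema).encode("utf-8")


def from_bytes(payload, reader_schema):
    """Decode positionally; fails if the payload was written with another schema."""
    values = payload.decode("utf-8").split(SEP)
    if len(values) != len(reader_schema):
        raise ValueError(
            f"payload has {len(values)} fields, schema expects {len(reader_schema)}"
        )
    return dict(zip(reader_schema, values))


def to_row(record, target_schema):
    """Build a row for the wider schema, filling fields absent from the data with None."""
    return [record.get(f) for f in target_schema]


old_event = {"id": 1, "name": "a"}
payload = to_bytes(old_event, OLD_SCHEMA)

# Mismatch: written with the data's schema, read back with the provided schema.
try:
    from_bytes(payload, NEW_SCHEMA)
    mismatch_failed = False
except ValueError:
    mismatch_failed = True

# Using the data's schema on both sides round-trips (values come back as strings here).
round_trip = from_bytes(payload, OLD_SCHEMA)

# Row conversion against the wider schema pads the missing field with None.
row = to_row(old_event, NEW_SCHEMA)
```

In this toy model, decoding with the wider schema raises an error, while decoding with the schema the data was written with succeeds, which is the motivation for deriving the schema from the data itself.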
   
   
   ## Verify this pull request
   
   This change added tests and can be verified as follows:
   
     - *Added a unit test to verify schema evolution.* Thanks @sbernauer for the 
unit test.
   
   ## Committer checklist
   
    - [x] Has a corresponding JIRA in PR title & commit
    
    - [x] Commit message is descriptive of the change
    
    - [x] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



