n3nash edited a comment on pull request #2012:
URL: https://github.com/apache/hudi/pull/2012#issuecomment-835486764


   At a high level, my understanding is that there are 3 issues:
   
   1. Removing the namespace could break backwards compatibility -> validated with 
[this](https://github.com/apache/hudi/pull/2928) PR that it should be OK to 
make this change. 
   2. Records with an older schema are received without a user-defined schema. This 
breaks the `getInsertRecord` API, since the writer schema is smaller than the 
reader schema and the bytes-to-GenericRecord conversion fails; addressed by 
@nsivabalan in [this diff](https://github.com/apache/hudi/pull/2927). 
   3. For general schema evolution to work when converting from a DataFrame to 
Avro, we need to ensure that new fields added to the schema default to null, 
using a UNION type as follows: `name: a, type: ["null", <type>]`. 
Spark's internal converter breaks the Avro spec for creating such a UNION by 
reversing the order, placing null last: `name: a, type: [<type>, "null"]`. 
@nsivabalan fixed this with [this](https://github.com/apache/hudi/pull/2911) PR 
by relying on Hudi's internal ConversionHelper to ensure that `"null"` comes 
first. [This](https://github.com/apache/hudi/pull/2765) PR 
addressed some other issues as well. 
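
   To illustrate the union-ordering issue in (3): the Avro spec requires a union field's default value to match the *first* branch of the union, so a field defaulting to null must list `"null"` first. Below is a minimal sketch of the reordering in plain Python over the schema JSON; this is an illustration only, not Hudi's actual ConversionHelper code, and the `null_first` helper and `Example` record are hypothetical names:

   ```python
   import json

   def null_first(schema_json: str) -> str:
       """Rewrite union-typed fields so "null" is the first branch.

       Mirrors in spirit (not in code) the fix applied to unions emitted by
       Spark's DataFrame-to-Avro converter, which places "null" last and
       thereby breaks a `default: null` declaration.
       """
       schema = json.loads(schema_json)
       for field in schema.get("fields", []):
           t = field.get("type")
           if isinstance(t, list) and "null" in t and t[0] != "null":
               # Move "null" to the front, preserving the order of the rest.
               field["type"] = ["null"] + [b for b in t if b != "null"]
       return json.dumps(schema)

   # Spark-style output: new field `a` with "null" listed last, which is
   # invalid for a null default under the Avro spec.
   spark_schema = json.dumps({
       "type": "record", "name": "Example",
       "fields": [{"name": "a", "type": ["long", "null"], "default": None}],
   })

   fixed = json.loads(null_first(spark_schema))
   print(fixed["fields"][0]["type"])  # -> ['null', 'long']
   ```

   With `"null"` first, readers using the evolved schema can fill the new field with its null default when decoding records written with the older, smaller schema, which is what makes the evolution in (2) and (3) safe.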
   
   With all 3 of these fixes, we should be able to land this PR. 
@sathyaprakashg Let us know if there were any other concerns that you came 
across. 
   
   @nsivabalan I can confirm that (3) should be good to land. I've approved the 
diff, and @bvaradar will take a look at it later today; once he confirms, 
we can land (3) and then (2), with the assumption that (1) validates both. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
