aditiwari01 opened a new pull request #2765:
URL: https://github.com/apache/hudi/pull/2765


   ## What is the purpose of the pull request
   
   For union schema types, Avro requires "null" to be the first entry before 
null can be used as the default value. Hudi relies on 
SchemaConverters.toAvroType(...) from the org.apache.spark:spark-avro_* library, 
so converting a StructType to Avro puts "null" second in union type schemas. 
The generated Avro schema also sets no default value. This patch fixes both 
issues.
   
   This also fixes simple schema evolution with MOR tables: adding new columns 
fails without this patch.
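
A minimal sketch of why the default matters for schema evolution. This is a simplified model of reader-side field resolution, not the Avro library or Hudi code; `resolve_record` and the sample records are hypothetical names used only for illustration. A record written before a column was added can be read with the newer schema only when the new field declares a default.

```python
def resolve_record(record, reader_fields):
    """Project a record written with an older schema onto the reader's fields.

    Simplified assumption: a field absent from the written record is filled
    from the reader field's default; with no default, resolution fails.
    """
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    return resolved


# Record written before newField existed.
old_record = {"rowId": "r1", "name": "a"}

# Reader schema fields as generated without the patch: no defaults.
without_default = [
    {"name": "rowId", "type": ["string", "null"]},
    {"name": "name", "type": ["string", "null"]},
    {"name": "newField", "type": ["string", "null"]},
]

# Reader schema fields as generated with the patch: null first, default null.
with_default = [
    {"name": "rowId", "type": ["null", "string"], "default": None},
    {"name": "name", "type": ["null", "string"], "default": None},
    {"name": "newField", "type": ["null", "string"], "default": None},
]
```

In this model, `resolve_record(old_record, with_default)` fills `newField` with `None`, while `resolve_record(old_record, without_default)` raises `ValueError`.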
   
   For example, if the incoming StructType is
   StructType(StructField(rowId,StringType,true), 
StructField(partitionId,StringType,true), StructField(preComb,LongType,true), 
StructField(name,StringType,true))
   
   Generated Avro schema without this patch:
   
{"type":"record","name":"hudi_trips_cow_record","namespace":"hoodie.hudi_trips_cow","fields":[{"name":"rowId","type":["string","null"]},{"name":"partitionId","type":["string","null"]},{"name":"preComb","type":["long","null"]},{"name":"name","type":["string","null"]}]}
   
   Note that "null" is the second entry in each union and no default is set.
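
The two problems noted above can be detected mechanically. The following is a rough sketch using only the standard `json` module; `fields_needing_fix` is a hypothetical helper, and it inspects top-level fields only (nested records are not handled).

```python
import json


def fields_needing_fix(schema_json):
    """List top-level nullable fields where null is not first or no default is set."""
    schema = json.loads(schema_json)
    flagged = []
    for field in schema["fields"]:
        field_type = field["type"]
        # A nullable field is modeled here as a union (JSON array) containing "null".
        is_nullable_union = isinstance(field_type, list) and "null" in field_type
        if is_nullable_union and (field_type[0] != "null" or "default" not in field):
            flagged.append(field["name"])
    return flagged


sample = """
{"type": "record", "name": "r", "fields": [
  {"name": "rowId", "type": ["string", "null"]},
  {"name": "name", "type": ["null", "string"], "default": null},
  {"name": "count", "type": "long"}
]}
"""
```

Here `fields_needing_fix(sample)` flags only `rowId`: `name` already has null first with a default, and `count` is not nullable.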
   
   Generated Avro schema with this patch (this one also includes an added 
column, newField):
   
{"type":"record","name":"hudi_trips_cow_record","namespace":"hoodie.hudi_trips_cow","fields":[{"name":"rowId","type":["null","string"],"default":null},{"name":"partitionId","type":["null","string"],"default":null},{"name":"preComb","type":["null","long"],"default":null},{"name":"name","type":["null","string"],"default":null},{"name":"newField","type":["null","string"],"default":null}]}
   
   Note that the default value is the JSON null, not the string "null", so this 
works for other data types as well (not just strings). A default value of 
null is allowed only if "null" is the first entry in the union for a given 
schema field.
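
The transformation can be sketched on the schema JSON itself. This is an illustration under stated assumptions, not the patch's actual implementation (which works on the StructType-to-Avro conversion inside Hudi): `null_first_with_default` is a hypothetical helper, it handles top-level fields only, and it leaves untouched any field that already carries a non-null default, since moving "null" first would invalidate that default.

```python
import json


def null_first_with_default(schema):
    """Return a copy of a record schema with nullable unions normalized.

    For each top-level nullable union, move "null" to the front and set the
    default to JSON null (Python None), unless a non-null default exists.
    """
    fixed = dict(schema)
    fixed["fields"] = []
    for original in schema["fields"]:
        field = dict(original)
        field_type = field["type"]
        is_nullable_union = isinstance(field_type, list) and "null" in field_type
        if is_nullable_union and field.get("default") is None:
            field["type"] = ["null"] + [b for b in field_type if b != "null"]
            field["default"] = None  # serialized as null, not the string "null"
        fixed["fields"].append(field)
    return fixed


before = {
    "type": "record",
    "name": "hudi_trips_cow_record",
    "fields": [
        {"name": "rowId", "type": ["string", "null"]},
        {"name": "preComb", "type": ["long", "null"]},
    ],
}
after = null_first_with_default(before)
print(json.dumps(after["fields"][1]))
```

Because the default is Python `None`, `json.dumps` writes it as a bare `null`, which is why the same default works for the `long` field as well as for strings.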
   
   ## Brief change log
   
   Fix the StructType-to-Avro schema conversion so that "null" is the first 
entry in union schema types, and add null default values for those fields.
   
   ## Verify this pull request
   
   Added a unit test for the converter function and a functional test for 
schema evolution in HoodieSparkSqlWriterSuite.
   
   ## Committer checklist
   
    - [x] Has a corresponding JIRA in PR title & commit
    
    - [x] Commit message is descriptive of the change
    
    - [x] CI is green
   
    - [ ] Necessary doc changes done or have another open PR (Not required)
          
    - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. (Not required)



