aditiwari01 opened a new pull request #2765:
URL: https://github.com/apache/hudi/pull/2765
## What is the purpose of the pull request
For union schema types, Avro expects "null" to be the first entry for null to
be usable as the default value. Hudi relies on
SchemaConverters.toAvroType(...) from the org.apache.spark:spark-avro_* library,
whose StructType-to-Avro conversion places "null" as the second entry in union
schemas. The Avro schema generated this way also has no default value set.
This patch fixes both issues.
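The first-entry rule described above can be illustrated with a small check. This is a minimal sketch using plain JSON dictionaries rather than the Avro or Hudi APIs; the helper name `null_default_allowed` is hypothetical and only models the rule as this description states it.

```python
import json


def null_default_allowed(field):
    """Return True if a JSON null default fits this field under the
    first-entry rule: the default must match the first type of the union."""
    field_type = field["type"]
    if isinstance(field_type, list):
        # Union type: only the first branch decides which default is valid.
        return len(field_type) > 0 and field_type[0] == "null"
    # Non-union type: null is a valid default only for the "null" type itself.
    return field_type == "null"


# Field as produced by the converter before the patch ("null" is second).
before = json.loads('{"name": "rowId", "type": ["string", "null"]}')
# Field as produced with the patch ("null" is first, default is JSON null).
after = json.loads('{"name": "rowId", "type": ["null", "string"], "default": null}')

print(null_default_allowed(before))
print(null_default_allowed(after))
```

With the pre-patch ordering the check fails, which is why simply adding a default without reordering the union would not be enough.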
This also fixes simple schema evolution with MOR tables: adding new columns
failed without this patch.
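Why a missing default can break column addition is not spelled out here, but Avro's schema resolution rules offer a likely explanation: when the reader schema has a field that the written data lacks, the reader uses that field's default, and signals an error if there is none. The sketch below is a simplified model of that rule on plain dictionaries, not Hudi's MOR read path; `read_with_schema` and the sample values are assumptions made for illustration.

```python
def read_with_schema(record, reader_fields):
    """Project a stored record onto the reader's fields, filling gaps from defaults."""
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            # Field added after the record was written: fall back to its default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    return resolved


# A record written before the column "newField" was added (values are made up).
old_record = {"rowId": "row_1", "name": "rider_A"}

# Evolved schema in the pre-patch shape: no default on the new column.
evolved_without_default = [
    {"name": "rowId", "type": ["string", "null"]},
    {"name": "name", "type": ["string", "null"]},
    {"name": "newField", "type": ["string", "null"]},
]

# Evolved schema in the patched shape: "null" first and default null.
evolved_with_default = [
    {"name": "rowId", "type": ["null", "string"], "default": None},
    {"name": "name", "type": ["null", "string"], "default": None},
    {"name": "newField", "type": ["null", "string"], "default": None},
]

print(read_with_schema(old_record, evolved_with_default))
try:
    read_with_schema(old_record, evolved_without_default)
except ValueError as err:
    print("resolution failed:", err)
```

In this model, old records remain readable under the evolved schema only when the new column carries a default.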
As an example of the conversion, suppose the incoming StructType is:
StructType(StructField(rowId,StringType,true),
StructField(partitionId,StringType,true), StructField(preComb,LongType,true),
StructField(name,StringType,true))
The Avro schema generated without this patch is:
{"type":"record","name":"hudi_trips_cow_record","namespace":"hoodie.hudi_trips_cow","fields":[{"name":"rowId","type":["string","null"]},{"name":"partitionId","type":["string","null"]},{"name":"preComb","type":["long","null"]},{"name":"name","type":["string","null"]}]}
Note that "null" is the second entry in each union and no default is set.
The Avro schema generated with this patch is:
{"type":"record","name":"hudi_trips_cow_record","namespace":"hoodie.hudi_trips_cow","fields":[{"name":"rowId","type":["null","string"],"default":null},{"name":"partitionId","type":["null","string"],"default":null},{"name":"preComb","type":["null","long"],"default":null},{"name":"name","type":["null","string"],"default":null},{"name":"newField","type":["null","string"],"default":null}]}
Note that the default value is a JSON null, not the string "null", so this
works for other data types as well (not just strings). A default value of
null is allowed only if "null" is the first entry in the union for a given
schema field.
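The before and after schemas differ only in union ordering and the added default, so the transformation can be sketched on the schema JSON directly. This is an illustrative approximation, not the converter code in this patch: the function name `null_first_with_default` is hypothetical, and it only handles top-level fields, not nested records, arrays, or maps.

```python
import json


def null_first_with_default(schema):
    """Return a copy of a record schema in which every nullable union lists
    "null" first and carries a JSON null default."""
    fixed_fields = []
    for field in schema["fields"]:
        field = dict(field)  # shallow copy so the input schema is not mutated
        field_type = field["type"]
        if isinstance(field_type, list) and "null" in field_type:
            others = [t for t in field_type if t != "null"]
            field["type"] = ["null"] + others
            # Python None serializes to JSON null, not the string "null".
            field["default"] = None
        fixed_fields.append(field)
    return {**schema, "fields": fixed_fields}


# Shortened version of the pre-patch schema from the example above.
generated = {
    "type": "record",
    "name": "hudi_trips_cow_record",
    "namespace": "hoodie.hudi_trips_cow",
    "fields": [
        {"name": "rowId", "type": ["string", "null"]},
        {"name": "preComb", "type": ["long", "null"]},
    ],
}

fixed = null_first_with_default(generated)
print(json.dumps(fixed["fields"]))
```

Because the default is a real null rather than a string, the same rewrite applies unchanged to the long-typed `preComb` field.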
## Brief change log
Fix the StructType to Avro schema conversion so that "null" is the first entry
in union schema types, and add a default value of null for those fields.
## Verify this pull request
Added a unit test for the converter function and a functional test for schema
evolution in HoodieSparkSqlWriterSuite.
## Committer checklist
- [x] Has a corresponding JIRA in PR title & commit
- [x] Commit message is descriptive of the change
- [x] CI is green
- [ ] Necessary doc changes done or have another open PR (Not required)
- [ ] For large changes, please consider breaking it into sub-tasks under
an umbrella JIRA. (Not required)