umehrot2 commented on a change in pull request #1427: [HUDI-727]: Copy default
values of fields if not present when rewriting incoming record with new schema
URL: https://github.com/apache/incubator-hudi/pull/1427#discussion_r399721017
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/util/HoodieAvroUtils.java
##########
@@ -204,8 +205,13 @@ public static GenericRecord rewriteRecordWithOnlyNewSchemaFields(GenericRecord r
   private static GenericRecord rewrite(GenericRecord record, Schema schemaWithFields, Schema newSchema) {
     GenericRecord newRecord = new GenericData.Record(newSchema);
-    for (Schema.Field f : schemaWithFields.getFields()) {
-      newRecord.put(f.name(), record.get(f.name()));
+    //get union of both the schemas, and then populate the fields in the new record
+    for (Schema.Field f : getAllFieldsToWrite(schemaWithFields, newSchema)) {
Review comment:
`rewrite` is an internal helper that is used by both
`rewriteRecordWithOnlyNewSchemaFields` and `rewriteRecord`.
Calling `getAllFieldsToWrite` does not really make sense for
`rewriteRecordWithOnlyNewSchemaFields`, and it is effectively a no-op in that
case because the old and new schemas are the same.
I think it would be better to refactor `rewrite` to take
`List<Schema.Field> fieldsToWrite` as a parameter instead of
`schemaWithFields`. `rewriteRecord` can then call `getAllFieldsToWrite` and
pass the result as that parameter, while
`rewriteRecordWithOnlyNewSchemaFields` can just pass `schema.getFields()`.
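
To make the suggestion concrete, here is a rough sketch of the shape I have in
mind. It is only an illustration, not the actual `HoodieAvroUtils` code: to
keep it self-contained, a `Map<String, Object>` stands in for Avro's
`GenericRecord` and a hypothetical `Field` class stands in for `Schema.Field`.
The default-value fallback and the union logic in `getAllFieldsToWrite` are my
assumptions based on the PR title, and the real signatures may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RewriteSketch {

  // Hypothetical stand-in for Avro's Schema.Field: a name plus a default value.
  static final class Field {
    final String name;
    final Object defaultValue;

    Field(String name, Object defaultValue) {
      this.name = name;
      this.defaultValue = defaultValue;
    }
  }

  // Shared helper: receives the exact list of fields to write, so it no
  // longer needs to know how that list was derived.
  static Map<String, Object> rewrite(Map<String, Object> record, List<Field> fieldsToWrite) {
    Map<String, Object> newRecord = new LinkedHashMap<>();
    for (Field f : fieldsToWrite) {
      // Assumption: fall back to the field's default when the record lacks it.
      Object value = record.containsKey(f.name) ? record.get(f.name) : f.defaultValue;
      newRecord.put(f.name, value);
    }
    return newRecord;
  }

  // Assumed behavior: union of both field lists by name; the new schema's
  // definition wins when a name appears in both.
  static List<Field> getAllFieldsToWrite(List<Field> oldFields, List<Field> newFields) {
    Map<String, Field> byName = new LinkedHashMap<>();
    for (Field f : oldFields) {
      byName.put(f.name, f);
    }
    for (Field f : newFields) {
      byName.put(f.name, f);
    }
    return new ArrayList<>(byName.values());
  }

  // Only this caller needs the union of the two schemas.
  static Map<String, Object> rewriteRecord(
      Map<String, Object> record, List<Field> oldFields, List<Field> newFields) {
    return rewrite(record, getAllFieldsToWrite(oldFields, newFields));
  }

  // This caller passes the schema's own fields straight through.
  static Map<String, Object> rewriteRecordWithOnlyNewSchemaFields(
      Map<String, Object> record, List<Field> newFields) {
    return rewrite(record, newFields);
  }

  public static void main(String[] args) {
    Map<String, Object> record = new LinkedHashMap<>();
    record.put("id", 1);
    record.put("name", "a");

    List<Field> oldFields = Arrays.asList(new Field("id", 0), new Field("name", ""));
    List<Field> newFields = Arrays.asList(
        new Field("id", 0), new Field("name", ""), new Field("city", "unknown"));

    System.out.println(rewriteRecord(record, oldFields, newFields));
    System.out.println(rewriteRecordWithOnlyNewSchemaFields(record, newFields));
  }
}
```

With this shape, `rewrite` has a single responsibility (copy the listed fields
into the new record), and each public entry point decides for itself which
fields that list should contain.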