prashantwason commented on a change in pull request #1457: [HUDI-741] Added
checks to validate Hoodie's schema evolution.
URL: https://github.com/apache/incubator-hudi/pull/1457#discussion_r400527391
##########
File path:
hudi-client/src/main/java/org/apache/hudi/client/HoodieWriteClient.java
##########
@@ -457,6 +461,37 @@ private void saveWorkloadProfileMetadataToInflight(WorkloadProfile profile, Hood
   private JavaRDD<WriteStatus> upsertRecordsInternal(JavaRDD<HoodieRecord<T>> preppedRecords, String commitTime,
       HoodieTable<T> hoodieTable, final boolean isUpsert) {
+    if (getConfig().getSchemaCheck()) {
+      // Ensure that the current writerSchema is compatible with the latest schema of this
+      // dataset.
+      // When inserting/updating data, we read records using the schema saved in the
+      // data/log files and convert them to the GenericRecords with writerSchema.
+      // Hence, we need to ensure that this conversion can take place without errors.
+      try {
+        SchemaUtil schemaUtil = new SchemaUtil(hoodieTable.getMetaClient());
+        MessageType savedParquetSchema = schemaUtil.getDataSchema();
+        Schema savedSchema = schemaUtil.convertParquetSchemaToAvro(savedParquetSchema);
+        Schema writerSchema = HoodieWriteHandle.createHoodieWriteSchema(config.getSchema());
+        if (! schemaUtil.isSchemaCompatible(savedSchema, writerSchema)) {
+          String msg = "WriterSchema is not compatible with the schema present in the Table";
+          LOG.error(msg);
+          LOG.warn("WriterSchema: " + writerSchema);
+          LOG.warn("Table latest schema: " + savedSchema);
+          throw new HoodieUpsertException(msg);
Review comment:
Refactored.