danny0405 commented on code in PR #7895:
URL: https://github.com/apache/hudi/pull/7895#discussion_r1101236071
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java:
##########
@@ -799,27 +803,38 @@ public TaskContextSupplier getTaskContextSupplier() {
* GenericRecords with writerSchema. Hence, we need to ensure that this conversion can take place without errors.
*/
private void validateSchema() throws HoodieUpsertException, HoodieInsertException {
-
- if (!shouldValidateAvroSchema() || getActiveTimeline().getCommitsTimeline().filterCompletedInstants().empty()) {
+ boolean allowProjection = config.shouldAllowAutoEvolutionColumnDrop();
+ boolean shouldValidate = shouldValidateAvroSchema();
+ if ((allowProjection && !shouldValidate)
+ || getActiveTimeline().getCommitsTimeline().filterCompletedInstants().empty()) {
// Check not required
return;
}
Schema tableSchema;
Schema writerSchema;
- boolean isValid;
+ String errorMessage = null;
try {
TableSchemaResolver schemaResolver = new TableSchemaResolver(getMetaClient());
writerSchema = HoodieAvroUtils.createHoodieWriteSchema(config.getSchema());
- tableSchema = HoodieAvroUtils.createHoodieWriteSchema(schemaResolver.getTableAvroSchemaWithoutMetadataFields());
- isValid = isSchemaCompatible(tableSchema, writerSchema, config.shouldAllowAutoEvolutionColumnDrop());
+ tableSchema = HoodieAvroUtils.createHoodieWriteSchema(schemaResolver.getTableAvroSchema(false));
+ if (!allowProjection && !AvroSchemaUtils.canProject(tableSchema, writerSchema)) {
+ errorMessage = String.format("Column dropping is not allowed. Use %s to disable this check", SCHEMA_ALLOW_AUTO_EVOLUTION_COLUMN_DROP.key());
+ } else if (shouldValidate && !isSchemaCompatible(tableSchema, writerSchema)) {
Review Comment:
So you mean to move the schema check into `HoodieTable`? If so, do we still
need the validation in the original `HoodieSparkSqlWriter`, or should we put
these validations in `HoodieTableSink`?
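
To make the question concrete, here is a minimal sketch of the control flow in this hunk, pulled out so it could be called from any entry point. It is an illustration under assumptions, not the PR's code: schemas are modelled as plain `Map<String, String>` (column name to type) instead of Avro `Schema`, `SchemaCheckSketch`, `validate`, and `hasCompletedCommits` are hypothetical names, the two helper methods are simplified stand-ins for `AvroSchemaUtils.canProject` and `isSchemaCompatible`, and the "Incompatible schema" message is invented because the hunk does not show that branch's text.

```java
import java.util.Map;
import java.util.Optional;

class SchemaCheckSketch {

  // Simplified stand-in for AvroSchemaUtils.canProject (assumption):
  // true when the writer schema keeps every column of the table schema.
  static boolean canProject(Map<String, String> tableSchema, Map<String, String> writerSchema) {
    return writerSchema.keySet().containsAll(tableSchema.keySet());
  }

  // Simplified stand-in for isSchemaCompatible (assumption):
  // columns present in both schemas must have the same type.
  static boolean isSchemaCompatible(Map<String, String> tableSchema, Map<String, String> writerSchema) {
    return tableSchema.entrySet().stream()
        .filter(e -> writerSchema.containsKey(e.getKey()))
        .allMatch(e -> e.getValue().equals(writerSchema.get(e.getKey())));
  }

  // Mirrors the branch structure of the hunk: returns an error message,
  // or an empty Optional when the check is skipped or passes.
  static Optional<String> validate(Map<String, String> tableSchema,
                                   Map<String, String> writerSchema,
                                   boolean allowProjection,
                                   boolean shouldValidate,
                                   boolean hasCompletedCommits) {
    if ((allowProjection && !shouldValidate) || !hasCompletedCommits) {
      // Check not required
      return Optional.empty();
    }
    if (!allowProjection && !canProject(tableSchema, writerSchema)) {
      return Optional.of("Column dropping is not allowed");
    } else if (shouldValidate && !isSchemaCompatible(tableSchema, writerSchema)) {
      // Hypothetical message; the real text is not visible in this hunk.
      return Optional.of("Incompatible schema");
    }
    return Optional.empty();
  }
}
```

If the check were a pure function along these lines, the Spark and Flink paths could both call it rather than each keeping its own copy, which is the placement decision I'm asking about.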