trushev commented on code in PR #7895:
URL: https://github.com/apache/hudi/pull/7895#discussion_r1101343426
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java:
##########
@@ -799,27 +803,38 @@ public TaskContextSupplier getTaskContextSupplier() {
    * GenericRecords with writerSchema. Hence, we need to ensure that this conversion can take place without errors.
    */
   private void validateSchema() throws HoodieUpsertException, HoodieInsertException {
-
-    if (!shouldValidateAvroSchema() || getActiveTimeline().getCommitsTimeline().filterCompletedInstants().empty()) {
+    boolean allowProjection = config.shouldAllowAutoEvolutionColumnDrop();
+    boolean shouldValidate = shouldValidateAvroSchema();
+    if ((allowProjection && !shouldValidate)
+        || getActiveTimeline().getCommitsTimeline().filterCompletedInstants().empty()) {
       // Check not required
       return;
     }
     Schema tableSchema;
     Schema writerSchema;
-    boolean isValid;
+    String errorMessage = null;
     try {
       TableSchemaResolver schemaResolver = new TableSchemaResolver(getMetaClient());
       writerSchema = HoodieAvroUtils.createHoodieWriteSchema(config.getSchema());
-      tableSchema = HoodieAvroUtils.createHoodieWriteSchema(schemaResolver.getTableAvroSchemaWithoutMetadataFields());
-      isValid = isSchemaCompatible(tableSchema, writerSchema, config.shouldAllowAutoEvolutionColumnDrop());
+      tableSchema = HoodieAvroUtils.createHoodieWriteSchema(schemaResolver.getTableAvroSchema(false));
+      if (!allowProjection && !AvroSchemaUtils.canProject(tableSchema, writerSchema)) {
+        errorMessage = String.format("Column dropping is not allowed. Use %s to disable this check", SCHEMA_ALLOW_AUTO_EVOLUTION_COLUMN_DROP.key());
+      } else if (shouldValidate && !isSchemaCompatible(tableSchema, writerSchema)) {
Review Comment:
I must admit that I underestimated this task :)
1) The changed `HoodieTable.validateSchema` leads to 9 failed tests in hudi-spark with ```SchemaCompatibilityException: Column dropping is not allowed```. I'm not sure whether the changed validation or the Spark writer is buggy; I need more time to look into what's going on. The failing situation is sketched right below.
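To make the failure concrete, here is a minimal, self-contained illustration (not Hudi code; the record and field names are made up, and the one-line projection check is a simplification of the real compatibility logic) of a writer schema that drops a column present in the table schema:
```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class ColumnDropExample {
  public static void main(String[] args) {
    // Table schema: two columns.
    Schema tableSchema = SchemaBuilder.record("rec").fields()
        .requiredString("id")
        .requiredString("name") // present in the table...
        .endRecord();
    // Writer schema: "name" has been dropped.
    Schema writerSchema = SchemaBuilder.record("rec").fields()
        .requiredString("id") // ...but missing from the writer
        .endRecord();

    // A canProject-style check: every table field must still be
    // obtainable from records written with writerSchema.
    boolean canProject = tableSchema.getFields().stream()
        .allMatch(f -> writerSchema.getField(f.name()) != null);
    System.out.println(canProject); // false -> "Column dropping is not allowed"
  }
}
```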
2) There are conflicting options: `hoodie.datasource.write.drop.partition.columns=true` allows column dropping for partition columns, while the current implementation (before this PR) skips the column-dropping check even though `hoodie.datasource.write.schema.allow.auto.evolution.column.drop=false`.
It looks like we should introduce a new method `canProjectExceptPartCols(prevSchema, newSchema, partCols)`, roughly as sketched below.
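A rough sketch of what that helper might look like (the signature follows the proposal above; `Schema` is Avro's, and the per-field existence check is a simplified stand-in for the full type-compatibility logic in `AvroSchemaUtils`):
```java
import java.util.Set;
import org.apache.avro.Schema;

public final class SchemaProjectionUtils {

  /**
   * Like a plain canProject check, but partition columns are exempt, so
   * hoodie.datasource.write.drop.partition.columns=true no longer conflicts
   * with hoodie.datasource.write.schema.allow.auto.evolution.column.drop=false.
   */
  public static boolean canProjectExceptPartCols(Schema prevSchema,
                                                 Schema newSchema,
                                                 Set<String> partCols) {
    return prevSchema.getFields().stream()
        .filter(f -> !partCols.contains(f.name())) // partition columns may be dropped
        .allMatch(f -> newSchema.getField(f.name()) != null);
  }
}
```
The validation in `HoodieTable.validateSchema` could then pass the table's partition columns to this helper instead of calling the plain `canProject`.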
Maybe you are right that `HoodieTableSink` is a more appropriate place for these changes, to avoid affecting Spark. Or maybe that would be a workaround for the `HoodieTable.validateSchema` problem rather than its solution. I'll try to find out.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]