eric9204 commented on issue #6966: URL: https://github.com/apache/hudi/issues/6966#issuecomment-1311163178
@fengjian428 When spark-sql is used to write data to Hudi, the deltacommit action and the compaction action are performed one after the other, so they do not influence each other. With Structured Streaming, however, that is not the case: the compaction service and the write process share the same `HoodieWriteConfig`. At line 87 of `hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/RunCompactionActionExecutor.java`, the statement `HoodieWriteConfig configCopy = config;` means that `configCopy`, which the compaction service uses, merely points to the original configuration.

If `hoodie.datasource.write.drop.partition.columns=true` is set, the write process removes the partition fields from the value of `hoodie.avro.schema` in `config`, but the compaction service then resets `hoodie.avro.schema` at line 94 of the same file (`configCopy.setSchema(schemaPair.getRight().get());`). Because `configCopy` aliases `config`, the schema used by the write process is changed as well. After the first successful compaction, the modified Avro schema is inconsistent with the records whose partition fields were removed by the write process, which is what causes this problem.

So another way to solve it would be to deep-copy the config at line 87 instead of just pointing to the original configuration. Alternatively, we could adopt the solution in #7167, which may be a little simpler. Any suggestions?
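To make the aliasing hazard concrete, here is a minimal, self-contained Java sketch. It does not use Hudi's real classes; `WriteConfig`, `setSchema`, and `deepCopy` are toy stand-ins that mimic the shape of the code at lines 87 and 94:

```java
import java.util.Properties;

public class ConfigAliasingDemo {
    // Toy stand-in for a mutable HoodieWriteConfig (hypothetical class,
    // not the real Hudi API).
    static class WriteConfig {
        private final Properties props = new Properties();

        WriteConfig setSchema(String schema) {
            props.setProperty("hoodie.avro.schema", schema);
            return this;
        }

        String getSchema() {
            return props.getProperty("hoodie.avro.schema");
        }

        // Defensive deep copy: a fresh Properties object, so later
        // mutations through the copy never leak back into the original.
        WriteConfig deepCopy() {
            WriteConfig copy = new WriteConfig();
            copy.props.putAll(this.props);
            return copy;
        }
    }

    public static void main(String[] args) {
        // Writer drops partition fields from its schema.
        WriteConfig writerConfig =
                new WriteConfig().setSchema("schema-without-partition-fields");

        // Shallow "copy", as in `HoodieWriteConfig configCopy = config;`:
        // both variables reference the same object.
        WriteConfig configCopy = writerConfig;
        configCopy.setSchema("schema-with-partition-fields");
        // The writer's schema was silently changed too.
        System.out.println(writerConfig.getSchema());

        // Deep copy keeps the two services isolated.
        WriteConfig writer2 =
                new WriteConfig().setSchema("schema-without-partition-fields");
        WriteConfig isolated = writer2.deepCopy();
        isolated.setSchema("schema-with-partition-fields");
        // The writer's schema is untouched this time.
        System.out.println(writer2.getSchema());
    }
}
```

Under this sketch, the first print shows the writer's schema mutated by the aliased reference, while the second shows it preserved after a deep copy, which is exactly the behavior difference the deep-copy fix would rely on.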
