eric9204 commented on issue #6966: URL: https://github.com/apache/hudi/issues/6966#issuecomment-1311163178
@fengjian428 When spark-sql is used to write data to Hudi, the deltacommit action and the compaction action are performed one after the other, so they do not influence each other. With Structured Streaming, however, that is not the case: the compaction service and the write process share the same `HoodieWriteConfig`. At line 87 of `hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/RunCompactionActionExecutor.java`, the statement `HoodieWriteConfig configCopy = config;` means that `configCopy`, which the compaction service uses, merely points to the original configuration.

If `hoodie.datasource.write.drop.partition.columns=true` is set, the write process removes the partition fields from the value of `hoodie.avro.schema` in `config`, but the compaction service then resets `hoodie.avro.schema` at line 94 of the same file (`configCopy.setSchema(schemaPair.getRight().get());`). Because `configCopy` aliases `config`, the schema used by the write process is changed as well. After the first successful compaction, the modified Avro schema is inconsistent with the records whose partition fields were removed by the write process, which is what causes this problem.

So another way to solve it would be to deep-copy the config at line 87 instead of just pointing to the original configuration. Alternatively, we could adopt the solution in #7167, which may be a little simpler. Any suggestions?
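To make the aliasing hazard concrete, here is a minimal, self-contained Java sketch. It does not use Hudi's real classes; `WriteConfig`, `setSchema`, and `deepCopy` are toy stand-ins that mimic the shape of the code at lines 87 and 94:

```java
import java.util.Properties;

public class ConfigAliasingDemo {
    // Toy stand-in for a mutable HoodieWriteConfig (hypothetical class,
    // not the real Hudi API).
    static class WriteConfig {
        private final Properties props = new Properties();

        WriteConfig setSchema(String schema) {
            props.setProperty("hoodie.avro.schema", schema);
            return this;
        }

        String getSchema() {
            return props.getProperty("hoodie.avro.schema");
        }

        // Defensive deep copy: a fresh Properties object, so later
        // mutations through the copy never leak back into the original.
        WriteConfig deepCopy() {
            WriteConfig copy = new WriteConfig();
            copy.props.putAll(this.props);
            return copy;
        }
    }

    public static void main(String[] args) {
        // Writer drops partition fields from its schema.
        WriteConfig writerConfig =
                new WriteConfig().setSchema("schema-without-partition-fields");

        // Shallow "copy", as in `HoodieWriteConfig configCopy = config;`:
        // both variables reference the same object.
        WriteConfig configCopy = writerConfig;
        configCopy.setSchema("schema-with-partition-fields");
        // The writer's schema was silently changed too.
        System.out.println(writerConfig.getSchema());

        // Deep copy keeps the two services isolated.
        WriteConfig writer2 =
                new WriteConfig().setSchema("schema-without-partition-fields");
        WriteConfig isolated = writer2.deepCopy();
        isolated.setSchema("schema-with-partition-fields");
        // The writer's schema is untouched this time.
        System.out.println(writer2.getSchema());
    }
}
```

Under this sketch, the first print shows the writer's schema mutated by the aliased reference, while the second shows it preserved after a deep copy, which is exactly the behavior difference the deep-copy fix would rely on.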
