beyond1920 commented on issue #10407:
URL: https://github.com/apache/hudi/issues/10407#issuecomment-1875017243
@zyclove My guess is that the previous writer jobs used the simple bucket index and the latest writer jobs did not. That would lead to data duplication, because records with the same primary key value are written into different file groups.

The exception in your last message occurs because some existing file groups do not adhere to the simple bucket index rules: the first 8 characters of their file IDs are not a bucket id.

Could you show the old file groups from before the upgrade? Do their file IDs start with an 8-character prefix that represents the bucket id? Alternatively, please share the logs of the old writer jobs. Either would help us verify whether this guess is correct.
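As a rough way to run that check, the sketch below splits a list of file IDs by whether they start with an 8-digit prefix and decodes that prefix as a bucket number. It assumes the bucket id is stored as a zero-padded decimal number in the first 8 characters of the file ID. The function names and sample file IDs are hypothetical, and a random UUID-style file ID can occasionally start with 8 digits, so treat a match as a hint rather than proof.

```python
import re

# Assumption: a bucket-index file ID begins with an 8-digit, zero-padded bucket number.
BUCKET_PREFIX = re.compile(r"[0-9]{8}")


def bucket_id_from_file_id(file_id):
    """Return the bucket number encoded in the first 8 characters, or None."""
    if BUCKET_PREFIX.match(file_id) is None:
        return None
    return int(file_id[:8])


def split_by_bucket_prefix(file_ids):
    """Split file IDs into (conforming, nonconforming) by their 8-digit prefix."""
    conforming, nonconforming = [], []
    for file_id in file_ids:
        if bucket_id_from_file_id(file_id) is None:
            nonconforming.append(file_id)
        else:
            conforming.append(file_id)
    return conforming, nonconforming


# Hypothetical file IDs, for illustration only.
sample_file_ids = [
    "00000003-7c1e-4d2a-9b1f-2f6a5d0c1e11-0",  # looks like bucket 3
    "a3f9c2d1-5b7e-4c66-8f0a-91d2e4b7c350-0",  # plain UUID-style ID
]
conforming, nonconforming = split_by_bucket_prefix(sample_file_ids)
print("conforming:", conforming)
print("nonconforming:", nonconforming)
```

Any file ID that lands in the nonconforming list would be a candidate for a file group written without the bucket index.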
