beyond1920 commented on issue #10407:
URL: https://github.com/apache/hudi/issues/10407#issuecomment-1875017243

   @zyclove 
   My guess is that the previous writer jobs used the simple bucket index, while
the latest writer jobs did not.
   This leads to data duplication, because records with the same primary key
value are written into different file groups.
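To illustrate the duplication, here is a minimal Python sketch. It is not Hudi's actual code, and several parts are assumptions. It uses a hypothetical table with `NUM_BUCKETS = 4`. It uses `zlib.crc32` as a stand-in for whatever hash the bucket index really applies to the key fields. It models a bucket file ID as an 8-digit, zero-padded bucket ID followed by the rest of a UUID. `write_without_bucket_index` is a deliberately simplified, hypothetical writer that cannot find the copy of the key already stored under the bucket layout, so it opens a new file group.

```python
import uuid
import zlib

NUM_BUCKETS = 4  # hypothetical bucket number for this sketch


def bucket_id(record_key: str, num_buckets: int = NUM_BUCKETS) -> int:
    # Stand-in hash (crc32, an assumption); what matters is that the same
    # key always maps to the same bucket.
    return zlib.crc32(record_key.encode("utf-8")) % num_buckets


def bucket_prefix(bucket: int) -> str:
    # 8-digit, zero-padded bucket ID, e.g. 3 -> "00000003".
    return f"{bucket:08d}"


def write_with_bucket_index(table: dict, record_key: str) -> None:
    prefix = bucket_prefix(bucket_id(record_key))
    # Route the record to the file group whose ID starts with its bucket ID.
    for file_id, keys in table.items():
        if file_id.startswith(prefix):
            keys.add(record_key)
            return
    # No file group for this bucket yet: create one by replacing the first
    # 8 characters of a random UUID with the bucket ID.
    table[prefix + str(uuid.uuid4())[8:]] = {record_key}


def write_without_bucket_index(table: dict, record_key: str) -> None:
    # Hypothetical writer that neither routes by bucket nor finds the
    # existing copy of the key, so it opens a brand-new file group.
    table[str(uuid.uuid4())] = {record_key}


def file_groups_containing(table: dict, record_key: str) -> list:
    return [file_id for file_id, keys in table.items() if record_key in keys]


if __name__ == "__main__":
    table = {}
    write_with_bucket_index(table, "id-1")
    write_with_bucket_index(table, "id-1")  # upsert: same file group
    print(len(file_groups_containing(table, "id-1")))  # 1
    write_without_bucket_index(table, "id-1")
    print(len(file_groups_containing(table, "id-1")))  # 2
```

With the bucket index, writing `id-1` twice keeps it in one file group. Once a writer that ignores the buckets writes the same key, the key lives in two file groups, which is the kind of duplication described above.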
   The exception in your last message is thrown because there are already file
groups that do not follow the simple bucket index rules: the first 8
characters of their file IDs are not a bucket ID.
   Could you show the file groups that existed before the upgrade? Do their
names start with an 8-character prefix that represents the bucket ID?
Alternatively, could you share the logs of the old writer jobs? Either would
help us verify whether our guess is correct.
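If it helps, a rough check of that 8-character prefix could look like the Python sketch below. It rests on several assumptions. Base file names are assumed to have the form `<fileId>_<writeToken>_<instantTime>.parquet`, so the file ID is everything before the first `_`. A conforming file ID is assumed to start with 8 decimal digits whose value is below the configured bucket number. `check_bucket_prefixes`, `file_id_of`, and the sample names are hypothetical and not part of Hudi.

```python
import re
from typing import Iterable, List, Tuple

# Matches file IDs whose first 8 characters are decimal digits (the bucket ID).
BUCKET_PREFIX = re.compile(r"^([0-9]{8})")


def file_id_of(base_file_name: str) -> str:
    # Assumed layout: "<fileId>_<writeToken>_<instantTime>.parquet".
    return base_file_name.split("_", 1)[0]


def check_bucket_prefixes(
    base_file_names: Iterable[str], num_buckets: int
) -> Tuple[List[str], List[str]]:
    """Split file IDs into (conforming, violating) w.r.t. the bucket prefix rule."""
    conforming: List[str] = []
    violating: List[str] = []
    # One entry per file group, even if it has several base file versions.
    file_ids = dict.fromkeys(file_id_of(name) for name in base_file_names)
    for file_id in file_ids:
        match = BUCKET_PREFIX.match(file_id)
        if match and int(match.group(1)) < num_buckets:
            conforming.append(file_id)
        else:
            violating.append(file_id)
    return conforming, violating


if __name__ == "__main__":
    ok, bad = check_bucket_prefixes(
        [
            "00000003-4b1e-4c2a-9f3d-2a7b9c0d1e2f-0_0-1-0_20231201120000000.parquet",
            "3f9a2c1b-0d2e-4f6a-8b1c-3d5e7f9a0b2c-0_1-2-0_20240102120000000.parquet",
        ],
        num_buckets=4,
    )
    print("conforming:", ok)
    print("violating:", bad)
```

Any file ID in the second list would not fit the simple bucket index layout. That would be consistent with the exception you saw.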


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
