chenbodeng719 commented on issue #10781: URL: https://github.com/apache/hudi/issues/10781#issuecomment-1970850132
> Did you use only 0.14.1, or is this a table upgraded from a previous version? Can you also provide the values of the Hudi meta columns?
>
> bulk_insert itself can ingest duplicates. Did you get duplicates after bulk_insert itself? If that's the case, upsert is going to update both records. Did you confirm that you had these duplicates after bulk_insert?
>
> Running bulk_insert twice on the same data can also cause this issue.

"If that's the case, upsert is going to update both records."

I guess that's my case. First, bulk_insert brought some duplicate keys into the table. Then, when an upsert with one of those duplicate keys came in, it updated every existing row with that key. In my case, both rows for one duplicate key were changed. I wonder: if there were five rows for one duplicate key, would the upsert update all five?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
