ad1happy2go commented on issue #10781:
URL: https://github.com/apache/hudi/issues/10781#issuecomment-1971101369

   Yes, that's correct. You should remove the duplicates after ingesting with
   bulk_insert, or not use bulk_insert at all in this case.
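
   The dedup step suggested above can be sketched without any Hudi/Spark
   dependency. This is a minimal, illustrative model of the semantics only:
   keep one row per record key, choosing the row with the largest precombine
   value, which mirrors what an upsert's precombine would pick. The field
   names (`record_key`, `ts`, `value`) are assumptions, not Hudi APIs; on a
   real table you would do this in Spark (e.g. a window over the record key).

   ```python
   def dedup_latest(rows, key_field="record_key", precombine_field="ts"):
       """Keep one row per record key: the one with the largest precombine value."""
       latest = {}
       for row in rows:
           key = row[key_field]
           if key not in latest or row[precombine_field] > latest[key][precombine_field]:
               latest[key] = row
       return list(latest.values())

   # Hypothetical rows left behind by a bulk_insert that did not dedupe:
   rows = [
       {"record_key": "k1", "ts": 1, "value": "a"},  # older duplicate of k1
       {"record_key": "k1", "ts": 2, "value": "b"},  # newer duplicate wins
       {"record_key": "k2", "ts": 1, "value": "c"},
   ]
   deduped = dedup_latest(rows)  # one row per key; k1 keeps ts=2
   ```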
   
   On Thu, Feb 29, 2024 at 4:04 PM chenbodeng719 ***@***.***>
   wrote:
   
    > Did you use 0.14.1 only, or is this a table upgraded from a previous
    > version? Can you also provide the values of the Hudi meta columns?
    >
    > bulk_insert itself can ingest duplicates. Did you get duplicates after
    > bulk_insert itself? If that's the case, yes, upsert is going to update
    > both records. Did you confirm that you had these duplicates after
    > bulk_insert?
   >
    > Running bulk_insert twice on the same data can also cause this issue.
   >
    > "if that's the case, upsert is going to update both records." I guess
    > that's my case. First, bulk_insert brought some duplicate keys into the
    > table. Then, when an upsert with a duplicate key arrived, it updated the
    > duplicate rows with the same key. In my case, two rows for one duplicate
    > key were changed. I wonder: if there were five rows for one duplicate
    > key, would it update all five rows?
   >
   > —
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/hudi/issues/10781#issuecomment-1970850132>, or
   > unsubscribe
   > 
<https://github.com/notifications/unsubscribe-auth/APD55YQZYWWO3TQ7UAOZBPTYV4B4ZAVCNFSM6AAAAABD7P3VEOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNZQHA2TAMJTGI>
   > .
   > You are receiving this because you commented.Message ID:
   > ***@***.***>
   >
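The behavior discussed in the quoted thread, an upsert rewriting every
existing row that shares the duplicate key, can be sketched with a minimal,
library-free model (not Hudi's actual code). Field names are illustrative
assumptions.

```python
def upsert(table_rows, incoming, key_field="record_key"):
    """Apply each incoming record to every existing row with the same key."""
    updates = {r[key_field]: r for r in incoming}
    # Every row whose key matches an incoming record is replaced by it,
    # so N duplicate rows for one key all receive the update.
    return [updates.get(row[key_field], row) for row in table_rows]

# Five duplicate rows for key "k1" left behind by a bulk_insert:
table = [{"record_key": "k1", "value": "old"} for _ in range(5)]
result = upsert(table, [{"record_key": "k1", "value": "new"}])
# All five rows are updated, consistent with the "yes" answer above.
```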
   

