danny0405 commented on issue #12523: URL: https://github.com/apache/hudi/issues/12523#issuecomment-2556230080
Thanks for the feedback @xiearthur. It looks like you are using the FLINK_STATE index, which relies on the checkpoint to store the index entries. Once those index entries are removed, an UPSERT message can be mistakenly treated as an INSERT, so duplicates occur. We have another index type, `BUCKET`, which does not need this index storage: it uses a hashing algorithm to map records to a fixed number of buckets. Maybe you should try that one.
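To make the difference concrete, here is a minimal Python sketch. It is a simplified model, not Hudi's actual code. The names `tag_with_state`, `bucket_id` and `NUM_BUCKETS`, the file-group labels, and the use of CRC32 as the hash are all hypothetical. A state-style index keeps a key-to-location mapping, so losing that mapping turns an update into a second insert. Bucket-style routing recomputes the location from the record key on every write, so there is nothing to lose.

```python
import zlib

NUM_BUCKETS = 8  # hypothetical fixed bucket count, chosen up front


def bucket_id(record_key: str, num_buckets: int = NUM_BUCKETS) -> int:
    # CRC32 is an illustrative choice (not necessarily Hudi's hash); unlike
    # Python's built-in hash() for str, it is deterministic across processes.
    return zlib.crc32(record_key.encode("utf-8")) % num_buckets


def tag_with_state(index: dict, record_key: str, next_file_group: str):
    """State-style index: the key -> location mapping lives only in `index`."""
    if record_key in index:
        # Known key: treat as an update and route to the existing location.
        return "UPDATE", index[record_key]
    # Unknown key: treat as an insert and remember where it went.
    index[record_key] = next_file_group
    return "INSERT", next_file_group


# State-style index: once the entries are gone, the same key is tagged
# INSERT again and lands in a different file group (a duplicate).
state = {}
first = tag_with_state(state, "id-42", "fg-1")
state.clear()  # hypothetical: index entries removed
second = tag_with_state(state, "id-42", "fg-2")

# Bucket-style routing: the same key always maps to the same bucket.
b1 = bucket_id("id-42")
b2 = bucket_id("id-42")
```

The trade-off in this model is that the number of buckets must be fixed in advance, because changing it would change where existing keys hash to.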
