hameizi opened a new pull request #3834: URL: https://github.com/apache/iceberg/pull/3834
There are two logic changes:

1. Previously, the delete logic wrote every deleted row to the eq-delete file, even when the same key was also recorded in the pos-delete file. This PR changes that so only deletes that are not already covered by the pos-delete file are written to the eq-delete file.

2. Previously, the Flink write logic added an entry to the pos-delete file whenever the same key already existed in the data file. That can only guarantee uniqueness within the current transaction, not across the whole table. I think the writer should only guarantee correctness when the user's semantics are correct, so this PR removes that logic from the write function.

The following shows the difference between the old delete logic and the new one.

Table schema:
  int key; (primary key)
  string data;

Old logic:
  txn1:
    > insert (1,'aa');   --> pos-delete file has (1,filepath)
  txn2:
    > delete (1,'aa');   --> eq-delete file add (1,'aa')
    > insert (1,'bb');   --> pos-delete file add (1,filepath)
    > delete (1,'bb');   --> eq-delete file add (1,'bb')
                             pos-delete file add (1,filepath)
  result:
    eq-delete file has (1,'aa'),(1,'bb')
    pos-delete file has (1,filepath),(1,filepath)

New logic:
  txn1:
    > insert (1,'aa');
  txn2:
    > delete (1,'aa');   --> eq-delete file add (1,'aa')
    > insert (1,'bb');
    > delete (1,'bb');   --> pos-delete file add (1,filepath)
  result:
    eq-delete file has (1,'aa')
    pos-delete file has (1,filepath)

In fact, (1,'bb') is unnecessary in the eq-delete file: when applyPosdelete is called, (1,'bb') is already removed from the result, so no row is left to match (1,'bb') when applyEqdelete is called.
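The new delete path for a single transaction can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the Iceberg or Flink writer code: the class `TxnWriter`, its fields, and the file name `data-2.parquet` are hypothetical, and a pos-delete entry is modeled as a (file path, row position) pair.

```python
from dataclasses import dataclass, field


@dataclass
class TxnWriter:
    """Hypothetical toy model of one transaction's writer (not the Iceberg API)."""

    data_file: str                                     # path of the data file written by this txn
    rows: list = field(default_factory=list)           # rows appended to the data file
    inserted: dict = field(default_factory=dict)       # key -> row position, this txn only
    eq_deletes: list = field(default_factory=list)     # rows deleted by equality
    pos_deletes: list = field(default_factory=list)    # (file path, row position) pairs

    def insert(self, key, data):
        # New logic: insert never emits a pos-delete; key uniqueness is
        # assumed to be guaranteed by the user's semantics.
        self.inserted[key] = len(self.rows)
        self.rows.append((key, data))

    def delete(self, key, data):
        pos = self.inserted.pop(key, None)
        if pos is not None:
            # The row was written in this txn: a pos-delete alone is enough.
            self.pos_deletes.append((self.data_file, pos))
        else:
            # The row comes from an earlier txn: fall back to an eq-delete.
            self.eq_deletes.append((key, data))


# txn2 from the example above
txn2 = TxnWriter("data-2.parquet")
txn2.delete(1, "aa")   # not inserted in this txn -> eq-delete
txn2.insert(1, "bb")
txn2.delete(1, "bb")   # inserted in this txn -> pos-delete only

print(txn2.eq_deletes)   # [(1, 'aa')]
print(txn2.pos_deletes)  # [('data-2.parquet', 0)]
```

In this model each delete produces exactly one entry, either a pos-delete or an eq-delete, which matches the "new logic" result listed above.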
