[GitHub] [iceberg] hameizi opened a new pull request #3834: change delete logic

GitBox Thu, 30 Dec 2021 23:39:35 -0800


hameizi opened a new pull request #3834:
URL: https://github.com/apache/iceberg/pull/3834



   This PR makes two logic changes:
   1. Previously, the delete logic wrote every deleted row to the eq-delete 
file, even when the same key was already covered by the pos-delete file. This 
PR changes it to write to the eq-delete file only the deleted rows that are 
not covered by the pos-delete file.
   2. Previously, the write logic in Flink wrote an entry to the pos-delete 
file whenever the same key already existed in the data file. That logic can 
only guarantee uniqueness within the current txn, not across the whole table. 
I think the writer should only guarantee correctness when the user's semantics 
are correct, so this PR removes that logic from the write function.
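
As a rough illustration of the two changes above, here is a minimal Python sketch of the new routing. It is not the Iceberg or Flink code: `DeltaWriterSketch`, its fields, and the file name `data-2.parquet` are hypothetical names made up for this example, and a pos-delete is recorded as a (file path, row position) pair.

```python
class DeltaWriterSketch:
    """Toy model of one txn's writer; all names here are hypothetical."""

    def __init__(self, data_file):
        self.data_file = data_file
        self.rows = []         # rows appended to the data file in this txn
        self.inserted = {}     # key -> position of the row written in this txn
        self.pos_deletes = []  # (data file path, row position) pairs
        self.eq_deletes = []   # rows that must be deleted by equality

    def insert(self, row):
        key = row[0]
        # Change 2: no pos-delete is written here for an earlier row with
        # the same key; the writer trusts the caller's semantics.
        self.inserted[key] = len(self.rows)
        self.rows.append(row)

    def delete(self, row):
        key = row[0]
        pos = self.inserted.pop(key, None)
        if pos is not None:
            # Change 1: the row was written in this txn, so a pos-delete
            # is enough and nothing goes to the eq-delete file.
            self.pos_deletes.append((self.data_file, pos))
        else:
            # The row comes from an earlier txn: only an eq-delete can reach it.
            self.eq_deletes.append(row)


# Replay a delete / insert / delete sequence on key 1 within one txn.
writer = DeltaWriterSketch("data-2.parquet")
writer.delete((1, "aa"))
writer.insert((1, "bb"))
writer.delete((1, "bb"))
```

After this replay, `writer.eq_deletes` holds only `(1, 'aa')` and `writer.pos_deletes` holds a single entry for the row `(1, 'bb')`.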
   The following shows the difference between the old delete logic and the new one.
   table schema:
   int key; (primary key)
   string data;
   old logic:
   txn1:
   > insert (1,'aa');  --> pos-delete file has (1,filepath)
   txn2:
   > delete (1,'aa');  --> eq-delete file adds (1,'aa')
   > insert (1,'bb');  --> pos-delete file adds (1,filepath)
   > delete (1,'bb');  --> eq-delete file adds (1,'bb'), pos-delete file adds (1,filepath)
   result:
   eq-delete file has (1,'aa'),(1,'bb')
   pos-delete file has (1,filepath),(1,filepath)
   
   new logic:
   txn1:
   > insert (1,'aa');
   txn2:
   > delete (1,'aa');  --> eq-delete file adds (1,'aa')
   > insert (1,'bb');
   > delete (1,'bb');  --> pos-delete file adds (1,filepath)
   result:
   eq-delete file has (1,'aa')
   pos-delete file has (1,filepath)
   
   The row (1,'bb') is actually unnecessary in the eq-delete file: when the 
function applyPosdelete is called, (1,'bb') is already removed from the result, 
so no data matches (1,'bb') when applyEqdelete is called.
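
The argument above can be checked with a small read-side sketch. This is a toy model, not the Iceberg reader: `apply_pos_deletes` and `apply_eq_deletes` are hypothetical helpers, the file name is made up, and equality deletes are matched on the full row to follow the notation in this description (real equality deletes match on the table's configured equality fields).

```python
def apply_pos_deletes(rows, data_file, pos_deletes):
    """Drop rows whose (file, position) appears in the pos-delete entries."""
    deleted = {pos for path, pos in pos_deletes if path == data_file}
    return [(pos, row) for pos, row in rows if pos not in deleted]


def apply_eq_deletes(rows, eq_deletes):
    """Drop rows equal to any row listed in the eq-delete entries."""
    return [(pos, row) for pos, row in rows if row not in eq_deletes]


# Data file written by txn2: a single row (1,'bb') at position 0.
data_file = "data-2.parquet"
rows = [(0, (1, "bb"))]
pos_deletes = [(data_file, 0)]

old_eq_deletes = {(1, "aa"), (1, "bb")}  # old logic also records (1,'bb')
new_eq_deletes = {(1, "aa")}             # new logic skips it

after_pos = apply_pos_deletes(rows, data_file, pos_deletes)
old_result = apply_eq_deletes(after_pos, old_eq_deletes)
new_result = apply_eq_deletes(after_pos, new_eq_deletes)
```

In this model the pos-delete already removes the row, so `old_result` and `new_result` are both empty and the extra (1,'bb') eq-delete entry changes nothing.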




