[GitHub] [iceberg] chenjunjiedada commented on pull request #2372: Spark: add position delete row reader

GitBox Mon, 29 Mar 2021 18:56:46 -0700


chenjunjiedada commented on pull request #2372:
URL: https://github.com/apache/iceberg/pull/2372#issuecomment-809846246



   Thanks for the review and comments!
   
   My original idea was to handle equality deletes and position deletes 
separately, as different levels of minor compaction. Separate compactions give 
users finer-grained control over which files are scanned, which reduces the 
load on the name node. For example, users could monitor the number of equality 
deletes and position deletes in the snapshot summary and run a Spark or Flink 
action to perform the specific compaction.
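
   As a rough sketch of that monitoring step (not code from this PR): the 
function below takes a snapshot summary as a plain string map and decides which 
minor compaction to trigger. The summary keys are what I'd expect Iceberg to 
report, while the thresholds and the action names are hypothetical 
placeholders.

```python
from typing import Dict, List

# Assumed summary keys; the counters are stored as strings in the summary map.
EQ_DELETES_KEY = "total-equality-deletes"
POS_DELETES_KEY = "total-position-deletes"


def choose_compactions(summary: Dict[str, str],
                       eq_threshold: int = 100_000,
                       pos_threshold: int = 1_000_000) -> List[str]:
    """Return the (hypothetical) minor compaction actions worth triggering."""
    # A missing key is treated as zero deletes.
    eq_deletes = int(summary.get(EQ_DELETES_KEY, "0"))
    pos_deletes = int(summary.get(POS_DELETES_KEY, "0"))

    actions = []
    if eq_deletes >= eq_threshold:
        # Placeholder name for an action that compacts equality deletes only.
        actions.append("rewrite-equality-deletes")
    if pos_deletes >= pos_threshold:
        # Placeholder name for an action that compacts position deletes only.
        actions.append("rewrite-position-deletes")
    return actions
```

   A scheduler could call this after each commit and launch only the actions 
it returns, so each kind of delete file is compacted on its own cadence.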
   
   I didn't consider reading all deleted rows because I thought of that as a 
major compaction, and it may be similar to the action that removes all deletes. 
If we want to support one more level of compaction that reads all deletes and 
rewrites them as position deletes, I think your suggestion definitely works.
   
   So I think it would be better to remove the logic for reading all deleted 
rows from this PR, implement it the suggested way, and add an action for it. 
I'd still like to keep the current separate compaction actions for 
fine-grained usage. Does that make sense to you?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



