coolderli opened a new pull request #3990:
URL: https://github.com/apache/iceberg/pull/3990


   In 
https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/ManifestFilterManager.java#L137,
 we use the minimum sequence number to filter the delete manifests, so that we 
can find older delete files and drop them.
   
   But if a table has a cold partition that is never compacted again, the 
minimum sequence number never advances. As a result, older delete files in 
other partitions are never dropped. The problem gets worse over time: 
eventually the Spark driver runs out of memory and the table becomes 
unavailable.
   
   In this PR, I use a map that holds the minimum sequence number for each 
partition instead of a single global minimum sequence number. However, this 
changes a lot of code, and I am not sure it is a good solution.
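
   To illustrate the idea, here is a minimal, self-contained sketch. It is 
not the code in this PR and it does not use Iceberg's API: `FileEntry`, 
`PartitionMinSeqSketch`, and all method names are hypothetical. It assumes a 
delete file can be dropped when its sequence number is strictly below the 
minimum data sequence number it is compared against, that a partition with no 
live data files needs none of its delete files, and it ignores delete files 
that apply across partitions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: compares a global minimum sequence number with a
// per-partition minimum when deciding which delete files can be dropped.
final class PartitionMinSeqSketch {

  // Hypothetical stand-in for a data file or delete file.
  static final class FileEntry {
    final String partition;
    final long sequenceNumber;

    FileEntry(String partition, long sequenceNumber) {
      this.partition = partition;
      this.sequenceNumber = sequenceNumber;
    }
  }

  // Smallest sequence number over all live data files in the table.
  static long globalMinSeq(List<FileEntry> dataFiles) {
    long min = Long.MAX_VALUE;
    for (FileEntry f : dataFiles) {
      min = Math.min(min, f.sequenceNumber);
    }
    return min;
  }

  // Smallest sequence number of live data files, tracked per partition.
  static Map<String, Long> minSeqByPartition(List<FileEntry> dataFiles) {
    Map<String, Long> mins = new HashMap<>();
    for (FileEntry f : dataFiles) {
      mins.merge(f.partition, f.sequenceNumber, Long::min);
    }
    return mins;
  }

  // Delete files older than the single global minimum.
  static List<FileEntry> droppableGlobal(List<FileEntry> deletes, long minSeq) {
    List<FileEntry> result = new ArrayList<>();
    for (FileEntry d : deletes) {
      if (d.sequenceNumber < minSeq) {
        result.add(d);
      }
    }
    return result;
  }

  // Delete files older than the minimum of their own partition.
  // Assumption: a partition without live data files keeps no delete files.
  static List<FileEntry> droppablePerPartition(
      List<FileEntry> deletes, Map<String, Long> mins) {
    List<FileEntry> result = new ArrayList<>();
    for (FileEntry d : deletes) {
      long partitionMin = mins.getOrDefault(d.partition, Long.MAX_VALUE);
      if (d.sequenceNumber < partitionMin) {
        result.add(d);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // The "cold" partition was last written at sequence number 1;
    // the "hot" partition was compacted at sequence number 10.
    List<FileEntry> data = Arrays.asList(
        new FileEntry("cold", 1), new FileEntry("hot", 10));
    List<FileEntry> deletes = Arrays.asList(
        new FileEntry("cold", 2), new FileEntry("hot", 3),
        new FileEntry("hot", 5), new FileEntry("hot", 12));

    System.out.println("droppable with global min: "
        + droppableGlobal(deletes, globalMinSeq(data)).size());
    System.out.println("droppable with per-partition min: "
        + droppablePerPartition(deletes, minSeqByPartition(data)).size());
  }
}
```

   In this example the cold partition pins the global minimum at 1, so the 
global check drops nothing, while the per-partition check can drop the two 
delete files in the hot partition that are older than its compacted data.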
   
   This PR can also solve another problem. Currently, when partial 
compaction is enabled, the delete files are not dropped either, because when 
the replace is committed we cannot be sure that the delete files are no longer 
referenced by other data files. That leaves a lot of useless delete files.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


