jackye1995 commented on pull request #2591: URL: https://github.com/apache/iceberg/pull/2591#issuecomment-845657683
> We talked about this previously as a possible post-merge, post-delete, post-rewrite sort of thing.

Cool, that `cleanUnreferencedDeleteFiles()` was just a divergent thought; great that we already thought about it.

> If file C for example is the correct size, and we never need to rewrite it, we never clean up those deletes so we still have to make another sort of action to clean up those files.

Yes, that goes back to what I was thinking before: if we can add an option to force-check the delete file and avoid filtering it out of the rewrite, then it should work. But I am starting to see where you are coming from. If this is done as a separate action, we can save the write time whenever the data file being read contains no rows targeted by the delete file. To enable such a check in Spark, we cannot use the same code path that fully reads all the rows and writes them back. So from that perspective, it probably does not make sense to add delete handling to the rewrite action. Thanks for the clarification!

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
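The skip-the-rewrite check discussed above could be sketched roughly as follows. This is a toy model under stated assumptions, not Iceberg's actual API: the `appliesTo` method, the map-based representation of positional deletes, and the file paths are all illustrative. The idea is simply that a separate action could test whether a delete file references any rows of a given data file before paying the cost of reading and rewriting it.

```java
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: models the "does this delete file touch this
// data file?" check. Not Iceberg's real API or data structures.
public class DeleteFileCheck {

    // Positional deletes modeled as: data file path -> set of deleted row positions.
    // Returns true if the delete file references at least one row in the data file,
    // i.e., a rewrite would actually drop delete entries; false means the rewrite
    // (and its write cost) can be skipped for this file.
    static boolean appliesTo(Map<String, Set<Long>> positionalDeletes, String dataFilePath) {
        Set<Long> positions = positionalDeletes.get(dataFilePath);
        return positions != null && !positions.isEmpty();
    }
}
```

For example, a file like "file C" in the discussion, which no delete entry targets, would return `false` and could be skipped entirely instead of being read and written back.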
