rdblue commented on apache/iceberg#2900 (https://github.com/apache/iceberg/issues/2900#issuecomment-896121910):
@ayush-san, the problem is that snapshot expiration dropped snapshots that were needed to reconstruct the history of your table between the last checkpoint and the current state. Because Flink can't determine what changed between those two states, it throws an error. There isn't anything wrong with the table or with snapshot expiration itself; you just expired snapshots that you ended up needing.

To avoid the problem, I recommend keeping snapshots around for a longer period of time so that you don't expire snapshots that haven't been processed by your job yet (see the first sketch below).

It also looks like you'd benefit from compaction in the streaming job, which we've been discussing elsewhere. There have been a few ideas around this, but the one I like best is a second set of writers that write larger files across checkpoints once the content has already been committed. When those files get large enough, they swap the small files for a large one (see the second sketch below).

Getting back to the purpose of this issue, it looks like the problem was probably too much metadata for each table. Regular maintenance is probably the right fix.
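As a rough illustration of the retention suggestion, here is a minimal sketch using Iceberg's core `Table` API to expire snapshots while keeping a generous window. The 7-day cutoff and `retainLast(100)` are placeholder values, not recommendations; pick a window longer than your job's worst-case lag behind the table.

```java
import java.util.concurrent.TimeUnit;
import org.apache.iceberg.Table;

public class SafeExpiration {
  // "table" is assumed to be loaded from your catalog elsewhere.
  static void expireWithGenerousRetention(Table table) {
    long cutoff = System.currentTimeMillis() - TimeUnit.DAYS.toMillis(7);
    table.expireSnapshots()
        .expireOlderThan(cutoff) // only drop snapshots older than the window
        .retainLast(100)         // and always keep the most recent 100
        .commit();
  }
}
```

You can get a similar effect declaratively through the table properties `history.expire.max-snapshot-age-ms` and `history.expire.min-snapshots-to-keep`, which control the defaults that expiration uses.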

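The in-job writer swap described above isn't something you can drop in today; the closest off-the-shelf tool is Iceberg's rewrite-data-files maintenance action, run as a separate batch job. A sketch, assuming a recent Iceberg release with the Spark actions module on the classpath; the 512 MB target file size is illustrative.

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.spark.actions.SparkActions;

public class Compaction {
  // "table" is assumed to be loaded from your catalog; run this periodically
  // (e.g. from a scheduler) rather than inside the streaming job itself.
  static void compactSmallFiles(Table table) {
    SparkActions.get()
        .rewriteDataFiles(table)
        .option("target-file-size-bytes", String.valueOf(512L * 1024 * 1024))
        .execute();
  }
}
```

Running this on a schedule keeps the file count (and therefore the metadata size) bounded, which also addresses the "too much metadata" point above.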