ayush-san commented on issue #2900:
URL: https://github.com/apache/iceberg/issues/2900#issuecomment-895960999
I ran the following maintenance procedure on my streaming table and the
metadata size was reduced considerably. Checkpoint time for this table also
came down to ~700 ms from 8-9 minutes previously.
```
Actions.forTable(table).rewriteDataFiles().targetSizeInBytes(256 * 1024 * 1024).execute();
spark.sql("CALL hive.system.rewrite_manifests('db_name.table_name')").show()
spark.sql("CALL hive.system.expire_snapshots(table => 'db_name.table_name', older_than => 1628428025000, retain_last => 5)").show()
spark.sql("CALL catalog_name.system.remove_orphan_files(table => 'db_name.table_name')").show()
```
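For reference, the `older_than` argument to `expire_snapshots` is a Unix epoch timestamp in milliseconds. A minimal sketch (plain Java, no Iceberg dependency) that decodes the literal used above and shows how a rolling cutoff is typically computed:

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class ExpireCutoff {
    public static void main(String[] args) {
        // The literal passed to older_than above, in epoch milliseconds.
        long literal = 1628428025000L;
        System.out.println(Instant.ofEpochMilli(literal)); // prints 2021-08-08T13:07:05Z

        // More commonly the cutoff is computed relative to now,
        // e.g. expire snapshots older than 7 days (illustrative retention window):
        long cutoff = Instant.now().minus(7, ChronoUnit.DAYS).toEpochMilli();
        System.out.println(cutoff);
    }
}
```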
However, running the expire_snapshots action leads to a bigger problem: the
Flink job can no longer resume from its checkpoint, due to
https://github.com/apache/iceberg/issues/2482
Error: `org.apache.iceberg.exceptions.ValidationException: Cannot determine
history between starting snapshot null and current 7571686194699158451`
@stevenzwu @rdblue Is there any plan to update the expire-snapshots action
implementation?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]