This is an automated email from the ASF dual-hosted git repository.
pvary pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iceberg.git
The following commit(s) were added to refs/heads/main by this push:
new e8cf33db7d Docs: Add note that snapshot expiration and cleanup orphan files could corrupt Flink job state (#9002)
e8cf33db7d is described below
commit e8cf33db7d3fc637504a51a801c055dce54474b7
Author: Rui Li <[email protected]>
AuthorDate: Wed Nov 8 19:40:02 2023 +0800
Docs: Add note that snapshot expiration and cleanup orphan files could corrupt Flink job state (#9002)
---
docs/flink-writes.md | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/docs/flink-writes.md b/docs/flink-writes.md
index 641fa09e3c..e078a82868 100644
--- a/docs/flink-writes.md
+++ b/docs/flink-writes.md
@@ -270,4 +270,13 @@ INSERT INTO tableName /*+ OPTIONS('upsert-enabled'='true') */
...
```
-Check out all the options here: [write-options](/flink-configuration#write-options)
\ No newline at end of file
+Check out all the options here: [write-options](/flink-configuration#write-options)
+
+## Notes
+
+Flink streaming write jobs rely on snapshot summary to keep the last committed checkpoint ID, and
+store uncommitted data as temporary files. Therefore, [expiring snapshots](../tables/maintenance#expire-snapshots)
+and [deleting orphan files](../tables/maintenance#delete-orphan-files) could possibly corrupt
+the state of the Flink job. To avoid that, make sure to keep the last snapshot created by the Flink
+job (which can be identified by the `flink.job-id` property in the summary), and only delete
+orphan files that are old enough.
\ No newline at end of file
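
As a rough illustration of the rule stated in the added note, the sketch below models the selection logic with plain Python data structures. It does not call the Iceberg API: the `Snapshot` dataclass and the functions `last_flink_snapshot`, `snapshots_safe_to_expire`, and `orphan_old_enough` are hypothetical names introduced here, and the age thresholds are arbitrary assumptions. Only the `flink.job-id` summary property comes from the note itself.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for a table snapshot (not the Iceberg class)."""
    snapshot_id: int
    timestamp: datetime
    summary: dict = field(default_factory=dict)


def last_flink_snapshot(snapshots, job_id):
    """Return the newest snapshot whose summary has the given flink.job-id, or None."""
    owned = [s for s in snapshots if s.summary.get("flink.job-id") == job_id]
    return max(owned, key=lambda s: s.timestamp, default=None)


def snapshots_safe_to_expire(snapshots, job_id, older_than):
    """Select snapshots older than the cutoff, always keeping the job's last snapshot."""
    keep = last_flink_snapshot(snapshots, job_id)
    return [s for s in snapshots if s.timestamp < older_than and s is not keep]


def orphan_old_enough(file_modified, now, min_age):
    """Treat a file as deletable only if it is at least min_age old (assumed policy)."""
    return now - file_modified >= min_age


if __name__ == "__main__":
    now = datetime(2023, 11, 8, tzinfo=timezone.utc)
    snaps = [
        Snapshot(1, now - timedelta(days=10), {"flink.job-id": "job-a"}),
        Snapshot(2, now - timedelta(days=9), {"flink.job-id": "job-a"}),
        Snapshot(3, now - timedelta(days=8), {}),
    ]
    cutoff = now - timedelta(days=5)
    for s in snapshots_safe_to_expire(snaps, "job-a", cutoff):
        print("can expire snapshot", s.snapshot_id)
```

In this sketch, snapshot 2 is retained even though it is older than the cutoff, because it is the last snapshot carrying the job's `flink.job-id`. A real maintenance job would apply the same check before calling the table's expiration and orphan-file cleanup actions.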