[GitHub] [iceberg] stevenzwu commented on a diff in pull request #5873: Flink: add defensive check in IcebergFilesCommitter for restoring sta…

GitBox Wed, 28 Sep 2022 09:47:35 -0700


stevenzwu commented on code in PR #5873:
URL: https://github.com/apache/iceberg/pull/5873#discussion_r982638333



##########
flink/v1.13/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java:
##########
@@ -149,7 +149,18 @@ public void initializeState(StateInitializationContext context) throws Exception
     this.checkpointsState = context.getOperatorStateStore().getListState(STATE_DESCRIPTOR);
     this.jobIdState = context.getOperatorStateStore().getListState(JOB_ID_DESCRIPTOR);
     if (context.isRestored()) {
-      String restoredFlinkJobId = jobIdState.get().iterator().next();
+      Iterable<String> jobIdIterable = jobIdState.get();
+      if (jobIdIterable == null || !jobIdIterable.iterator().hasNext()) {
+        LOG.warn(
+            "Failed to restore committer state. This can happen when operator uid changed and Flink "
+                + "allowNonRestoredState is enabled. Best practice is to explicitly set the operator id "
+                + "via FlinkSink#Builder#uidPrefix() so that the committer operator uid is stable. "
+                + "Otherwise, Flink auto generate an operator uid based on job topology."
+                + "With that, operator uid is subjective to change upon topology change.");
+        return;

Review Comment:
   Normally, when we talk about a corrupted Flink checkpoint or savepoint, we mean a corrupted checkpoint metadata file or corrupted RocksDB SST files. Those shouldn't be affected by this change, as they typically show up as parsing errors.
   
   An operator uid change combined with allowNonRestoredState is what would cause the issue that this PR tries to fix.
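For illustration, a minimal plain-Java sketch of the defensive check in the diff. It is a stand-in under stated assumptions, not the committer's code: the hypothetical `restoreJobId` helper takes an `Iterable<String>` in place of the value returned by Flink's `ListState#get()`, and returns `Optional.empty()` where the diff logs a warning and returns early. The hypothetical `restoreJobIdUnchecked` reproduces the removed one-liner, which throws `NoSuchElementException` when the restored job-id state is empty, as it is after an operator uid change with allowNonRestoredState.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Optional;

public class RestoreJobIdSketch {

  // Hypothetical helper: same shape as the check in the diff, but takes a plain
  // Iterable<String> instead of Flink's ListState<String>#get() result.
  static Optional<String> restoreJobId(Iterable<String> jobIdIterable) {
    // Guard against both null and an empty Iterable, as the diff does.
    if (jobIdIterable == null) {
      return Optional.empty();
    }
    Iterator<String> it = jobIdIterable.iterator();
    if (!it.hasNext()) {
      // Nothing was restored for this operator (e.g. its uid changed).
      return Optional.empty();
    }
    return Optional.of(it.next());
  }

  // Hypothetical helper mirroring the removed line: next() on an empty
  // iterator throws NoSuchElementException.
  static String restoreJobIdUnchecked(Iterable<String> jobIdIterable) {
    return jobIdIterable.iterator().next();
  }

  public static void main(String[] args) {
    List<String> restored = List.of("job-a");
    List<String> empty = Collections.emptyList();

    System.out.println(restoreJobId(restored)); // Optional[job-a]
    System.out.println(restoreJobId(empty));    // Optional.empty
    System.out.println(restoreJobId(null));     // Optional.empty

    try {
      restoreJobIdUnchecked(empty);
    } catch (NoSuchElementException e) {
      System.out.println("unchecked version threw NoSuchElementException");
    }
  }
}
```

Returning `Optional` keeps the sketch testable without a Flink runtime; in the actual operator, the early `return` from `initializeState` serves a similar purpose by skipping the rest of the restore path.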


