stevenzwu commented on code in PR #5873:
URL: https://github.com/apache/iceberg/pull/5873#discussion_r982638333
##########
flink/v1.13/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java:
##########
@@ -149,7 +149,18 @@ public void initializeState(StateInitializationContext context) throws Exception
     this.checkpointsState = context.getOperatorStateStore().getListState(STATE_DESCRIPTOR);
     this.jobIdState = context.getOperatorStateStore().getListState(JOB_ID_DESCRIPTOR);
     if (context.isRestored()) {
-      String restoredFlinkJobId = jobIdState.get().iterator().next();
+      Iterable<String> jobIdIterable = jobIdState.get();
+      if (jobIdIterable == null || !jobIdIterable.iterator().hasNext()) {
+        LOG.warn(
+            "Failed to restore committer state. This can happen when operator uid changed and Flink "
+                + "allowNonRestoredState is enabled. Best practice is to explicitly set the operator id "
+                + "via FlinkSink#Builder#uidPrefix() so that the committer operator uid is stable. "
+                + "Otherwise, Flink auto generate an operator uid based on job topology."
+                + "With that, operator uid is subjective to change upon topology change.");
+        return;
Review Comment:
Normally, when we talk about a corrupted Flink checkpoint or savepoint, we mean
a corrupted checkpoint metadata file or corrupted RocksDB SST files. Those
shouldn't be affected by this change, as they typically show up as parsing
errors. An operator uid change with allowNonRestoredState enabled is what
causes the issue that this PR tries to fix.
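
To make the distinction concrete, here is a minimal sketch of the empty-state
guard from the diff, written outside of Flink. The names RestoredStateGuard and
firstJobId are hypothetical and are not part of Iceberg or Flink; a plain
Iterable<String> stands in for the value returned by jobIdState.get(). It only
illustrates the "state is missing" case, not corrupted checkpoint files.

```java
import java.util.Iterator;
import java.util.Optional;

// Hypothetical helper, for illustration only (not Iceberg or Flink code).
class RestoredStateGuard {

  // Returns the first restored job id, or empty when nothing was restored,
  // e.g. because the operator uid changed and the old state was dropped.
  static Optional<String> firstJobId(Iterable<String> restored) {
    if (restored == null) {
      return Optional.empty();
    }
    Iterator<String> it = restored.iterator();
    // Calling next() without this check would throw NoSuchElementException
    // on an empty iterable, which is the failure the guard avoids.
    if (!it.hasNext()) {
      return Optional.empty();
    }
    return Optional.ofNullable(it.next());
  }
}
```

With a guard like this, an empty result can be logged and skipped, while a
corrupted metadata or SST file would already have failed earlier during
restore.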