openinx commented on a change in pull request #2745:
URL: https://github.com/apache/iceberg/pull/2745#discussion_r659745312



##########
File path: flink/src/main/java/org/apache/iceberg/flink/sink/FlinkSink.java
##########
@@ -205,6 +207,30 @@ public Builder equalityFieldColumns(List<String> columns) {
       return this;
     }
 
+    /**
+     * Set the uid prefix for FlinkSink operators. Note that FlinkSink internally consists of multiple operators (like
+     * writer, committer, dummy sink etc.) Actually operator uid will be appended with a suffix like "uid-writer".
+     * <p>
+     * Flink auto generates operator uids if not set explicitly. It is a recommended
+     * <a href="https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/production_ready/">
+     * best-practice to set uid for all operators</a> before deploying to production. Flink has an option to {@code
+     * pipeline.auto-generate-uids=false} to disable auto-generation and force explicit setting of all operator uids.
+     * <p>
+     * Be careful with setting this for an existing job, because now we are changing the operator uid from an
+     * auto-generated one to this new value. When deploying the change with a checkpoint, Flink won't be able to restore
+     * the previous Flink sink operator state (more specifically the committer operator state). You need to use {@code
+     * --allowNonRestoredState} to ignore the previous sink state. During restore Flink sink state is used to check if
+     * checkpointed files were actually committed or not. {@code --allowNonRestoredState} can lead to data loss if the
+     * Iceberg commit failed in the last completed checkpoints.
+     *
+     * @param newPrefix defines the iceberg table's key.

Review comment:
       > @param newPrefix defines the iceberg table's key.
   
   I think we will need a correct parameter doc for this: `newPrefix` is the uid prefix for the FlinkSink operators, not the Iceberg table's key.
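
   To make the point concrete, here is a minimal, self-contained sketch of what the parameter doc could say and how the prefix relates to the operator uids. This is not the actual FlinkSink code: the class `UidPrefixSketch` and the helper `operatorUid` are hypothetical names for illustration, the "-writer" suffix is taken from the Javadoc above, and the "-committer" suffix is an assumption.

```java
/** Hypothetical stand-in for the FlinkSink builder; only the uid-prefix handling is sketched. */
public class UidPrefixSketch {
    // null means no prefix was set, so operator uids are left to Flink's auto-generation.
    private String uidPrefix = null;

    /**
     * Set the uid prefix for the sink operators.
     *
     * @param newPrefix the uid prefix shared by the operators that make up the sink
     * @return this builder, for method chaining
     */
    public UidPrefixSketch uidPrefix(String newPrefix) {
        this.uidPrefix = newPrefix;
        return this;
    }

    /**
     * Hypothetical helper: derives one operator's uid from the prefix and a suffix.
     * Returns null when no prefix was set, meaning no explicit uid would be assigned.
     */
    String operatorUid(String suffix) {
        return uidPrefix == null ? null : uidPrefix + "-" + suffix;
    }

    public static void main(String[] args) {
        UidPrefixSketch builder = new UidPrefixSketch().uidPrefix("uid");
        System.out.println(builder.operatorUid("writer"));    // prints "uid-writer"
        System.out.println(builder.operatorUid("committer")); // suffix name is an assumption
    }
}
```

   The wording of the `@param` line is only a suggestion; the important part is that it describes the uid prefix rather than a table key.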




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


