stevenzwu commented on a change in pull request #3130:
URL: https://github.com/apache/iceberg/pull/3130#discussion_r728624766
##########
File path: core/src/main/java/org/apache/iceberg/TableProperties.java
##########
@@ -228,4 +228,10 @@ private TableProperties() {
public static final String MERGE_CARDINALITY_CHECK_ENABLED =
"write.merge.cardinality-check.enabled";
public static final boolean MERGE_CARDINALITY_CHECK_ENABLED_DEFAULT = true;
+
+ public static final String WATERMARK_FIELD_NAME = "write.watermark.field";
+ public static final String WATERMARK_FIELD_NAME_DEFAULT = "";
+
+ public static final String WATERMARK_VALUE = "write.watermark";
Review comment:
    > If we treat all physical jobs as one logical job, then each physical
    job is one parallel instance of the logical job. So if we need to record
    information about different physical jobs, each of them has to write its
    own watermark into table properties. What we could do on top of that is
    automatically aggregate the different watermark values during the
    streaming job commit to get the final result.
    This is an interesting idea with definite merits. Conceptually, it makes
    sense to me.
    If the Flink sink also updates the aggregated `write.watermark`, how does
    commit collision retry work? We can't move the watermark aggregation logic
    into the core Iceberg commit retry logic. It seems we would have to
    implement commit retry logic in the Flink sink itself because of this
    special handling of the aggregated watermark table property.
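    To make the discussion concrete, here is a minimal, self-contained sketch
    of the aggregation idea. It is not Iceberg code: a plain `Map` stands in
    for the table properties, the per-job key naming (`write.watermark.<jobId>`)
    and the min-across-jobs aggregation are assumptions for illustration. The
    retry concern above is that this merge step would need to re-read and
    re-run after a commit conflict, which the generic core retry does not do.

    ```java
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch of aggregating per-job watermarks at commit time.
    // The Map stands in for Iceberg table properties; key names are illustrative.
    public class WatermarkMergeSketch {
      static final String WATERMARK_VALUE = "write.watermark";

      /**
       * Records one job's latest watermark under its own key, then recomputes
       * the aggregated watermark as the minimum across all jobs (the table can
       * only be as advanced as its slowest writer -- an assumed semantic).
       * On a real table this whole read-merge-write step would have to be
       * re-run inside the sink's commit retry loop after a conflict.
       */
      static void commitWatermark(Map<String, String> props, String jobId, long watermark) {
        props.put(WATERMARK_VALUE + "." + jobId, Long.toString(watermark));
        long aggregated = props.entrySet().stream()
            .filter(e -> e.getKey().startsWith(WATERMARK_VALUE + "."))
            .mapToLong(e -> Long.parseLong(e.getValue()))
            .min()
            .getAsLong();
        props.put(WATERMARK_VALUE, Long.toString(aggregated));
      }

      public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        commitWatermark(props, "jobA", 100L);
        commitWatermark(props, "jobB", 90L);   // slower job holds the aggregate back
        commitWatermark(props, "jobA", 120L);  // aggregate still pinned by jobB
        commitWatermark(props, "jobB", 130L);  // now the aggregate can advance
        System.out.println(props.get(WATERMARK_VALUE));  // prints 120
      }
    }
    ```

    Note that the aggregate advances only once every job's watermark has
    advanced, which is exactly why a naive blind overwrite during commit retry
    would be incorrect.
    
    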
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]