liubo1022126 commented on a change in pull request #3130:
URL: https://github.com/apache/iceberg/pull/3130#discussion_r725449357



##########
File path: core/src/main/java/org/apache/iceberg/TableProperties.java
##########
@@ -228,4 +228,10 @@ private TableProperties() {
 
   public static final String MERGE_CARDINALITY_CHECK_ENABLED = "write.merge.cardinality-check.enabled";
   public static final boolean MERGE_CARDINALITY_CHECK_ENABLED_DEFAULT = true;
+
+  public static final String WATERMARK_FIELD_NAME = "write.watermark.field";
+  public static final String WATERMARK_FIELD_NAME_DEFAULT = "";
+
+  public static final String WATERMARK_VALUE = "write.watermark";

Review comment:
       Sorry, I just came back from holiday.
   
   I think it's a good idea to use different suffixes to distinguish different streaming jobs when writing watermarks, because then one job's processing will not be blocked by problems in another job. For example:
   
![image](https://user-images.githubusercontent.com/47106533/136648890-77ebd1c4-e999-44aa-955e-7a85eadc9bcc.png)
   Here, if we treat all physical jobs as one logical job, then each physical job acts like one parallel instance of that logical job. So if we need information about each physical job, we have to write a separate watermark for each of them into the table properties.
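Here is a minimal sketch of the per-job suffix idea. It assumes a hypothetical key scheme, `write.watermark.job.<suffix>`. The `.job.` segment is an illustrative choice, picked so these keys cannot collide with the `write.watermark.field` property added in this diff. A plain `Map` stands in for the table properties. In a real committer the write would go through something like `table.updateProperties().set(key, value).commit()`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: each physical streaming job writes its watermark under its own
// suffixed key, so one stalled job cannot overwrite or hold back another job's value.
// A plain Map stands in for the table properties; the "write.watermark.job." prefix is an
// assumption, not an existing Iceberg property.
public class PerJobWatermarkKeys {
  // Base key taken from the WATERMARK_VALUE constant proposed in this PR.
  static final String WATERMARK_VALUE = "write.watermark";
  // Hypothetical namespace for per-job keys; ".job." avoids clashing with write.watermark.field.
  static final String PER_JOB_PREFIX = WATERMARK_VALUE + ".job.";

  // Builds the key for one physical job, e.g. "write.watermark.job.region-a".
  static String watermarkKey(String jobSuffix) {
    if (jobSuffix == null || jobSuffix.isEmpty()) {
      throw new IllegalArgumentException("jobSuffix must be non-empty");
    }
    return PER_JOB_PREFIX + jobSuffix;
  }

  // Records one job's watermark (epoch millis) without touching other jobs' keys.
  static void recordWatermark(Map<String, String> props, String jobSuffix, long watermarkMillis) {
    props.put(watermarkKey(jobSuffix), Long.toString(watermarkMillis));
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    recordWatermark(props, "region-a", 1_000L);
    recordWatermark(props, "region-b", 2_000L);
    System.out.println(props.get("write.watermark.job.region-a")); // prints 1000
  }
}
```

Each job only ever writes its own key. A failing job therefore leaves a stale value behind rather than corrupting the values of healthy jobs.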
   
   One further step we could take is to automatically aggregate the different watermark values when the streaming job commits, producing a single final result. For example:
   
![image](https://user-images.githubusercontent.com/47106533/136648925-0bb5d62b-f08e-4104-a462-517594af6877.png)
   That way, downstream consumers only need to read `write.watermark` directly and don't need to care about the upstream logic.
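The following sketch shows the commit-time aggregation. Two points are assumptions rather than something the comment specifies. First, it reuses the hypothetical `write.watermark.job.<suffix>` key scheme. Second, it combines the per-job values by taking their minimum, which is how streaming engines usually merge watermarks from parallel inputs: the logical job is only complete up to its slowest part. The actual aggregation rule in the screenshot above may differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical sketch: on commit, fold all per-job watermarks into the single
// "write.watermark" property that downstream readers consume. The key prefix and the
// min() rule are assumptions; a Map stands in for the table properties.
public class WatermarkAggregator {
  static final String WATERMARK_VALUE = "write.watermark";
  // Hypothetical per-job namespace; does not match "write.watermark.field".
  static final String PER_JOB_PREFIX = WATERMARK_VALUE + ".job.";

  // Minimum of all per-job watermarks, or empty when no job has reported yet.
  static OptionalLong aggregate(Map<String, String> props) {
    return props.entrySet().stream()
        .filter(e -> e.getKey().startsWith(PER_JOB_PREFIX))
        .mapToLong(e -> Long.parseLong(e.getValue()))
        .min();
  }

  // Writes the aggregated value under the plain key; leaves it untouched if nothing reported.
  static void updateAggregated(Map<String, String> props) {
    aggregate(props).ifPresent(min -> props.put(WATERMARK_VALUE, Long.toString(min)));
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put("write.watermark.job.region-a", "1000");
    props.put("write.watermark.job.region-b", "2000");
    props.put("write.watermark.field", "event_time"); // ignored by the prefix filter
    updateAggregated(props);
    System.out.println(props.get(WATERMARK_VALUE)); // prints 1000
  }
}
```

Because the aggregated key does not start with the per-job prefix, calling `updateAggregated` repeatedly on every commit is safe.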
   
   **In terms of code design, we may need a new class dedicated to watermark processing, rather than writing the watermark handling directly into the `IcebergFilesCommitter` class as this PR does. It could be split into two cases: simple watermark processing and processing for multiple regions.**
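The proposed split could look roughly like the sketch below. Every name in it is hypothetical: `WatermarkHandlers`, `WatermarkHandler`, `SimpleWatermarkHandler`, and `MultiRegionWatermarkHandler` do not exist in Iceberg. "Multiple regions" is read here as several physical jobs writing to the same table. The committer would hold one handler and call it on each commit, instead of embedding the logic itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical design sketch: watermark handling extracted from the committer into its own
// types. A Map stands in for the table properties; all names here are illustrative only.
public class WatermarkHandlers {
  static final String WATERMARK_VALUE = "write.watermark";

  // What the committer would call after a successful commit.
  interface WatermarkHandler {
    void onCommit(Map<String, String> props, long watermarkMillis);
  }

  // Simple case: a single job owns "write.watermark" and writes it directly.
  static final class SimpleWatermarkHandler implements WatermarkHandler {
    @Override
    public void onCommit(Map<String, String> props, long watermarkMillis) {
      props.put(WATERMARK_VALUE, Long.toString(watermarkMillis));
    }
  }

  // Multi-region case: write this job's suffixed key, then refresh the aggregated minimum.
  static final class MultiRegionWatermarkHandler implements WatermarkHandler {
    private static final String PREFIX = WATERMARK_VALUE + ".job."; // hypothetical scheme
    private final String jobSuffix;

    MultiRegionWatermarkHandler(String jobSuffix) {
      this.jobSuffix = jobSuffix;
    }

    @Override
    public void onCommit(Map<String, String> props, long watermarkMillis) {
      props.put(PREFIX + jobSuffix, Long.toString(watermarkMillis));
      // min() finishes iterating before ifPresent mutates the map, so no concurrent modification.
      props.entrySet().stream()
          .filter(e -> e.getKey().startsWith(PREFIX))
          .mapToLong(e -> Long.parseLong(e.getValue()))
          .min()
          .ifPresent(min -> props.put(WATERMARK_VALUE, Long.toString(min)));
    }
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    new MultiRegionWatermarkHandler("region-a").onCommit(props, 2_000L);
    new MultiRegionWatermarkHandler("region-b").onCommit(props, 1_500L);
    System.out.println(props.get(WATERMARK_VALUE)); // prints 1500
  }
}
```

Keeping both strategies behind one interface lets the committer stay unaware of whether it is the only writer or one of several regions.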




-- 
This is an automated message from the Apache Git Service.