Hi guys: We have drafted a proposal on the notification mechanism after Flink partition commit and would like to hear your suggestions. Could someone help to review if you are interested? Thank you for your assistance.
Flink Support Partition Commit notification for append data After writing a partition, it is often necessary to notify downstream applications. [image: image.png] For example, there is a table partitioned by hour, when the watermark based on event time reaches 12:00:00 we think that the data of the 11 hour partition is completed and can be used by downstream applications, so we commit this partition to the metadata (create a new snapshot). Then, make a remote API call to notify the task flow that it can start the spark batch step for 11 hour partition. For the current default commit mode, which depends on Checkpoint, we can understand it as using process time to determine table partition commit. And the partition commit strategy in this PR, we can understand it as using event time to decide whether to commit table partitions. - Policy: What need to do when committing the partition, built-in policies support for the commit of `default`(only commit to metadata) and `success files`(for file like system, e.g. HDFS) and you can also implement your own policies, standardized interfaces are provided so that developers can easily carry out custom extension, such as merging small files, remote api calls, etc. Compatibility This PR maintains compatibility with the Flink ecosystem. https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/hive/hive_read_write/#writing https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/connectors/table/filesystem/#partition-commit NOTE: Partition Commit only works in dynamic partition inserting. Q:What happens when 10 partition's data arrives again after the 10 partition has been committed? S:The message that both the 10 and 11 partitions are committed is sent to the receiver when 11 partition is about to be committed, and the receiver decides what action to take. The above is a partial introduction of the proposal. For the full document, please refer to: https://docs.google.com/document/d/1Sobv8XbvsyPzHi1YWy_jSet1Wy7smXKDKeQrNZSFYCg Thank you very much for the valuable suggestions from @Steven and @Junjie Chen. Thanks, Liwei Li