Hi guys:

We have drafted a proposal on the notification mechanism after Flink
partition commit and would like to hear your suggestions. Could someone
help to review if you are interested? Thank you for your assistance.


Flink Support Partition Commit notification for append data
After writing a partition, it is often necessary to notify downstream
applications. [image: image.png]


For example, there is a table partitioned by hour, when the watermark based
on event time reaches 12:00:00 we think that the data of the 11 hour
partition is completed and can be used by downstream applications, so we
commit this partition to the metadata (create a new snapshot). Then, make a
remote API call to notify the task flow that it can start the spark batch
step for 11 hour partition.

For the current default commit mode, which depends on Checkpoint, we can
understand it as using process time to determine table partition commit.
And the partition commit strategy in this PR, we can understand it as using
event time to decide whether to commit table partitions.

   -

   Policy: What need to do when committing the partition, built-in policies
   support for the commit of `default`(only commit to metadata) and `success
   files`(for file like system, e.g. HDFS) and you can also implement your
   own policies, standardized interfaces are provided so that developers
   can easily carry out custom extension, such as merging small files,
   remote api calls, etc.

Compatibility

This PR maintains compatibility with the Flink ecosystem.

https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/hive/hive_read_write/#writing

https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/connectors/table/filesystem/#partition-commit


NOTE: Partition Commit only works in dynamic partition inserting.


Q:What happens when 10 partition's data arrives again after the 10
partition has been committed?

S:The message that both the 10 and 11 partitions are committed is sent to
the receiver when 11 partition is about to be committed, and the receiver
decides what action to take.

The above is a partial introduction of the proposal. For the full document,
please refer to:
https://docs.google.com/document/d/1Sobv8XbvsyPzHi1YWy_jSet1Wy7smXKDKeQrNZSFYCg


Thank you very much for the valuable suggestions from @Steven and @Junjie
Chen.

Thanks,
Liwei Li

Reply via email to