Hi Fang Yong, thanks for driving this PIP. Adding an alternative way to complete the global commit sounds good to me.
I still have a question similar to xiangyu feng's:

> This committer node will connect all the tasks in Flink job together, and all the tasks are within one region. As a result, when any task fails, it will trigger a global failover of the Flink job.

Why would the failure of one subtask have an impact on the other subtasks? The Flink documentation is not particularly detailed on how a region is formed. Could you explain this in more detail?
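To make sure I understand the proposal correctly, here is a rough sketch of how I imagine the coordinator-side commit could work, based on the merge-on-next-checkpoint and asynchronous-commit behavior Fang Yong describes below. All class and method names here are hypothetical (not the PIP-30 API), and the sketch ignores failure handling and state persistence:

import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch, not the PIP-30 API: buffers committables per checkpoint
// and turns them into one table commit when the checkpoint completes.
public class PaimonWriterCoordinatorSketch implements AutoCloseable {

    // Placeholder for the file metadata a writer task reports for one checkpoint.
    public record CommitMessage(int subtaskId, String dataFilePath) {}

    // Committables grouped by checkpoint id; both callbacks below are assumed to
    // run on the coordinator's single main thread, so a plain TreeMap is enough.
    private final TreeMap<Long, List<CommitMessage>> pending = new TreeMap<>();

    // Single-threaded executor: commits stay ordered but never block the JM main
    // thread while a Paimon snapshot is created or compaction is triggered.
    private final ExecutorService commitExecutor = Executors.newSingleThreadExecutor();

    // Called for each operator event a writer task sends for a checkpoint.
    public void handleCommitMessage(long checkpointId, CommitMessage message) {
        pending.computeIfAbsent(checkpointId, id -> new ArrayList<>()).add(message);
    }

    // Called when a checkpoint completes. If the notification for an earlier
    // checkpoint was missed, its files are still in `pending` and get merged
    // into this commit ("the data files generated by the two CPs will be merged").
    public void notifyCheckpointComplete(long checkpointId) {
        List<CommitMessage> toCommit = new ArrayList<>();
        while (!pending.isEmpty() && pending.firstKey() <= checkpointId) {
            toCommit.addAll(pending.pollFirstEntry().getValue());
        }
        if (!toCommit.isEmpty()) {
            commitExecutor.execute(() -> commitToPaimon(checkpointId, toCommit));
        }
    }

    private void commitToPaimon(long checkpointId, List<CommitMessage> messages) {
        // In the real design this would create one Paimon snapshot; here we only
        // log what would be committed.
        System.out.printf("Committing %d files as one snapshot for checkpoint %d%n",
                messages.size(), checkpointId);
    }

    @Override
    public void close() {
        commitExecutor.shutdown();
    }
}

If the actual design differs, for example if the pending files are tracked through the HDFS path of the coordinator rather than in memory, please correct me.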
> On Feb 11, 2025, at 16:55, Yong Fang <zjur...@gmail.com> wrote:
>
> Thanks for all the feedback.
>
> To @xiangyu feng:
> 1. Regarding region failover, I'm referring to the `Restart Pipelined Region Failover Strategy` [1] in Flink.
> 2. This feature mainly enables support for region failover, enhancing stability. Of course, in practical use, users can configure the JM to provide more suitable resources for the committer node.
>
> To @Xintong Song:
> It doesn't matter if the notifyComplete of one checkpoint fails to be called. Since the data will be stored in the HDFS path of the coordinator, when the notifyComplete of the next CP is called, the data files generated by the two CPs will be merged to create a Paimon snapshot.
>
> To @Yunfeng Zhou:
> The pressure on the JM (JobManager) depends on two aspects:
> 1. Message communication pressure: this is determined by the checkpoint (CP) interval and the number of sinks. Each sink generates one message in each CP, so with 5000 sinks in total, the JM will receive 5000 messages per CP, which may take a few seconds.
> 2. Pressure from executing table commits: a Paimon snapshot must be created, and compaction may even be triggered. Therefore, asynchronous threads should be considered for these operations to avoid blocking the main thread of the JM.
>
> To @Jingsong and @wj wang:
> When using Flink SinkV2 to write to Paimon, a global committer node will also be generated in the physical execution plan to create and manage Paimon snapshots. Therefore, the same problem arises.
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/task_failure_recovery/#restart-pipelined-region-failover-strategy
>
> Best,
> Fang Yong
>
> On Tue, Feb 11, 2025 at 11:26 AM wj wang <hongli....@gmail.com> wrote:
>
>> Hi Yong, thanks for driving this PIP.
>> I have a small question:
>> Why not use Flink SinkV2 instead of moving the committer logic into JM's OperatorCoordinator?
>>
>> On Tue, Feb 11, 2025 at 10:34 AM Jingsong Li <jingsongl...@gmail.com> wrote:
>>>
>>> Thanks Yong for driving this PIP!
>>>
>>>>> Currently, there will be a global committer node in Flink Paimon job which is used to ensure the consistency of written data in Paimon. This committer node will connect all the tasks in Flink job together, and all the tasks are within one region. As a result, when any task fails, it will trigger a global failover of the Flink job. We use HDFS as the remote storage, and we often encounter situations where the global failover of jobs is triggered due to write timeouts or errors when writing to HDFS, which causes quite a few stability issues.
>>>
>>> I know that Flink SinkV2 also commits through a regular node; does this mean that SinkV2 also has this drawback?
>>>
>>> Best,
>>> Jingsong
>>>
>>> On Mon, Feb 10, 2025 at 4:06 PM Yunfeng Zhou <flink.zhouyunf...@gmail.com> wrote:
>>>>
>>>> Hi Yong,
>>>>
>>>> The general idea looks good to me. Are there any statistics on the number of operator events that need to be transmitted between the coordinator and the writer operators? This information could help estimate the additional workload on the JM and prevent the JM from becoming a single bottleneck for the throughput of Paimon sinks.
>>>>
>>>> Best,
>>>> Yunfeng
>>>>
>>>>> On Jan 23, 2025, at 17:44, Yong Fang <zjur...@gmail.com> wrote:
>>>>>
>>>>> Hi devs,
>>>>>
>>>>> I would like to start a discussion about PIP-30: Improvement For Paimon Committer In Flink [1].
>>>>>
>>>>> Currently, Flink writes data to Paimon based on a two-phase commit, which generates a global committer node and connects all tasks in one region. If any task fails, it will lead to a global failover of the Flink job.
>>>>>
>>>>> To solve this issue, we would like to introduce a Paimon Writer Coordinator to perform the table commit operation, enabling Flink Paimon jobs to support region failover and improving stability.
>>>>>
>>>>> Looking forward to hearing from you, thanks!
>>>>>
>>>>> [1]
>>>>> https://cwiki.apache.org/confluence/display/PAIMON/PIP-30%3A+Improvement+For+Paimon+Committer+In+Flink
>>>>>
>>>>> Best,
>>>>> Fang Yong