To @Yanquan Lv: The concept of 'Region' comes from Blink and older Flink versions. In current Flink, 'Region' refers to the `Restart Pipelined Region Failover Strategy`.
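For readers following along, that strategy is chosen via Flink's failover-strategy configuration. A minimal, illustrative `flink-conf.yaml` fragment (the `region` value is the documented default in recent Flink releases):

```yaml
# Restart only the pipelined region containing the failed task,
# instead of restarting the whole job ("full").
jobmanager.execution.failover-strategy: region
```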
On Wed, Feb 12, 2025 at 9:35 AM Yanquan Lv <decq12y...@gmail.com> wrote:
>
> Hi Fang Yong. Thanks for driving it.
> Adding an alternative way to complete the global commit is good for me.
>
> I still have a confusion similar to xiangyu feng's:
> > This committer node will connect all the tasks in Flink job together, and
> > all the tasks are within one region. As a result, when any task fails, it
> > will trigger a global failover of the Flink job.
> Why would the failure of one subtask have an impact on other subtasks? The
> explanation in the Flink documentation of how a region is formed is not
> particularly detailed. Can you explain more about it?
>
> > On Feb 11, 2025, at 16:55, Yong Fang <zjur...@gmail.com> wrote:
> >
> > Thanks for all the feedback.
> >
> > To @xiangyu feng:
> > 1. Regarding region failover, I'm referring to the `Restart Pipelined
> > Region Failover Strategy` [1] in Flink.
> > 2. This feature mainly enables support for region failover, enhancing
> > stability. Of course, in practical use, users can configure the JM to
> > provide more suitable resources for the committer node.
> >
> > To @Xintong Song:
> > It doesn't matter if the notifyComplete of one checkpoint fails to be
> > called. Since the data will be stored in the HDFS path of the coordinator,
> > when the notifyComplete of the next CP is called, the data files generated
> > by the two CPs will be merged to create a Paimon snapshot.
> >
> > To @Yunfeng Zhou:
> > The pressure on the JM (JobManager) depends on two aspects:
> > Message communication pressure: it is determined by the checkpoint (CP)
> > interval and the number of sinks. Each sink generates one message in each
> > CP. If there are a total of 5000 sinks, the JM will receive 5000 messages
> > in one CP, which may take a few seconds.
> > Pressure from executing table commits: it is necessary to create a Paimon
> > snapshot, and compaction may even be triggered.
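The recovery behavior described above for @Xintong Song's question can be sketched in a few lines. This is a hypothetical simulation, not the Paimon or Flink API (all names are invented): data files of a checkpoint whose notifyComplete was missed stay pending under the coordinator's storage path, and are merged into the snapshot committed by the next successful checkpoint.

```python
class CommitCoordinatorSketch:
    """Hypothetical sketch; names are illustrative, not the Paimon API."""

    def __init__(self):
        self.pending = []    # (cp_id, data_files) persisted under the coordinator's HDFS path
        self.snapshots = []  # committed Paimon snapshots (simulated)

    def on_checkpoint(self, cp_id, data_files):
        # Files are durably stored before any commit is attempted.
        self.pending.append((cp_id, data_files))

    def notify_complete(self, cp_id, delivered=True):
        # If the notification is lost, the files simply stay pending.
        if not delivered:
            return
        # Merge all pending files up to this checkpoint into one snapshot.
        files = [f for c, fs in self.pending if c <= cp_id for f in fs]
        self.pending = [(c, fs) for c, fs in self.pending if c > cp_id]
        self.snapshots.append((cp_id, files))
```

In this sketch, a lost notifyComplete for CP 1 is repaired when CP 2 completes: the snapshot created for CP 2 contains the data files of both checkpoints.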
> > Therefore, asynchronous threads should be considered for these operations
> > to avoid blocking the main thread of the JM.
> >
> > To @Jingsong and @wj wang:
> > When using Flink SinkV2 to write to Paimon, a global committer node will
> > also be generated in the physical execution plan to create and manage
> > Paimon snapshots. Therefore, the same problem arises.
> >
> > [1]
> > https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/task_failure_recovery/#restart-pipelined-region-failover-strategy
> >
> > Best,
> > Fang Yong
> >
> > On Tue, Feb 11, 2025 at 11:26 AM wj wang <hongli....@gmail.com> wrote:
> >
> >> Hi Yong, thanks for driving this PIP.
> >> I have a small question:
> >> Why not use Flink SinkV2 instead of moving the committer logic into
> >> JM's OperatorCoordinator?
> >>
> >> On Tue, Feb 11, 2025 at 10:34 AM Jingsong Li <jingsongl...@gmail.com>
> >> wrote:
> >>>
> >>> Thanks Yong for driving this PIP!
> >>>
> >>>> Currently, there will be a global committer node in a Flink Paimon job
> >>>> which is used to ensure the consistency of written data in Paimon. This
> >>>> committer node will connect all the tasks in the Flink job together, and
> >>>> all the tasks are within one region. As a result, when any task fails, it
> >>>> will trigger a global failover of the Flink job. We use HDFS as the remote
> >>>> storage, and we often encounter situations where a global failover of the
> >>>> job is triggered by write timeouts or errors when writing to HDFS, which
> >>>> causes quite a few stability issues.
> >>>
> >>> I know that Flink SinkV2 is also committed through a regular node;
> >>> does this mean that SinkV2 also has this drawback?
> >>>
> >>> Best,
> >>> Jingsong
> >>>
> >>> On Mon, Feb 10, 2025 at 4:06 PM Yunfeng Zhou
> >>> <flink.zhouyunf...@gmail.com> wrote:
> >>>>
> >>>> Hi Yong,
> >>>>
> >>>> The general idea looks good to me.
> >>>> Are there any statistics on the
> >>>> number of operator events that need to be transmitted between the
> >>>> coordinator and the writer operators? This information could help provide
> >>>> estimates of the additional workload on the JM, preventing the JM from
> >>>> becoming a single bottleneck for the throughput of Paimon sinks.
> >>>>
> >>>> Best,
> >>>> Yunfeng
> >>>>
> >>>>> On Jan 23, 2025, at 17:44, Yong Fang <zjur...@gmail.com> wrote:
> >>>>>
> >>>>> Hi devs,
> >>>>>
> >>>>> I would like to start a discussion about PIP-30: Improvement For Paimon
> >>>>> Committer In Flink [1].
> >>>>>
> >>>>> Currently, Flink writes data to Paimon based on Two-Phase Commit, which
> >>>>> generates a global committer node and connects all tasks into one
> >>>>> region. If any task fails, it will lead to a global failover of the
> >>>>> Flink job.
> >>>>>
> >>>>> To solve this issue, we would like to introduce a Paimon Writer
> >>>>> Coordinator to perform the table commit operation, enabling Flink
> >>>>> Paimon jobs to support region failover and improving stability.
> >>>>>
> >>>>> Looking forward to hearing from you, thanks!
> >>>>>
> >>>>> [1]
> >>>>> https://cwiki.apache.org/confluence/display/PAIMON/PIP-30%3A+Improvement+For+Paimon+Committer+In+Flink
> >>>>>
> >>>>> Best,
> >>>>> Fang Yong
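A recurring point in this thread is that the coordinator's table commit (snapshot creation, possibly compaction) must not block the JM main thread. A hypothetical sketch of that idea, with invented names and not the actual coordinator code: the main thread only enqueues the commit onto a single background worker and returns immediately.

```python
from concurrent.futures import ThreadPoolExecutor

class AsyncCommitSketch:
    """Hypothetical: offload slow table commits so the JM main thread never blocks."""

    def __init__(self, commit_fn):
        self.commit_fn = commit_fn  # e.g. a function that creates a Paimon snapshot
        # A single worker keeps commits in checkpoint order.
        self.executor = ThreadPoolExecutor(max_workers=1)

    def notify_checkpoint_complete(self, cp_id, data_files):
        # Called on the JM main thread: just enqueue and return a future.
        return self.executor.submit(self.commit_fn, cp_id, data_files)

    def close(self):
        self.executor.shutdown(wait=True)
```

Returning a future per commit also lets commit failures be observed and handled later without ever stalling the main thread.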