Hi Fang Yong, thanks for driving this PIP. Adding an alternative way to complete the global commit sounds good to me.
I still have a question similar to xiangyu feng's:

> This committer node will connect all the tasks in Flink job together, and all the tasks are within one region. As a result, when any task fails, it will trigger a global failover of the Flink job.

Why would the failure of one subtask have an impact on the other subtasks? The Flink documentation is not particularly detailed on how a region is formed. Could you explain this in more detail?
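To make sure I understand the proposal correctly, here is a rough sketch of how I imagine the coordinator-side commit could work, based on the merge-on-next-checkpoint and asynchronous-commit behavior Fang Yong describes below. All class and method names here are hypothetical (not the PIP-30 API), and the sketch ignores failure handling and state persistence:

import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch, not the PIP-30 API: buffers committables per checkpoint
// and turns them into one table commit when the checkpoint completes.
public class PaimonWriterCoordinatorSketch implements AutoCloseable {

    // Placeholder for the file metadata a writer task reports for one checkpoint.
    public record CommitMessage(int subtaskId, String dataFilePath) {}

    // Committables grouped by checkpoint id; both callbacks below are assumed to
    // run on the coordinator's single main thread, so a plain TreeMap is enough.
    private final TreeMap<Long, List<CommitMessage>> pending = new TreeMap<>();

    // Single-threaded executor: commits stay ordered but never block the JM main
    // thread while a Paimon snapshot is created or compaction is triggered.
    private final ExecutorService commitExecutor = Executors.newSingleThreadExecutor();

    // Called for each operator event a writer task sends for a checkpoint.
    public void handleCommitMessage(long checkpointId, CommitMessage message) {
        pending.computeIfAbsent(checkpointId, id -> new ArrayList<>()).add(message);
    }

    // Called when a checkpoint completes. If the notification for an earlier
    // checkpoint was missed, its files are still in `pending` and get merged
    // into this commit ("the data files generated by the two CPs will be merged").
    public void notifyCheckpointComplete(long checkpointId) {
        List<CommitMessage> toCommit = new ArrayList<>();
        while (!pending.isEmpty() && pending.firstKey() <= checkpointId) {
            toCommit.addAll(pending.pollFirstEntry().getValue());
        }
        if (!toCommit.isEmpty()) {
            commitExecutor.execute(() -> commitToPaimon(checkpointId, toCommit));
        }
    }

    private void commitToPaimon(long checkpointId, List<CommitMessage> messages) {
        // In the real design this would create one Paimon snapshot; here we only
        // log what would be committed.
        System.out.printf("Committing %d files as one snapshot for checkpoint %d%n",
                messages.size(), checkpointId);
    }

    @Override
    public void close() {
        commitExecutor.shutdown();
    }
}

If the actual design differs, for example if the pending files are tracked through the HDFS path of the coordinator rather than in memory, please correct me.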
> On Feb 11, 2025, at 16:55, Yong Fang <zjur...@gmail.com> wrote:
>
> Thanks for all the feedback.
>
> To @xiangyu feng:
> 1. Regarding region failover, I'm referring to the `Restart Pipelined Region Failover Strategy` [1] in Flink.
> 2. This feature mainly enables support for region failover, enhancing stability. Of course, in practical use, users can configure the JM to provide more suitable resources for the committer node.
>
> To @Xintong Song:
> It doesn't matter if the notifyComplete of one checkpoint fails to be called. Since the data will be stored in the HDFS path of the coordinator, when the notifyComplete of the next CP is called, the data files generated by the two CPs will be merged to create a Paimon snapshot.
>
> To @Yunfeng Zhou:
> The pressure on the JM (JobManager) depends on two aspects:
> 1. Message communication pressure: this is determined by the checkpoint (CP) interval and the number of sinks. Each sink generates one message in each CP, so with 5000 sinks in total, the JM will receive 5000 messages per CP, which may take a few seconds.
> 2. Pressure from executing table commits: a Paimon snapshot must be created, and compaction may even be triggered. Therefore, asynchronous threads should be considered for these operations to avoid blocking the main thread of the JM.
>
> To @Jingsong and @wj wang:
> When using Flink SinkV2 to write to Paimon, a global committer node will also be generated in the physical execution plan to create and manage Paimon snapshots. Therefore, the same problem arises.
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/task_failure_recovery/#restart-pipelined-region-failover-strategy
>
> Best,
> Fang Yong
>
> On Tue, Feb 11, 2025 at 11:26 AM wj wang <hongli....@gmail.com> wrote:
>
>> Hi Yong, thanks for driving this PIP.
>> I have a small question:
>> Why not use Flink SinkV2 instead of moving the committer logic into JM's OperatorCoordinator?
>>
>> On Tue, Feb 11, 2025 at 10:34 AM Jingsong Li <jingsongl...@gmail.com> wrote:
>>>
>>> Thanks Yong for driving this PIP!
>>>
>>>>> Currently, there will be a global committer node in Flink Paimon job which is used to ensure the consistency of written data in Paimon. This committer node will connect all the tasks in Flink job together, and all the tasks are within one region. As a result, when any task fails, it will trigger a global failover of the Flink job. We use HDFS as the remote storage, and we often encounter situations where the global failover of jobs is triggered due to write timeouts or errors when writing to HDFS, which causes quite a few stability issues.
>>>
>>> I know that Flink SinkV2 also commits through a regular node; does this mean that SinkV2 also has this drawback?
>>>
>>> Best,
>>> Jingsong
>>>
>>> On Mon, Feb 10, 2025 at 4:06 PM Yunfeng Zhou <flink.zhouyunf...@gmail.com> wrote:
>>>>
>>>> Hi Yong,
>>>>
>>>> The general idea looks good to me. Are there any statistics on the number of operator events that need to be transmitted between the coordinator and the writer operators? This information could help estimate the additional workload on the JM and prevent the JM from becoming a single bottleneck for the throughput of Paimon sinks.
>>>>
>>>> Best,
>>>> Yunfeng
>>>>
>>>>> On Jan 23, 2025, at 17:44, Yong Fang <zjur...@gmail.com> wrote:
>>>>>
>>>>> Hi devs,
>>>>>
>>>>> I would like to start a discussion about PIP-30: Improvement For Paimon Committer In Flink [1].
>>>>>
>>>>> Currently, Flink writes data to Paimon based on a two-phase commit, which generates a global committer node and connects all tasks in one region. If any task fails, it will lead to a global failover of the Flink job.
>>>>>
>>>>> To solve this issue, we would like to introduce a Paimon Writer Coordinator to perform the table commit operation, enabling Flink Paimon jobs to support region failover and improving stability.
>>>>>
>>>>> Looking forward to hearing from you, thanks!
>>>>>
>>>>> [1]
>>>>> https://cwiki.apache.org/confluence/display/PAIMON/PIP-30%3A+Improvement+For+Paimon+Committer+In+Flink
>>>>>
>>>>> Best,
>>>>> Fang Yong