Thanks for all the feedback.

To @Xiangyu Feng:
1. Regarding region failover, I'm referring to Flink's `Restart Pipelined
Region Failover Strategy` [1] (see the config snippet below).
2. This feature mainly enables region failover for Flink Paimon jobs,
improving stability. Of course, in practice, users can also configure the
JM with more suitable resources for the commit work.
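
For reference, this strategy is controlled by a standard Flink option.
A minimal flink-conf.yaml snippet (region is already the default in
recent Flink versions):

    # Restart only the pipelined region containing the failed task,
    # instead of restarting the whole job.
    jobmanager.execution.failover-strategy: region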

To @Xintong Song:
It doesn't matter if notifyCheckpointComplete for one checkpoint fails to
be called. Since the data files are stored in the coordinator's HDFS path,
when notifyCheckpointComplete for the next CP is called, the data files
generated by the two CPs will be merged into a single Paimon snapshot.
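
To make this concrete, here is a minimal sketch of the buffering idea.
The class and method names (CommitBuffer, add, drainUpTo) are illustrative
only, not the actual interfaces proposed in the PIP:

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;
    import java.util.NavigableMap;
    import java.util.TreeMap;

    // Buffers committables per checkpoint on the coordinator side, so that
    // a missed notification is simply folded into the next successful one.
    class CommitBuffer<T> {
        private final NavigableMap<Long, List<T>> pending = new TreeMap<>();

        // Called when the committables of a checkpoint arrive from writers.
        synchronized void add(long checkpointId, List<T> committables) {
            pending.computeIfAbsent(checkpointId, k -> new ArrayList<>())
                    .addAll(committables);
        }

        // Called from notifyCheckpointComplete(checkpointId): drains this
        // checkpoint and every earlier one whose notification was missed,
        // so one Paimon snapshot covers the merged files of all of them.
        synchronized List<T> drainUpTo(long checkpointId) {
            List<T> toCommit = new ArrayList<>();
            Iterator<Map.Entry<Long, List<T>>> it =
                    pending.headMap(checkpointId, true).entrySet().iterator();
            while (it.hasNext()) {
                toCommit.addAll(it.next().getValue());
                it.remove();
            }
            return toCommit;
        }
    }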

To @Yunfeng Zhou:
The pressure on the JM (JobManager) comes from two aspects:
1. Message communication pressure: this is determined by the checkpoint
(CP) interval and the number of sink tasks, since each sink task sends one
message to the coordinator in each CP. With a total of 5000 sink tasks,
the JM will receive 5000 messages in one CP, which may take a few seconds
to process.
2. Pressure from executing table commits: each commit needs to create a
Paimon snapshot and may even trigger compaction. Therefore, these
operations should run in asynchronous threads to avoid blocking the main
thread of the JM (see the sketch below).
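
As a rough illustration of the second point, the commit could be handed
off to a single dedicated thread so that notifyCheckpointComplete returns
immediately. AsyncCommitter and CommitAction below are hypothetical names,
not interfaces from the PIP:

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Offloads the table commit (snapshot creation, possibly compaction)
    // to a dedicated thread so the JM main thread is never blocked.
    class AsyncCommitter<T> implements AutoCloseable {
        interface CommitAction<C> {
            void commit(long checkpointId, List<C> committables) throws Exception;
        }

        private final ExecutorService commitExecutor =
                Executors.newSingleThreadExecutor(r -> {
                    Thread t = new Thread(r, "paimon-coordinator-commit");
                    t.setDaemon(true);
                    return t;
                });

        // Invoked from notifyCheckpointComplete; only enqueues the work.
        void commitAsync(long checkpointId, List<T> committables,
                         CommitAction<T> action) {
            commitExecutor.execute(() -> {
                try {
                    action.commit(checkpointId, committables);
                } catch (Exception e) {
                    // A real implementation must surface this failure,
                    // e.g. by failing the job, instead of swallowing it.
                    throw new RuntimeException(
                            "Commit failed for checkpoint " + checkpointId, e);
                }
            });
        }

        @Override
        public void close() {
            commitExecutor.shutdown();
        }
    }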

To @Jingsong and @wj wang:
When using Flink SinkV2 to write to Paimon, a global committer node will
also be generated in the physical execution plan to create and manage
Paimon snapshots. Therefore, the same problem arises there as well.

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/task_failure_recovery/#restart-pipelined-region-failover-strategy

Best,
Fang Yong

On Tue, Feb 11, 2025 at 11:26 AM wj wang <hongli....@gmail.com> wrote:

> Hi Yong, thanks for driving this PIP.
> I have a small question:
> Why not use Flink SinkV2 instead of moving the committer logic into
> JM's OperatorCoordinator?
>
> On Tue, Feb 11, 2025 at 10:34 AM Jingsong Li <jingsongl...@gmail.com>
> wrote:
> >
> > Thanks Yong for driving this PIP!
> >
> > >> Currently, there will be a global committer node in Flink Paimon
> > >> job which is used to ensure the consistency of written data in
> > >> Paimon. This committer node will connect all the tasks in Flink job
> > >> together, and all the tasks are within one region. As a result, when
> > >> any task fails, it will trigger a global failover of the Flink job.
> > >> We use HDFS as the remote storage, and we often encounter situations
> > >> where the global failover of jobs is triggered due to write timeouts
> > >> or errors when writing to HDFS, which causes quite a few stability
> > >> issues.
> >
> > I know that Flink SinkV2 also commits through a regular operator node.
> > Does this mean that SinkV2 has this drawback as well?
> >
> > Best,
> > Jingsong
> >
> > On Mon, Feb 10, 2025 at 4:06 PM Yunfeng Zhou
> > <flink.zhouyunf...@gmail.com> wrote:
> > >
> > > Hi Yong,
> > >
> > > The general idea looks good to me. Are there any statistics on the
> > > number of operator events that need to be transmitted between the
> > > coordinator and the writer operators? This information could help
> > > provide estimations of the additional workload on the JM, preventing
> > > the JM from becoming a bottleneck for the throughput of Paimon sinks.
> > >
> > > Best,
> > > Yunfeng
> > >
> > > > On Jan 23, 2025, at 17:44, Yong Fang <zjur...@gmail.com> wrote:
> > > >
> > > > Hi devs,
> > > >
> > > > I would like to start a discussion about PIP-30: Improvement For
> > > > Paimon Committer In Flink [1].
> > > >
> > > > Currently, Flink writes data to Paimon based on Two-Phase Commit,
> > > > which will generate a global committer node and connect all tasks
> > > > in one region. If any task fails, it will lead to a global failover
> > > > of the Flink job.
> > > >
> > > > To solve this issue, we would like to introduce a Paimon Writer
> > > > Coordinator to perform the table commit operation, enabling Flink
> > > > Paimon jobs to support region failover and improving stability.
> > > >
> > > > Looking forward to hearing from you, thanks!
> > > >
> > > > [1]
> > > > https://cwiki.apache.org/confluence/display/PAIMON/PIP-30%3A+Improvement+For+Paimon+Committer+In+Flink
> > > >
> > > >
> > > > Best,
> > > > Fang Yong
> > >
>
