Thanks Yong for driving this PIP!

>> Currently, there is a global committer node in a Flink Paimon job which
>> is used to ensure the consistency of data written to Paimon. This committer
>> node connects all the tasks of the Flink job together, so all the tasks end
>> up in one region. As a result, when any task fails, it triggers a global
>> failover of the Flink job. We use HDFS as the remote storage, and we often
>> run into situations where a global failover is triggered by write timeouts
>> or errors when writing to HDFS, which causes quite a few stability issues.
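(Editorial note: for context, a rough DataStream-level sketch of the topology described in the quote above. The operator and committable class names are placeholders for illustration, not the actual Paimon connector classes.)

    // Hypothetical names for illustration only; the real Paimon connector
    // classes and wiring differ.
    DataStream<RowData> input = ...;

    // Each writer subtask flushes its files and emits a "committable"
    // describing what it wrote for the current checkpoint.
    DataStream<Committable> committables = input
        .transform("paimon-writer",
                   TypeInformation.of(Committable.class),
                   new WriterOperator());

    // A single-parallelism committer collects committables from every writer
    // subtask and commits the snapshot when the checkpoint completes. Because
    // every writer is wired to this one task, the whole job forms a single
    // pipelined region, so any task failure triggers a global failover.
    committables
        .global()
        .transform("paimon-committer",
                   TypeInformation.of(Void.class),
                   new CommitterOperator())
        .setParallelism(1);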
I know that Flink SinkV2 also commits through a regular committer node; does
this mean that SinkV2 has the same drawback?

Best,
Jingsong

On Mon, Feb 10, 2025 at 4:06 PM Yunfeng Zhou <flink.zhouyunf...@gmail.com> wrote:
>
> Hi Yong,
>
> The general idea looks good to me. Are there any statistics on the number of
> operator events that need to be transmitted between the coordinator and the
> writer operators? This information could help estimate the additional
> workload on the JM and prevent the JM from becoming a bottleneck for the
> throughput of Paimon sinks.
>
> Best,
> Yunfeng
>
> > On Jan 23, 2025, at 17:44, Yong Fang <zjur...@gmail.com> wrote:
> >
> > Hi devs,
> >
> > I would like to start a discussion about PIP-30: Improvement For Paimon
> > Committer In Flink [1].
> >
> > Currently, Flink writes data to Paimon based on a two-phase commit, which
> > generates a global committer node and connects all tasks into one region.
> > If any task fails, it leads to a global failover of the Flink job.
> >
> > To solve this issue, we would like to introduce a Paimon Writer
> > Coordinator to perform the table commit operation, enabling Flink Paimon
> > jobs to support region failover and improving stability.
> >
> > Looking forward to hearing from you, thanks!
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/PAIMON/PIP-30%3A+Improvement+For+Paimon+Committer+In+Flink
> >
> > Best,
> > Fang Yong
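(Editorial note: as one reading of the proposal, the PIP moves the commit step from the parallelism-1 committer task into a coordinator running on the JM, with writer subtasks reporting their committables as operator events. The sketch below shows only the coordinator-side bookkeeping such a design implies; the class and method names are assumptions, not the actual PIP interfaces, and in the real connector this logic would sit behind Flink's OperatorCoordinator / OperatorEvent machinery and Paimon's commit API.)

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    /**
     * Conceptual sketch: writer subtasks send their committables to a
     * coordinator on the JM, which performs the table commit once the
     * checkpoint completes. All names here are hypothetical.
     */
    public class PaimonWriterCoordinatorSketch {

        /** Placeholder for whatever a writer reports per checkpoint. */
        public record Committable(int subtask, long checkpointId, byte[] payload) {}

        // Committables buffered per checkpoint until that checkpoint completes.
        private final Map<Long, List<Committable>> pending = new ConcurrentHashMap<>();

        /** Called when a writer subtask sends its committable as an operator event. */
        public void handleEventFromWriter(Committable committable) {
            pending.computeIfAbsent(committable.checkpointId(), id -> new ArrayList<>())
                   .add(committable);
        }

        /** Called by the runtime once the given checkpoint is complete. */
        public void notifyCheckpointComplete(long checkpointId) {
            List<Committable> toCommit = pending.remove(checkpointId);
            if (toCommit != null) {
                // A single commit on the JM; writer tasks no longer need to be
                // connected to a committer task, so a writer failure can be
                // recovered with region failover instead of a global restart.
                commitToPaimon(toCommit);
            }
        }

        private void commitToPaimon(List<Committable> committables) {
            // Placeholder: the real connector would call Paimon's commit API
            // here to publish the new snapshot.
        }
    }

Under this sketch, the event volume is on the order of one committable message per writer subtask per checkpoint, which is presumably the statistic Yunfeng asks about above.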