牛一凡 created FLINK-40038:
---------------------------
Summary: [mysql][pipeline] Incremental sync throughput is low in
hotspot UPDATE workloads due to deserialization overhead
Key: FLINK-40038
URL: https://issues.apache.org/jira/browse/FLINK-40038
Project: Flink
Issue Type: Bug
Components: Flink CDC
Affects Versions: cdc-3.7.0
Reporter: 牛一凡
Attachments: image-2026-07-01-18-11-43-636.png
## Motivation
We observed low incremental sync throughput on a MySQL-to-Doris Flink CDC
pipeline under a hotspot UPDATE workload on a large table. In this scenario,
the downstream started to lag behind the upstream and the job accumulated an
obvious backlog during incremental synchronization.
After collecting and analyzing the job flame graph, we found that a significant
portion of the CPU time was spent in the MySQL pipeline deserialization path,
especially around repeated schema/data type inference during row
deserialization. This overhead becomes more noticeable when a table receives
frequent UPDATE events.
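To illustrate the kind of optimization this suggests, here is a minimal sketch of memoizing an inference result per table and schema, so that repeated UPDATE events on the same table reuse it instead of re-inferring types for every row. This is not Flink CDC code: `InferenceCacheSketch`, `inferTypes`, `typesFor`, `invalidate`, and the string "schema fingerprint" are all hypothetical names chosen for illustration, and the real deserializer may need a different cache key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class InferenceCacheSketch {
    // Counts how often the (hypothetical) expensive inference actually runs.
    private final AtomicInteger inferenceCalls = new AtomicInteger();

    // Cache keyed by "tableId@schemaFingerprint"; value is the inferred column types.
    private final Map<String, String[]> cache = new ConcurrentHashMap<>();

    // Hypothetical stand-in for the costly per-row schema/data type inference.
    private String[] inferTypes(String schemaFingerprint) {
        inferenceCalls.incrementAndGet();
        return schemaFingerprint.split(",");
    }

    // Returns cached types for this table and schema, inferring only on a cache miss.
    public String[] typesFor(String tableId, String schemaFingerprint) {
        return cache.computeIfAbsent(
                tableId + "@" + schemaFingerprint,
                key -> inferTypes(schemaFingerprint));
    }

    // Drops all cached entries for a table, e.g. after a schema change event.
    public void invalidate(String tableId) {
        cache.keySet().removeIf(key -> key.startsWith(tableId + "@"));
    }

    public int inferenceCalls() {
        return inferenceCalls.get();
    }

    public static void main(String[] args) {
        InferenceCacheSketch sketch = new InferenceCacheSketch();
        // Simulate a hotspot: many UPDATE events against the same table and schema.
        for (int i = 0; i < 1000; i++) {
            sketch.typesFor("db.orders", "BIGINT,VARCHAR,DECIMAL");
        }
        System.out.println("inference runs: " + sketch.inferenceCalls());
    }
}
```

With this shape, the inference cost is paid once per table schema rather than once per row; the cache has to be invalidated whenever a schema change is observed for the table.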
A related performance concern was raised in
[FLINK-35715|https://issues.apache.org/jira/browse/FLINK-35715], but in our
workload the bottleneck still exists and is significant enough that the job
cannot catch up with the upstream in production-like environments.
It would be worth investigating and optimizing this deserialization path
further.
## Flame Graph
!image-2026-07-01-18-11-43-636.png!