牛一凡 created FLINK-40038:
---------------------------

             Summary: [mysql][pipeline] Incremental sync throughput is low in 
hotspot UPDATE workloads due to deserialization overhead
                 Key: FLINK-40038
                 URL: https://issues.apache.org/jira/browse/FLINK-40038
             Project: Flink
          Issue Type: Bug
          Components: Flink CDC
    Affects Versions: cdc-3.7.0
            Reporter: 牛一凡
         Attachments: image-2026-07-01-18-11-43-636.png

## Motivation

We observed low incremental sync throughput on a MySQL-to-Doris pipeline when 
using Flink CDC with a hotspot UPDATE workload on a large table. In this 
scenario, the downstream fell behind the upstream and the job accumulated an 
obvious backlog during incremental synchronization.

After collecting and analyzing the job flame graph, we found that a significant 
portion of the CPU time was spent in the MySQL pipeline deserialization path, 
especially around repeated schema/data type inference during row 
deserialization. This overhead becomes more noticeable when a table receives 
frequent UPDATE events.
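
As a rough illustration of why this cost grows with UPDATE frequency, the 
sketch below contrasts per-event inference with inference memoized per distinct 
schema. It is not Flink CDC code: `InferenceCacheSketch`, `CachedInference`, 
and the string-joining stand-in for type inference are all hypothetical names 
chosen for this example, and the real deserializer may need a different cache 
key or invalidation on schema change.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only: none of these names are Flink CDC APIs.
public class InferenceCacheSketch {

    // Memoizes an expensive per-schema computation so that it runs once per
    // distinct schema rather than once per row event.
    static final class CachedInference<S, T> {
        private final Map<S, T> cache = new HashMap<>();
        private final Function<S, T> infer;
        private int inferenceCalls = 0;

        CachedInference(Function<S, T> infer) {
            this.infer = infer;
        }

        T get(S schema) {
            T result = cache.get(schema);
            if (result == null) {
                inferenceCalls++;              // count only real inference work
                result = infer.apply(schema);
                cache.put(schema, result);
            }
            return result;
        }

        int inferenceCalls() {
            return inferenceCalls;
        }
    }

    public static void main(String[] args) {
        // Stand-in for type inference: here it just joins column type names.
        CachedInference<List<String>, String> inference =
                new CachedInference<>(columns -> String.join(",", columns));

        List<String> schema = Arrays.asList("BIGINT", "VARCHAR", "TIMESTAMP");
        for (int i = 0; i < 1000; i++) {
            inference.get(schema);             // simulates 1000 UPDATEs on one table
        }
        System.out.println("inference calls: " + inference.inferenceCalls());
    }
}
```

In this toy setup the 1000 simulated UPDATE events trigger a single inference 
call. Whether a comparable cache is safe in the actual deserialization path 
depends on how schema changes are detected, which this sketch does not model.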

A related performance concern was mentioned in 
[FLINK-35715|https://issues.apache.org/jira/browse/FLINK-35715], but in our 
workload the bottleneck still exists and is significant enough that the job 
cannot catch up with the upstream in production-like environments.

It would be great to investigate this deserialization overhead further and 
optimize it.

## Flame Graph

!image-2026-07-01-18-11-43-636.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
