[
https://issues.apache.org/jira/browse/FLINK-37120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated FLINK-37120:
-----------------------------------
Labels: pull-request-available (was: )
> MySqlSnapshotSplitAssigner assign the ending chunk early to avoid TaskManager
> OOM
> ---------------------------------------------------------------------------------
>
> Key: FLINK-37120
> URL: https://issues.apache.org/jira/browse/FLINK-37120
> Project: Flink
> Issue Type: Improvement
> Components: Flink CDC
> Affects Versions: cdc-3.2.1
> Reporter: JunboWang
> Priority: Minor
> Labels: pull-request-available
>
> When synchronizing a large table, the ending chunk is always executed last,
> and its splitEnd is null. This causes the task to scan too much data, and
> the TaskManager eventually runs out of memory (OOM).
>
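The proposal in the title, handing out the ending chunk before the bounded ones, can be sketched as follows. This is a minimal illustration, not the connector's actual code: `Chunk` and `assignEndingChunkFirst` are hypothetical names, and the only property taken from the report is that the ending chunk has a null end.

```java
import java.util.ArrayList;
import java.util.List;

public class EndingChunkFirst {

    /** Hypothetical stand-in for a snapshot split; a null end means "no upper bound". */
    static final class Chunk {
        final String id;
        final Long start;
        final Long end;

        Chunk(String id, Long start, Long end) {
            this.id = id;
            this.start = start;
            this.end = end;
        }

        boolean isUnbounded() {
            return end == null;
        }
    }

    /**
     * Returns a new list in which chunks without an upper bound come first.
     * The relative order of the remaining chunks is preserved.
     */
    static List<Chunk> assignEndingChunkFirst(List<Chunk> chunks) {
        List<Chunk> ordered = new ArrayList<>(chunks.size());
        for (Chunk c : chunks) {
            if (c.isUnbounded()) {
                ordered.add(c);
            }
        }
        for (Chunk c : chunks) {
            if (!c.isUnbounded()) {
                ordered.add(c);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<Chunk> chunks = List.of(
                new Chunk("t:0", null, 100L),   // first chunk: no lower bound
                new Chunk("t:1", 100L, 200L),   // bounded chunk
                new Chunk("t:2", 200L, null));  // ending chunk: no upper bound

        List<String> ids = new ArrayList<>();
        for (Chunk c : assignEndingChunkFirst(chunks)) {
            ids.add(c.id);
        }
        System.out.println(ids); // [t:2, t:0, t:1]
    }
}
```

With three chunks, the one that starts at 200 and has no end is moved to the front, and the other two keep their original order.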
> Related log:
> {code:java}
> 2025-01-13T20:52:01.926+0800: 136.022: [Full GC (Allocation Failure)
> 2025-01-13T20:52:01.926+0800: 136.022: [Tenured:
> 1713535K->1713535K(1713536K), 3.6111578 secs] 2484607K->2482139K(2484608K),
> [Metaspace: 89121K->89121K(1130496K)], 3.6113026 secs] [Times: user=3.20
> sys=0.40, real=3.61 secs]
> 2025-01-13T20:52:05.555+0800: 139.651: [Full GC (Allocation Failure)
> 2025-01-13T20:52:05.555+0800: 139.651: [Tenured:
> 1713535K->1713535K(1713536K), 3.9733441 secs] 2484607K->2482375K(2484608K),
> [Metaspace: 89133K->89133K(1130496K)], 3.9734511 secs] [Times: user=3.52
> sys=0.45, real=3.98 secs]
> 2025-01-13T20:52:09.548+0800: 143.644: [Full GC (Allocation Failure)
> 2025-01-13T20:52:09.548+0800: 143.644: [Tenured:
> 1713535K->1713535K(1713536K), 3.3805432 secs] 2484607K->2482897K(2484608K),
> [Metaspace: 89134K->89134K(1130496K)], 3.3806496 secs] [Times: user=3.36
> sys=0.02, real=3.38 secs] {code}
> {code:java}
> 2025-01-13 20:49:54,563 INFO
> org.apache.flink.cdc.connectors.mysql.debezium.task.context.StatefulTaskContext
> [] - Starting offset is initialized to {ts_sec=0, file=, pos=0,
> kind=EARLIEST, row=0, event=0}
> 2025-01-13 20:49:54,631 INFO
> org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask
> [] - Snapshot step 1 - Determining low watermark {ts_sec=0,
> file=mysql-bin.xxxxx, pos=xxxxx, kind=SPECIFIC, gtids=xxxxxxxxx, row=0,
> event=0} for split MySqlSnapshotSplit{tableId=xxxxdb.xxxxx_test_table,
> splitId='xxxxdb.xxxxx_test_table:159959', splitKeyType=[`id` BIGINT NOT
> NULL], splitStart=[1333738235], splitEnd=null, highWatermark=null}
> 2025-01-13 20:49:54,636 INFO
> org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask
> [] - Snapshot step 2 - Snapshotting data
> 2025-01-13 20:49:54,636 INFO
> org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask
> [] - Exporting data from split 'xxxxdb.xxxxx_test_table:159959' of table
> xxxxdb.xxxxx_test_table
> 2025-01-13 20:49:54,637 INFO
> org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask
> [] - For split 'xxxxdb.xxxxx_test_table:159959' of table
> xxxxdb.xxxxx_test_table using select statement: 'SELECT * FROM
> `xxxxdb`.`xxxxx_test_table` WHERE `id` >= ?'
> 2025-01-13 20:50:17,482 INFO
> org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask
> [] - Exported 167463 records for split 'xxxxdb.xxxxx_test_table:159959'
> after 00:00:22.846
> 2025-01-13 20:50:31,627 INFO
> org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask
> [] - Exported 409419 records for split 'xxxxdb.xxxxx_test_table:159959'
> after 00:00:36.991
> 2025-01-13 20:50:41,805 INFO
> org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask
> [] - Exported 510663 records for split 'xxxxdb.xxxxx_test_table:159959'
> after 00:00:47.169
> 2025-01-13 20:50:55,184 INFO
> org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask
> [] - Exported 588220 records for split 'xxxxdb.xxxxx_test_table:159959'
> after 00:01:00.548
> 2025-01-13 20:51:05,580 INFO
> org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask
> [] - Exported 615374 records for split 'xxxxdb.xxxxx_test_table:159959'
> after 00:01:10.944 {code}
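The logged select statement has only a lower bound because splitEnd is null for this split. A simplified sketch of that relationship is shown below. `buildScanQuery` is a hypothetical helper written for illustration, and the predicates it emits for bounded splits are an assumption; the connector's real query builder may differ.

```java
public class SplitScanQuery {

    /**
     * Hypothetical, simplified query builder: the WHERE clause depends only on
     * whether the split has a start and/or an end key.
     */
    static String buildScanQuery(String table, String keyColumn, boolean hasStart, boolean hasEnd) {
        StringBuilder sql = new StringBuilder("SELECT * FROM ").append(table);
        if (hasStart && hasEnd) {
            // Bounded chunk: both ends limit the scan.
            sql.append(" WHERE ").append(keyColumn).append(" >= ? AND ")
               .append(keyColumn).append(" < ?");
        } else if (hasStart) {
            // Ending chunk: no upper bound, so every row from the start key onward is read.
            sql.append(" WHERE ").append(keyColumn).append(" >= ?");
        } else if (hasEnd) {
            // First chunk: no lower bound.
            sql.append(" WHERE ").append(keyColumn).append(" < ?");
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        String table = "`xxxxdb`.`xxxxx_test_table`";
        // Ending chunk, as in the log: splitStart is set, splitEnd is null.
        System.out.println(buildScanQuery(table, "`id`", true, false));
        // A bounded chunk for comparison.
        System.out.println(buildScanQuery(table, "`id`", true, true));
    }
}
```

For the ending chunk this produces the same open-ended statement seen in the log, which is why the number of exported records for split 159959 keeps growing.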
>
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)