xxntti3n opened a new issue, #616:
URL: https://github.com/apache/doris-flink-connector/issues/616

   ### Search before asking
   
   - [x] I had searched in the 
[issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### Version
   
   # Flink CDC Snapshot Processing & Restart Issue
   
   ## Scenario
   - Flink CDC job performs a **snapshot phase** to incrementally read chunks 
of a database table.
   - Doris backend is available and receiving data normally.
   - After a job restart, only 2 to 3 snapshot splits are processed.
   - Then the job stops receiving further snapshot splits and does not resume 
processing the latest incremental chunks.
   
   ---
   ## Job Log
   `2025-09-23` 08:17:55,482 INFO  
org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager [] 
- Received resource requirements from job 2e93b210fd6dd1dbdb4ce167609dbe16: 
[ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, 
numberOfRequiredSlots=1}]
   2025-09-23 08:17:55,483 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: MySQL 
Source -> Sink: Writer -> Sink: Committer (1/1) 
(99123b27dd58d3500b3e703b1da5bfc7_cbc357ccb763df2852fee8c4fc7d55f2_0_18) 
switched from SCHEDULED to DEPLOYING.
   2025-09-23 08:17:55,483 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Deploying 
Source: MySQL Source -> Sink: Writer -> Sink: Committer (1/1) (attempt #18) 
with attempt id 
99123b27dd58d3500b3e703b1da5bfc7_cbc357ccb763df2852fee8c4fc7d55f2_0_18 and 
vertex id cbc357ccb763df2852fee8c4fc7d55f2_0 to database-name-7-taskmanager-1-1 
@ 10.81.16.5 (dataPort=34259) with allocation id 
39dedae0c787f2ab59a05ba00b92ca9c
   2025-09-23 08:17:55,497 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: MySQL 
Source -> Sink: Writer -> Sink: Committer (1/1) 
(99123b27dd58d3500b3e703b1da5bfc7_cbc357ccb763df2852fee8c4fc7d55f2_0_18) 
switched from DEPLOYING to INITIALIZING.
   2025-09-23 08:17:55,563 INFO  
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Source 
Source: MySQL Source registering reader for parallel task 0 
   2025-09-23 08:17:55,583 INFO  
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Source 
Source: MySQL Source received split request from parallel task 0 
   2025-09-23 08:17:55,584 INFO  
org.apache.flink.cdc.connectors.mysql.source.enumerator.MySqlSourceEnumerator 
[] - The enumerator starts to assign split to subtask 0
   2025-09-23 08:17:55,585 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: MySQL 
Source -> Sink: Writer -> Sink: Committer (1/1) 
(99123b27dd58d3500b3e703b1da5bfc7_cbc357ccb763df2852fee8c4fc7d55f2_0_18) 
switched from INITIALIZING to RUNNING.
   2025-09-23 08:17:55,585 INFO  
org.apache.flink.cdc.connectors.mysql.source.enumerator.MySqlSourceEnumerator 
[] - The enumerator assigns split 
MySqlSnapshotSplit{tableId=database_name.table_name, 
splitId='database_name.table_name:199', splitKeyType=[`id` BIGINT NOT NULL], 
splitStart=[1941445], splitEnd=[1951201], highWatermark=null} to subtask 0
   2025-09-23 08:18:17,794 INFO  
org.apache.flink.cdc.connectors.mysql.source.enumerator.MySqlSourceEnumerator 
[] - The enumerator under INITIAL_ASSIGNING receives finished split offsets 
FinishedSnapshotSplitsReportEvent{finishedOffsets={database_name.table_name:165={ts_sec=0,
 file=mysql-bin.008601, pos=10726803, kind=SPECIFIC, 
gtids=08360076-238b-11ed-a6d6-42010a6aa024:1-185507642,
   7c68f178-0aea-11f0-9529-42010a6aa0d4:1-197266132,
   ba3cd672-fc20-11eb-af4e-42010a6aa002:1-122293766, row=0, event=0}}} from 
subtask 0.
   2025-09-23 08:18:17,794 INFO  
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Source 
Source: MySQL Source received split request from parallel task 0 (#18)
   2025-09-23 08:18:17,794 INFO  
org.apache.flink.cdc.connectors.mysql.source.enumerator.MySqlSourceEnumerator 
[] - The enumerator starts to assign split to subtask 0
   2025-09-23 08:18:17,795 INFO  
org.apache.flink.cdc.connectors.mysql.source.enumerator.MySqlSourceEnumerator 
[] - The enumerator assigns split 
MySqlSnapshotSplit{tableId=database_name.table_name, 
splitId='database_name.table_name:200', splitKeyType=[`id` BIGINT NOT NULL], 
splitStart=[1951201], splitEnd=[1960957], highWatermark=null} to subtask 0
   2025-09-23 08:22:27,845 INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Triggering 
checkpoint 4 (type=CheckpointType{name='Checkpoint', 
sharingFilesStrategy=FORWARD_BACKWARD}) @ 1758615747799 for job 
2e93b210fd6dd1dbdb4ce167609dbe16.
   2025-09-23 08:22:28,735 INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Completed 
checkpoint 4 for job 2e93b210fd6dd1dbdb4ce167609dbe16 (569104 bytes, 
checkpointDuration=649 ms, finalizationTime=287 ms).
   2025-09-23 08:22:28,735 INFO  
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Marking 
checkpoint 4 as completed for source Source: MySQL Source.`
   
   ## Information
   # Version
    Flink-CDC 3.4.0
   
   
   ### What's Wrong?
   
   doris-flink-connector 25.1.0
   
   ### What You Expected?
   
   # Flink CDC Snapshot Processing & Restart Issue
   
   ## Scenario
   - Flink CDC job performs a **snapshot phase** to incrementally read chunks 
of a database table.
   - Doris backend is available and receiving data normally.
   - After a job restart, only 2 to 3 snapshot splits are processed.
   - Then the job stops receiving further snapshot splits and does not resume 
processing the latest incremental chunks.
   
   ---
   ## Job Log
   `2025-09-23` 08:17:55,482 INFO  
org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager [] 
- Received resource requirements from job 2e93b210fd6dd1dbdb4ce167609dbe16: 
[ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, 
numberOfRequiredSlots=1}]
   2025-09-23 08:17:55,483 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: MySQL 
Source -> Sink: Writer -> Sink: Committer (1/1) 
(99123b27dd58d3500b3e703b1da5bfc7_cbc357ccb763df2852fee8c4fc7d55f2_0_18) 
switched from SCHEDULED to DEPLOYING.
   2025-09-23 08:17:55,483 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Deploying 
Source: MySQL Source -> Sink: Writer -> Sink: Committer (1/1) (attempt #18) 
with attempt id 
99123b27dd58d3500b3e703b1da5bfc7_cbc357ccb763df2852fee8c4fc7d55f2_0_18 and 
vertex id cbc357ccb763df2852fee8c4fc7d55f2_0 to database-name-7-taskmanager-1-1 
@ 10.81.16.5 (dataPort=34259) with allocation id 
39dedae0c787f2ab59a05ba00b92ca9c
   2025-09-23 08:17:55,497 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: MySQL 
Source -> Sink: Writer -> Sink: Committer (1/1) 
(99123b27dd58d3500b3e703b1da5bfc7_cbc357ccb763df2852fee8c4fc7d55f2_0_18) 
switched from DEPLOYING to INITIALIZING.
   2025-09-23 08:17:55,563 INFO  
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Source 
Source: MySQL Source registering reader for parallel task 0 
   2025-09-23 08:17:55,583 INFO  
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Source 
Source: MySQL Source received split request from parallel task 0 
   2025-09-23 08:17:55,584 INFO  
org.apache.flink.cdc.connectors.mysql.source.enumerator.MySqlSourceEnumerator 
[] - The enumerator starts to assign split to subtask 0
   2025-09-23 08:17:55,585 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: MySQL 
Source -> Sink: Writer -> Sink: Committer (1/1) 
(99123b27dd58d3500b3e703b1da5bfc7_cbc357ccb763df2852fee8c4fc7d55f2_0_18) 
switched from INITIALIZING to RUNNING.
   2025-09-23 08:17:55,585 INFO  
org.apache.flink.cdc.connectors.mysql.source.enumerator.MySqlSourceEnumerator 
[] - The enumerator assigns split 
MySqlSnapshotSplit{tableId=database_name.table_name, 
splitId='database_name.table_name:199', splitKeyType=[`id` BIGINT NOT NULL], 
splitStart=[1941445], splitEnd=[1951201], highWatermark=null} to subtask 0
   2025-09-23 08:18:17,794 INFO  
org.apache.flink.cdc.connectors.mysql.source.enumerator.MySqlSourceEnumerator 
[] - The enumerator under INITIAL_ASSIGNING receives finished split offsets 
FinishedSnapshotSplitsReportEvent{finishedOffsets={database_name.table_name:165={ts_sec=0,
 file=mysql-bin.008601, pos=10726803, kind=SPECIFIC, 
gtids=08360076-238b-11ed-a6d6-42010a6aa024:1-185507642,
   7c68f178-0aea-11f0-9529-42010a6aa0d4:1-197266132,
   ba3cd672-fc20-11eb-af4e-42010a6aa002:1-122293766, row=0, event=0}}} from 
subtask 0.
   2025-09-23 08:18:17,794 INFO  
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Source 
Source: MySQL Source received split request from parallel task 0 (#18)
   2025-09-23 08:18:17,794 INFO  
org.apache.flink.cdc.connectors.mysql.source.enumerator.MySqlSourceEnumerator 
[] - The enumerator starts to assign split to subtask 0
   2025-09-23 08:18:17,795 INFO  
org.apache.flink.cdc.connectors.mysql.source.enumerator.MySqlSourceEnumerator 
[] - The enumerator assigns split 
MySqlSnapshotSplit{tableId=database_name.table_name, 
splitId='database_name.table_name:200', splitKeyType=[`id` BIGINT NOT NULL], 
splitStart=[1951201], splitEnd=[1960957], highWatermark=null} to subtask 0
   2025-09-23 08:22:27,845 INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Triggering 
checkpoint 4 (type=CheckpointType{name='Checkpoint', 
sharingFilesStrategy=FORWARD_BACKWARD}) @ 1758615747799 for job 
2e93b210fd6dd1dbdb4ce167609dbe16.
   2025-09-23 08:22:28,735 INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Completed 
checkpoint 4 for job 2e93b210fd6dd1dbdb4ce167609dbe16 (569104 bytes, 
checkpointDuration=649 ms, finalizationTime=287 ms).
   2025-09-23 08:22:28,735 INFO  
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Marking 
checkpoint 4 as completed for source Source: MySQL Source.`
   
   
   
   ### How to Reproduce?
   
   ## How to Reproduce the Bug
   
   1. Start a Flink CDC job that is currently in the snapshot phase, connected 
to a Doris backend which is up and running.
   2. Terminate the Doris backend during the snapshot processing.
   3. Restart the Doris backend to make it available again.
   4. Observe that the Flink CDC job processes 2 to 3 snapshot splits 
successfully after Doris restarts.
   5. Repeat steps 2 and 3 (terminate and restart Doris) about 4 to 5 times.
   6. After these repeated Doris restarts, the Flink CDC job will stop fetching 
additional snapshot splits and will not resume processing the latest 
incremental chunks.
   
   ### Anything Else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to