[ 
https://issues.apache.org/jira/browse/FLINK-32348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17742108#comment-17742108
 ] 

Jiabao Sun commented on FLINK-32348:
------------------------------------

[~martijnvisser] Sorry for the late reply.

The root cause is that an idle reader is not removed from readersAwaitingSplit 
when it is closed.
As a result, after tasks resume from a checkpoint, a split can be incorrectly 
assigned to the stale reader still listed in readersAwaitingSplit instead of 
to the reader that actually requested it.

1. readersAwaitingSplit: [0]
2. signalNoMoreSplits is called for reader 0, but 0 is not removed from readersAwaitingSplit
3. TaskManager failover
4. Split request from reader 1 -> readersAwaitingSplit: [0, 1]
5. The split is actually assigned to reader 0 instead of reader 1.
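
As a rough illustration of the sequence above, here is a minimal Java sketch. 
It is not the connector's code: the class EnumeratorSketch, its methods, and 
the removeOnNoMoreSplits flag are hypothetical names invented for this example, 
and it assumes the enumerator's in-memory readersAwaitingSplit set survives the 
TaskManager failover and that the next split goes to the first reader in that set.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical, simplified model of a split enumerator. Not the connector's code.
class EnumeratorSketch {
    // Readers that have requested a split and have not yet received one.
    private final Set<Integer> readersAwaitingSplit = new LinkedHashSet<>();
    // true models the fix: drop a reader from the set when it is told to finish.
    private final boolean removeOnNoMoreSplits;

    EnumeratorSketch(boolean removeOnNoMoreSplits) {
        this.removeOnNoMoreSplits = removeOnNoMoreSplits;
    }

    void handleSplitRequest(int readerId) {
        readersAwaitingSplit.add(readerId);
    }

    void signalNoMoreSplits(int readerId) {
        if (removeOnNoMoreSplits) {
            readersAwaitingSplit.remove(readerId);
        }
    }

    // Hands the next split to the first awaiting reader; returns its id, or -1 if none.
    int assignNextSplit() {
        Iterator<Integer> it = readersAwaitingSplit.iterator();
        if (!it.hasNext()) {
            return -1;
        }
        int readerId = it.next();
        it.remove();
        return readerId;
    }

    // Replays steps 1-5 and returns the reader that receives the split.
    static int replay(boolean removeOnNoMoreSplits) {
        EnumeratorSketch enumerator = new EnumeratorSketch(removeOnNoMoreSplits);
        enumerator.handleSplitRequest(0);   // 1. readersAwaitingSplit: [0]
        enumerator.signalNoMoreSplits(0);   // 2. reader 0 is told to finish
        // 3. TaskManager failover: assumed to leave the enumerator's set untouched
        enumerator.handleSplitRequest(1);   // 4. split request from reader 1
        return enumerator.assignNextSplit(); // 5. who gets the split?
    }

    public static void main(String[] args) {
        System.out.println("without removal: split goes to reader " + replay(false));
        System.out.println("with removal:    split goes to reader " + replay(true));
    }
}
```

In this model, leaving reader 0 in the set sends the split to reader 0, while 
removing it in step 2 sends the split to reader 1, the reader that asked for it.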

The PR is ready. Could you help review it?

> MongoDB tests are flaky and time out
> ------------------------------------
>
>                 Key: FLINK-32348
>                 URL: https://issues.apache.org/jira/browse/FLINK-32348
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / MongoDB
>            Reporter: Martijn Visser
>            Priority: Critical
>              Labels: pull-request-available, test-stability
>
> https://github.com/apache/flink-connector-mongodb/actions/runs/5232649632/jobs/9447519651#step:13:39307



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
