Piotr Nowojski created FLINK-13698:
--------------------------------------

             Summary: Rework threading model of CheckpointCoordinator
                 Key: FLINK-13698
                 URL: https://issues.apache.org/jira/browse/FLINK-13698
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Checkpointing
    Affects Versions: 1.10.0
            Reporter: Piotr Nowojski


Currently, {{CheckpointCoordinator}} and {{CheckpointFailureManager}} code is
executed by multiple different threads (mostly the {{ioExecutor}}, but not
only). This causes multiple concurrency issues, for example:
https://issues.apache.org/jira/browse/FLINK-13497

The proper fix would be to rethink the threading model there. At first glance,
this code does not seem to need to be multi-threaded, except for the parts doing
the actual IO operations. It should therefore be possible to run everything in a
single ExecutionGraph thread and run only the necessary IO operations
asynchronously, with a feedback loop back to that thread ("mailbox style").
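A minimal sketch of the proposed "mailbox style" model, assuming only plain java.util.concurrent: all coordinator state is confined to one single-threaded executor, the IO step runs on a separate pool, and its result is posted back to the single thread. The class and method names (MailboxCoordinatorSketch, triggerCheckpoint, persistMetadata, runCheckpoints) are hypothetical and are not Flink's actual CheckpointCoordinator API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Hypothetical sketch, not a Flink API: coordinator state is only ever touched
 * from one "main thread" (the mailbox); blocking IO runs on a separate pool and
 * feeds its result back to the main thread.
 */
final class MailboxCoordinatorSketch {

    // Single-threaded executor acting as the mailbox: the only thread mutating state.
    private final ExecutorService mainThread = Executors.newSingleThreadExecutor();
    // Pool reserved for the actual IO operations (e.g. persisting checkpoint metadata).
    private final ExecutorService ioExecutor = Executors.newFixedThreadPool(4);

    // Coordinator state: no locks, because only mainThread reads or writes it.
    private final Map<Long, String> completedCheckpoints = new HashMap<>();
    private long nextCheckpointId = 1;

    /** Triggers one checkpoint; only ever invoked on mainThread. */
    private CompletableFuture<Void> triggerCheckpoint() {
        long id = nextCheckpointId++;
        return CompletableFuture
                // Run the (simulated) IO asynchronously, off the main thread.
                .supplyAsync(() -> persistMetadata(id), ioExecutor)
                // Feedback loop: apply the IO result back on the mailbox thread.
                .thenAcceptAsync(path -> completedCheckpoints.put(id, path), mainThread);
    }

    /** Hypothetical stand-in for a blocking IO call; runs on ioExecutor. */
    private static String persistMetadata(long checkpointId) {
        return "chk-" + checkpointId + "/_metadata";
    }

    /** Enqueues n triggers into the mailbox and returns a snapshot of the state. */
    Map<Long, String> runCheckpoints(int n) {
        CompletableFuture<?>[] done = new CompletableFuture<?>[n];
        for (int i = 0; i < n; i++) {
            // The trigger itself is a mailbox action; thenCompose waits for its IO + callback.
            done[i] = CompletableFuture.supplyAsync(this::triggerCheckpoint, mainThread)
                    .thenCompose(f -> f);
        }
        CompletableFuture.allOf(done).join();
        // Read the state on the main thread too, so no other thread ever touches it.
        return CompletableFuture
                .supplyAsync(() -> new HashMap<>(completedCheckpoints), mainThread)
                .join();
    }

    void shutdown() {
        ioExecutor.shutdown();
        mainThread.shutdown();
    }

    public static void main(String[] args) {
        MailboxCoordinatorSketch coordinator = new MailboxCoordinatorSketch();
        Map<Long, String> result = coordinator.runCheckpoints(3);
        coordinator.shutdown();
        System.out.println(result.size()); // prints 3
    }
}
```

The design choice this illustrates: because every state change is a task in one thread's queue, the coordinator needs no locks, and slow IO never blocks that thread, since only the IO step leaves the mailbox and its result comes back as another queued task.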

I would strongly recommend fixing this issue before adding new features in the
{{CheckpointCoordinator}} component.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
