[jira] [Updated] (FLINK-21053) Prevent further RejectedExecutionExceptions in CheckpointCoordinator failing JM

Roman Khachatryan (Jira) Wed, 20 Jan 2021 02:02:04 -0800


     [ 
https://issues.apache.org/jira/browse/FLINK-21053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Roman Khachatryan updated FLINK-21053:
--------------------------------------
    Priority: Minor  (was: Major)

> Prevent further RejectedExecutionExceptions in CheckpointCoordinator failing 
> JM
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-21053
>                 URL: https://issues.apache.org/jira/browse/FLINK-21053
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>            Reporter: Roman Khachatryan
>            Assignee: Roman Khachatryan
>            Priority: Minor
>             Fix For: 1.13.0
>
>
> In the past, there were multiple bugs caused by throwing/handling 
> RejectedExecutionException in CheckpointCoordinator (FLINK-18290, 
> FLINK-20992).
>  
> I think this is still possible, because in many places an executor that may 
> already be shut down is passed to a CompletableFuture.xxxAsync call.
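
To illustrate the failure mode, here is a minimal JDK-only sketch. It is not Flink code, and the class and method names are hypothetical. It shows CompletableFuture.runAsync throwing RejectedExecutionException at the call site when the executor (here one with the default rejection policy) has already been shut down.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical demo class; not part of Flink.
final class RejectedAfterShutdownDemo {

    /** Returns true if handing a task to an already shut-down executor is rejected. */
    static boolean isRejectedAfterShutdown() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.shutdown();
        try {
            // runAsync calls executor.execute(...) directly, so the rejection
            // surfaces as an exception in the calling thread.
            CompletableFuture.runAsync(() -> {}, executor);
            return false;
        } catch (RejectedExecutionException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + isRejectedAfterShutdown());
    }
}
```

If the caller does not expect this exception, it propagates like any other runtime exception, which is how a late submission can turn into an unexpected failure.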
>  
> In FLINK-20992 we discussed two approaches to fix this.
> The first approach is to check the executor state inside a synchronized block 
> every time the executor is used.
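
The first approach could look roughly like the sketch below. It assumes a hypothetical wrapper class (GuardedExecutor, not a Flink class) in which submission and shutdown share one lock, so the state check and the submission cannot interleave with a shutdown.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical wrapper; the name and API are assumptions for illustration.
final class GuardedExecutor {
    private final Object lock = new Object();
    private final ExecutorService delegate;

    GuardedExecutor(ExecutorService delegate) {
        this.delegate = delegate;
    }

    /** Submits the task unless the executor is shut down; returns whether it was accepted. */
    boolean tryExecute(Runnable task) {
        synchronized (lock) {
            if (delegate.isShutdown()) {
                return false;
            }
            // Safe only because shutdown() below takes the same lock.
            delegate.execute(task);
            return true;
        }
    }

    void shutdown() {
        synchronized (lock) {
            delegate.shutdown();
        }
    }

    public static void main(String[] args) {
        GuardedExecutor executor = new GuardedExecutor(Executors.newSingleThreadExecutor());
        System.out.println("accepted before shutdown: " + executor.tryExecute(() -> {}));
        executor.shutdown();
        System.out.println("accepted after shutdown: " + executor.tryExecute(() -> {}));
    }
}
```

The guard only helps if every shutdown path goes through the same lock. An executor with a bounded queue could also still reject tasks for capacity reasons.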
> The second approach is to:
>  # Create the executors (both the io and timer thread pools) inside 
> CheckpointCoordinator
>  # Check isShutdown() in their error handlers: if the executor is shut down 
> and the exception is a RejectedExecutionException, just log it; otherwise 
> delegate to FatalExitExceptionHandler
>  # (This would make it possible to remove such RejectedExecutionException 
> checks from the coordinator code.)
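
The second approach can be sketched as follows, using only JDK classes. It assumes the "error handler" is a Thread.UncaughtExceptionHandler installed on the pool's threads. The class name, the factory method and the fatalHandler parameter (a stand-in for FatalExitExceptionHandler) are hypothetical and not taken from the Flink code base.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

// Hypothetical handler; illustrates the idea only.
final class ShutdownAwareExceptionHandler implements Thread.UncaughtExceptionHandler {
    private final BooleanSupplier isShutdown;
    private final Thread.UncaughtExceptionHandler fatalHandler;

    ShutdownAwareExceptionHandler(
            BooleanSupplier isShutdown, Thread.UncaughtExceptionHandler fatalHandler) {
        this.isShutdown = isShutdown;
        this.fatalHandler = fatalHandler;
    }

    @Override
    public void uncaughtException(Thread thread, Throwable error) {
        if (error instanceof RejectedExecutionException && isShutdown.getAsBoolean()) {
            // Expected during shutdown: log and carry on.
            System.err.println("Ignoring rejected task after shutdown: " + error);
        } else {
            // Anything else is still treated as fatal.
            fatalHandler.uncaughtException(thread, error);
        }
    }

    /** Creates an executor whose threads report errors to a shutdown-aware handler. */
    static ExecutorService newOwnedExecutor(
            String threadName, Thread.UncaughtExceptionHandler fatalHandler) {
        // The handler needs the executor's state, and the executor's thread factory
        // needs the handler, so the executor is published through a reference.
        AtomicReference<ExecutorService> ref = new AtomicReference<>();
        Thread.UncaughtExceptionHandler handler = new ShutdownAwareExceptionHandler(
                () -> ref.get() != null && ref.get().isShutdown(), fatalHandler);
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, threadName);
            thread.setUncaughtExceptionHandler(handler);
            return thread;
        });
        ref.set(executor);
        return executor;
    }

    public static void main(String[] args) {
        ExecutorService io = newOwnedExecutor(
                "checkpoint-io",
                (t, e) -> System.err.println("FATAL in " + t.getName() + ": " + e));
        io.execute(() -> System.out.println("task ran"));
        io.shutdown();
    }
}
```

Note that an uncaught exception handler only sees exceptions from tasks started with execute(). Tasks started with submit() capture their exceptions in the returned Future instead.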



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
