[
https://issues.apache.org/jira/browse/FLINK-12183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Till Rohrmann closed FLINK-12183.
---------------------------------
Resolution: Duplicate
Thanks for reporting this issue. It has already been fixed with FLINK-12247.
> Job Cluster doesn't stop after cancel a running job in per-job Yarn mode
> ------------------------------------------------------------------------
>
> Key: FLINK-12183
> URL: https://issues.apache.org/jira/browse/FLINK-12183
> Project: Flink
> Issue Type: Bug
> Components: Runtime / REST
> Affects Versions: 1.6.4, 1.7.2, 1.8.0
> Reporter: Yumeng Zhang
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The per-job Yarn cluster doesn't stop after cancelling a running job if the
> job has restarted many times (e.g. 1000 times) in a short time.
> The bug is in the archiveExecutionGraph() phase, which runs before
> removeJobAndRegisterTerminationFuture(). The CompletableFuture thread exits
> unexpectedly with a NullPointerException during archiveExecutionGraph().
> This is hard to notice because only IOException is caught there. In
> SubtaskExecutionAttemptDetailsHandler and
> SubtaskExecutionAttemptAccumulatorsHandler, the archiveJsonWithPath() method
> builds JSON for the prior execution attempts, but its for loop starts at
> index 0, which may refer to an attempt that has already been dropped. By
> default, getting such a prior execution attempt returns null
> (AccessExecution attempt = subtask.getPriorExecutionAttempt(x)).
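
The failure mode described above can be illustrated with a small standalone
sketch. It does not use Flink classes: BoundedHistory, archiveUnsafe() and
archiveSafe() are hypothetical names, and the history is assumed to keep only
the most recent attempts and to return null for dropped indices, as the
report describes. The null check is one possible guard and is not necessarily
the change made in FLINK-12247.

```java
import java.util.ArrayList;
import java.util.List;

public class PriorAttemptArchiveSketch {

    /** Hypothetical bounded attempt history: dropped slots read back as null. */
    static final class BoundedHistory {
        private final String[] slots;
        private final int capacity;
        private int size; // total number of attempts ever added

        BoundedHistory(int capacity) {
            this.capacity = capacity;
            this.slots = new String[capacity];
        }

        void add(String attempt) {
            slots[size % capacity] = attempt; // overwrites the oldest retained entry
            size++;
        }

        /** Returns null when the attempt at this index has already been dropped. */
        String get(int index) {
            if (index < 0 || index >= size) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            return index < size - capacity ? null : slots[index % capacity];
        }

        int size() {
            return size;
        }
    }

    /** Mirrors the reported pattern: loop from 0 and dereference without a check. */
    static List<String> archiveUnsafe(BoundedHistory history) {
        List<String> archived = new ArrayList<>();
        for (int x = 0; x < history.size(); x++) {
            String attempt = history.get(x);
            archived.add(attempt.toUpperCase()); // NullPointerException for a dropped attempt
        }
        return archived;
    }

    /** Same loop, but dropped (null) attempts are skipped. */
    static List<String> archiveSafe(BoundedHistory history) {
        List<String> archived = new ArrayList<>();
        for (int x = 0; x < history.size(); x++) {
            String attempt = history.get(x);
            if (attempt != null) {
                archived.add(attempt.toUpperCase());
            }
        }
        return archived;
    }

    public static void main(String[] args) {
        BoundedHistory history = new BoundedHistory(2); // keep only the last 2 attempts
        for (int i = 0; i < 5; i++) {
            history.add("a" + i);
        }
        System.out.println(archiveSafe(history)); // prints [A3, A4]
        try {
            archiveUnsafe(history);
        } catch (NullPointerException e) {
            System.out.println("NPE on dropped attempt");
        }
    }
}
```

With a capacity of 2 and five attempts, indices 0 to 2 have been dropped, so
the unchecked loop fails on its first iteration while the guarded loop
archives only the two retained attempts.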
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)