[
https://issues.apache.org/jira/browse/FLINK-23871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aitozi updated FLINK-23871:
---------------------------
Description:
An exception thrown while running a recovered job triggers a fatal error in the
dispatcher; this behavior was introduced in
https://issues.apache.org/jira/browse/FLINK-9097. If a job has reached a
finished status but the process crashes during the clean-up phase or any other
post-completion phase, recovery may pick up a job whose status is
RunningJobsRegistry.JobSchedulingStatus.DONE, which may lead to another fatal
error in the dispatcher.
I think we should handle RunningJobsRegistry.JobSchedulingStatus.DONE with a
dedicated exception such as JobFinishingException, which indicates that the
job/master crashed during the job finishing phase, and only do the clean-up
work for this exception.
was:
The exception during run recovery job will trigger fatal error which is
introduced in https://issues.apache.org/jira/browse/FLINK-9097. But if a job
have reached a finished status. But crash at cleap up phase or any other post
phase. When recover job, it may recover a job in
RunningJobsRegistry.JobSchedulingStatus.DONE status, this may lead to the
dispatcher fatal again.
I think we should deal with the RunningJobsRegistry.JobSchedulingStatus.DONE
with special exception like JobFinishingException, which represents the
job/master crashed in job finishing phase. And only do the clean up work for
this exception
> Dispatcher should handle finishing job exception when recover
> -------------------------------------------------------------
>
> Key: FLINK-23871
> URL: https://issues.apache.org/jira/browse/FLINK-23871
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.13.2
> Reporter: Aitozi
> Priority: Major
>
> An exception thrown while running a recovered job triggers a fatal error in
> the dispatcher; this behavior was introduced in
> https://issues.apache.org/jira/browse/FLINK-9097. If a job has reached a
> finished status but the process crashes during the clean-up phase or any
> other post-completion phase, recovery may pick up a job whose status is
> RunningJobsRegistry.JobSchedulingStatus.DONE, which may lead to another fatal
> error in the dispatcher.
> I think we should handle RunningJobsRegistry.JobSchedulingStatus.DONE with a
> dedicated exception such as JobFinishingException, which indicates that the
> job/master crashed during the job finishing phase, and only do the clean-up
> work for this exception.
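
The proposal in the description could look roughly like the following. This is
a minimal, self-contained sketch and not Flink code: JobFinishingException is
the exception name suggested in the issue and does not exist yet, the local
JobSchedulingStatus enum only mirrors the values of
RunningJobsRegistry.JobSchedulingStatus, and RecoverySketch, Outcome,
runRecoveredJob, recover and cleanedUpJobs are hypothetical names used for
illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class RecoverySketch {

    // Local stand-in mirroring RunningJobsRegistry.JobSchedulingStatus.
    enum JobSchedulingStatus { PENDING, RUNNING, DONE }

    // Hypothetical exception proposed in the issue; not an existing Flink class.
    static class JobFinishingException extends Exception {
        JobFinishingException(String jobId) {
            super("Job " + jobId + " already finished; only clean-up is pending");
        }
    }

    // Hypothetical result type, used only to make the sketch observable.
    enum Outcome { RESUMED, CLEANED_UP }

    // Records which jobs were cleaned up instead of being restarted.
    static final List<String> cleanedUpJobs = new ArrayList<>();

    // Stand-in for starting a recovered job: a DONE job is reported with the
    // dedicated exception instead of a generic failure.
    static void runRecoveredJob(String jobId, JobSchedulingStatus status)
            throws JobFinishingException {
        if (status == JobSchedulingStatus.DONE) {
            throw new JobFinishingException(jobId);
        }
        // A PENDING or RUNNING job would be started here.
    }

    // Only JobFinishingException is downgraded to clean-up work. Any other
    // exception still propagates and would remain a fatal error.
    static Outcome recover(String jobId, JobSchedulingStatus status) {
        try {
            runRecoveredJob(jobId, status);
            return Outcome.RESUMED;
        } catch (JobFinishingException e) {
            cleanedUpJobs.add(jobId);
            return Outcome.CLEANED_UP;
        }
    }

    public static void main(String[] args) {
        System.out.println(recover("job-a", JobSchedulingStatus.RUNNING));
        System.out.println(recover("job-b", JobSchedulingStatus.DONE));
    }
}
```

The point of the separate exception type is that the recovery path can tell "the
job finished but was not cleaned up" apart from a real recovery failure, so
only the latter escalates to a fatal error.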
--
This message was sent by Atlassian Jira
(v8.3.4#803005)