[
https://issues.apache.org/jira/browse/FLINK-11813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Flink Jira Bot updated FLINK-11813:
-----------------------------------
Labels: stale-major (was: )
I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help
the community manage its development. I see this issue has been marked as
Major but is unassigned and neither it nor its Sub-Tasks have been updated
for 30 days. I have gone ahead and added a "stale-major" label to the issue. If
this ticket is a Major, please either assign yourself or give an update.
Afterwards, please remove the label or in 7 days the issue will be deprioritized.
> Standby per job mode Dispatchers don't know job's JobSchedulingStatus
> ---------------------------------------------------------------------
>
> Key: FLINK-11813
> URL: https://issues.apache.org/jira/browse/FLINK-11813
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.6.4, 1.7.2, 1.8.0
> Reporter: Till Rohrmann
> Priority: Major
> Labels: stale-major
> Fix For: 1.14.0
>
>
> At the moment, standby {{Dispatchers}} in per-job mode can restart a
> terminated job after they gain leadership. The problem is that we currently
> clear the {{RunningJobsRegistry}} once a job has reached a globally terminal
> state. After the leading {{Dispatcher}} terminates, a standby {{Dispatcher}}
> gains leadership. Without the information from the {{RunningJobsRegistry}},
> it cannot tell whether the job has already been executed or whether it needs
> to re-execute the job. At the moment, the {{Dispatcher}} assumes that there
> was a fault and hence re-executes the job. This can lead to duplicate results.
> I think we need some way to tell standby {{Dispatchers}} that a certain job
> has been successfully executed. One trivial solution could be to not clean up
> the {{RunningJobsRegistry}}, but then we would clutter ZooKeeper.
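
To make the failure mode in the description concrete, here is a minimal sketch. It is an in-memory toy model and not Flink's actual code: the class RegistrySketch, its String job IDs, and the helper newLeaderWouldExecute are hypothetical names introduced only for illustration, and the method names are assumptions loosely modeled on what a running-jobs registry would offer. It assumes that a job with no registry entry is reported as PENDING, which is why a cleared entry looks the same as a job that never ran.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of a running-jobs registry (not Flink's implementation).
class RegistrySketch {
    enum JobSchedulingStatus { PENDING, RUNNING, DONE }

    // Stand-in for the entries that would live in ZooKeeper.
    private final Map<String, JobSchedulingStatus> entries = new HashMap<>();

    void setJobRunning(String jobId) { entries.put(jobId, JobSchedulingStatus.RUNNING); }

    void setJobFinished(String jobId) { entries.put(jobId, JobSchedulingStatus.DONE); }

    // Removing the entry frees storage but also discards the "already executed" fact.
    void clearJob(String jobId) { entries.remove(jobId); }

    // Assumption: a missing entry is indistinguishable from a job that never started.
    JobSchedulingStatus getJobSchedulingStatus(String jobId) {
        return entries.getOrDefault(jobId, JobSchedulingStatus.PENDING);
    }

    // Decision a standby Dispatcher would make right after gaining leadership.
    boolean newLeaderWouldExecute(String jobId) {
        return getJobSchedulingStatus(jobId) != JobSchedulingStatus.DONE;
    }

    public static void main(String[] args) {
        // Current behavior: the entry is cleared once the job is globally terminal.
        RegistrySketch cleared = new RegistrySketch();
        cleared.setJobRunning("job-1");
        cleared.setJobFinished("job-1");
        cleared.clearJob("job-1");
        System.out.println("cleared entry, re-execute: " + cleared.newLeaderWouldExecute("job-1"));

        // The "trivial solution": keep the DONE entry so the new leader can see it.
        RegistrySketch retained = new RegistrySketch();
        retained.setJobRunning("job-1");
        retained.setJobFinished("job-1");
        System.out.println("retained entry, re-execute: " + retained.newLeaderWouldExecute("job-1"));
    }
}
```

The second case avoids the duplicate execution, but only by leaving one entry per finished job behind, which is the ZooKeeper clutter the description mentions.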