GitHub user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/3385
With the problem observed above, I think we should change the approach a
bit:
- The registry should expose a single method, `getJobSchedulingStatus` or so,
that returns an enum with the values `PENDING`, `RUNNING`, and `DONE`.
That way there is only one access to the registry per check, and we avoid the
problem of the internal status changing between two separate checks.
- The file-based registry would create one file for the transition to
`RUNNING` and another for the transition to `DONE`. It is important that the
transition to `DONE` does not remove the file for `RUNNING`. The status check
then works backwards: it looks first for the `DONE` file, then for the
`RUNNING` file, and reports `PENDING` if neither exists.
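
A minimal sketch of what the two points above could look like. It is not the code from the PR. It assumes local files via `java.nio.file` instead of Flink's file system abstraction, a plain `String` job id, and hypothetical names (`FileBasedJobRegistry`, `setJobRunning`, `setJobFinished`, and the marker file names).

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// The three states named in the comment.
enum JobSchedulingStatus { PENDING, RUNNING, DONE }

// Hypothetical file-based registry: one marker file per transition.
final class FileBasedJobRegistry {
    private final Path dir;

    FileBasedJobRegistry(Path dir) {
        this.dir = dir;
    }

    // Marker file names are an assumption made for this sketch.
    private Path runningFile(String jobId) {
        return dir.resolve("job_" + jobId + ".running");
    }

    private Path doneFile(String jobId) {
        return dir.resolve("job_" + jobId + ".done");
    }

    void setJobRunning(String jobId) throws IOException {
        createMarker(runningFile(jobId));
    }

    // Adds the DONE marker and deliberately leaves the RUNNING marker in place.
    void setJobFinished(String jobId) throws IOException {
        createMarker(doneFile(jobId));
    }

    // Single entry point for callers; checks backwards, DONE before RUNNING.
    JobSchedulingStatus getJobSchedulingStatus(String jobId) {
        if (Files.exists(doneFile(jobId))) {
            return JobSchedulingStatus.DONE;
        }
        if (Files.exists(runningFile(jobId))) {
            return JobSchedulingStatus.RUNNING;
        }
        return JobSchedulingStatus.PENDING;
    }

    private static void createMarker(Path file) throws IOException {
        try {
            Files.createFile(file);
        } catch (FileAlreadyExistsException e) {
            // Marker already present, so the transition was recorded before.
        }
    }
}
```

Because the `RUNNING` marker is never deleted, a job that finishes while the status check is in progress is reported as `RUNNING` or `DONE`, never as `PENDING`.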