[
https://issues.apache.org/jira/browse/FLINK-26391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504212#comment-17504212
]
Matthias Pohl commented on FLINK-26391:
---------------------------------------
Thanks for the thorough testing, [~wangyang0918], and sorry for the late reply.
This looks like a bug. We would expect the second submission to fail with a
{{DuplicateJobSubmissionException}}. I created FLINK-26583 to cover this.
> Release Testing: Application Mode recovery does not re-trigger a job which
> failed during cleanup (FLINK-11813)
> --------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-26391
> URL: https://issues.apache.org/jira/browse/FLINK-26391
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Affects Versions: 1.15.0
> Reporter: Matthias Pohl
> Assignee: Yang Wang
> Priority: Blocker
> Labels: release-testing
> Fix For: 1.15.0
>
>
> FLINK-11813 is about not being able to determine whether a job has been
> terminated globally before a failover happened. Testing this behavior can be
> achieved by running a job in HA mode to enable the file-based
> {{JobResultStore}} (JRS).
> You can specify
> [job-result-store.storage-path|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#job-result-store-storage-path]
> to point to a directory which you can access.
> [job-result-store.delete-on-commit|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#job-result-store-delete-on-commit]
> can be set to {{false}} to keep the JRS artifacts from being deleted after a
> job has finished. Let a job finish to generate the JRS artifact for this job in the
> specified directory. Renaming the generated file from {{<job-id>.json}} to
> {{<job-id>_DIRTY.json}} will simulate the job not being cleaned up properly.
> Starting the job in application mode once more (specifying the same Job ID)
> should not start the job again. You might want to enable {{debug}} logging to
> verify this in the logs. Specifically:
> * Cleanup should be performed.
> * No JobMaster-related logs should appear in the Flink logs.
> * Cleanup-related logs should appear in the Flink logs.
> * At the end, the {{_DIRTY}} suffix should have been removed from the JRS
> artifact again, leaving {{<job-id>.json}}.
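
The rename step and the final check from the description above could be scripted roughly as follows. This is only a sketch, not part of Flink: the helper names {{mark_dirty}} and {{is_clean}} are made up for illustration, and it assumes the JRS artifacts sit directly in the directory configured via {{job-result-store.storage-path}} and are named {{<job-id>.json}} / {{<job-id>_DIRTY.json}} as described.

```python
from pathlib import Path

# Suffix used in the issue description for an artifact that was not cleaned up.
DIRTY_SUFFIX = "_DIRTY.json"


def mark_dirty(storage_dir, job_id):
    """Rename <job-id>.json to <job-id>_DIRTY.json to simulate a failed cleanup.

    Hypothetical helper; assumes the artifact lies directly in storage_dir.
    """
    base = Path(storage_dir)
    clean = base / f"{job_id}.json"
    dirty = base / f"{job_id}{DIRTY_SUFFIX}"
    if not clean.is_file():
        raise FileNotFoundError(f"no JRS artifact found at {clean}")
    if dirty.exists():
        raise FileExistsError(f"dirty artifact already exists at {dirty}")
    clean.rename(dirty)
    return dirty


def is_clean(storage_dir, job_id):
    """Return True if only the clean artifact <job-id>.json is present."""
    base = Path(storage_dir)
    clean = base / f"{job_id}.json"
    dirty = base / f"{job_id}{DIRTY_SUFFIX}"
    return clean.is_file() and not dirty.exists()
```

One would call {{mark_dirty(storage_dir, job_id)}} after the first run has finished, restart the application with the same Job ID, and then expect {{is_clean(storage_dir, job_id)}} to return {{True}} once cleanup has completed.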