XComp commented on pull request #19275: URL: https://github.com/apache/flink/pull/19275#issuecomment-1085616997
the archiving will be retriggered in case of a JobManager failover. Consider that the job finished globally. The following steps would happen:

1. Archiving of the job to the `ExecutionGraphInfoStore`
2. (optional) HistoryServer archiving is triggered
3. The `JobResult` is written as a dirty entry to the `JobResultStore`
4. Cleanup of job-related artifacts is triggered in a retryable fashion
5. The `JobResult` is marked as clean in the `JobResultStore`
6. The job termination future completes

In this setup, the archiving only happens once; no retry is triggered. Now let's assume the JobManager fails over for whatever reason during phase 4. That means the dirty entry for this job already exists in the `JobResultStore`. A failover of the JobManager would start a `CleanupJobManagerRunner` that will immediately complete and trigger the termination process (as described above) again. As a consequence, a sparse `ArchivedExecutionGraph` is archived into the `ExecutionGraphInfoStore`. That is ok for now because the `ExecutionGraphInfoStore` only lives on the JobManager node and is not shared outside of its scope.

For the HistoryServer, that's not the case. It will try to trigger the archiving again but would probably find that the ExecutionGraph has already been archived for that job. This will result in a failure, i.e. the archiving is not idempotent, which it actually should be. I created FLINK-26976 to cover this.

Another follow-up issue should be making the archiving retryable as well. That isn't the case yet, but it would be desirable. I would suggest fixing that as a separate issue to avoid increasing the PR's scope. Therefore, I created FLINK-26984 to cover retrying the archiving.
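To make the failover scenario a bit more concrete, here is a minimal sketch of the sequence above in plain Java. Everything in it (`TerminationSketch`, `createDirtyEntry`, `cleanupJobArtifactsWithRetry`, ...) is a hypothetical stand-in for illustration, not the actual Flink classes or method signatures:

```java
import java.util.UUID;
import java.util.concurrent.CompletableFuture;

/** Simplified, hypothetical sketch of the termination flow described above. */
public class TerminationSketch {

    interface ExecutionGraphInfoStore { void put(Object executionGraphInfo); }
    interface HistoryServerArchivist { CompletableFuture<Void> archive(Object executionGraphInfo); }
    interface JobResultStore {
        void createDirtyEntry(UUID jobId);
        boolean hasDirtyEntry(UUID jobId);
        void markAsClean(UUID jobId);
    }

    private final ExecutionGraphInfoStore executionGraphInfoStore;
    private final HistoryServerArchivist historyServerArchivist;
    private final JobResultStore jobResultStore;

    TerminationSketch(
            ExecutionGraphInfoStore executionGraphInfoStore,
            HistoryServerArchivist historyServerArchivist,
            JobResultStore jobResultStore) {
        this.executionGraphInfoStore = executionGraphInfoStore;
        this.historyServerArchivist = historyServerArchivist;
        this.jobResultStore = jobResultStore;
    }

    CompletableFuture<Void> onGloballyTerminated(UUID jobId, Object executionGraphInfo) {
        // 1. Archive into the JobManager-local ExecutionGraphInfoStore. On a recovery
        //    run this archives a sparse graph again, which is tolerable because the
        //    store is not shared outside the JobManager node.
        executionGraphInfoStore.put(executionGraphInfo);

        // 2. (optional) Trigger HistoryServer archiving. Not retried and, as described
        //    above, not idempotent yet: a second run after a failover may fail because
        //    an archive for this job already exists (FLINK-26976, FLINK-26984).
        historyServerArchivist.archive(executionGraphInfo);

        // 3. Write the JobResult as a dirty entry before any cleanup starts. If the
        //    JobManager fails over anywhere after this point, recovery sees the dirty
        //    entry and re-runs this whole sequence, including steps 1 and 2.
        if (!jobResultStore.hasDirtyEntry(jobId)) {
            jobResultStore.createDirtyEntry(jobId);
        }

        // 4. Retryable cleanup of job-related artifacts.
        return cleanupJobArtifactsWithRetry(jobId)
                // 5. Mark the result as clean only after the cleanup succeeded ...
                .thenRun(() -> jobResultStore.markAsClean(jobId))
                // 6. ... which lets the job termination future complete.
                .thenRun(() -> { /* job termination future completes here */ });
    }

    private CompletableFuture<Void> cleanupJobArtifactsWithRetry(UUID jobId) {
        // Placeholder for the retryable cleanup of HA data, blobs, etc.
        return CompletableFuture.completedFuture(null);
    }
}
```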
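Regarding FLINK-26976, one possible direction is to treat an already-existing archive as success so that the re-triggered archiving becomes a no-op. Again a purely hypothetical, file-based sketch, not the actual HistoryServer archiving code:

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical illustration of idempotent archiving: a repeated attempt after a
 * JobManager failover succeeds as a no-op if the archive already exists,
 * instead of failing.
 */
public final class IdempotentArchiveSketch {

    /**
     * Writes the serialized archive for the given job unless it already exists.
     *
     * @return true if a new archive was written, false if it was already present
     */
    static boolean archiveIfAbsent(Path archiveDir, String jobId, byte[] serializedArchive)
            throws IOException {
        Path target = archiveDir.resolve(jobId);
        try {
            // CREATE_NEW fails atomically if the file already exists, so a duplicate
            // attempt (e.g. the re-run of the termination sequence after a failover)
            // becomes a harmless no-op rather than an error.
            Files.write(target, serializedArchive, StandardOpenOption.CREATE_NEW);
            return true;
        } catch (FileAlreadyExistsException alreadyArchived) {
            return false;
        }
    }
}
```

Whether such a check lives at the file level or earlier in the archivist is an implementation detail; the point is that re-running the termination sequence must not turn the archiving step into a failure.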
