XComp commented on pull request #19275:
URL: https://github.com/apache/flink/pull/19275#issuecomment-1085616997


   The archiving will be retriggered in case of a JobManager failover. Consider 
a job that finished globally. The following steps would happen:
   1. Archiving of the job to the `ExecutionGraphInfoStore`
   2. (optional) HistoryServer archiving is triggered
   3. JobResult is written as a dirty entry to the `JobResultStore`
   4. Cleanup of job-related artifacts is triggered in a retryable fashion
   5. JobResult is marked as clean in the `JobResultStore`
   6. The job termination future completes
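
   The sequence above can be sketched as follows. This is only an illustrative model of the ordering (class and phase names are mine, not Flink's actual API); its point is that the HistoryServer step is optional and that only step 4, the cleanup, is retryable:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the global-termination sequence described above.
// Names are hypothetical; this is not Flink's real termination code.
public class TerminationSequence {
    public static List<String> run(boolean historyServerEnabled) {
        List<String> phases = new ArrayList<>();
        phases.add("archive to ExecutionGraphInfoStore");       // step 1
        if (historyServerEnabled) {
            phases.add("trigger HistoryServer archiving");      // step 2 (optional)
        }
        phases.add("write dirty JobResult to JobResultStore");  // step 3
        phases.add("retryable cleanup of job artifacts");       // step 4
        phases.add("mark JobResult as clean in JobResultStore");// step 5
        phases.add("complete job termination future");          // step 6
        return phases;
    }
}
```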
   
   In this setup, the archiving only happens once; no retry is triggered. Now, 
let's assume the JobManager fails over for whatever reason during phase 4. 
At that point, the dirty entry for this job already exists in the 
JobResultStore. The failover of the JobManager would start a 
`CleanupJobManagerRunner` that completes immediately and triggers the 
termination process (as described above) again. As a consequence, a sparse 
ArchivedExecutionGraph is archived into the `ExecutionGraphInfoStore`. That is 
acceptable for now because the ExecutionGraphInfoStore only lives on the 
JobManager node and is not shared outside of its scope.
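The recovery decision can be sketched like this, assuming a simple in-memory stand-in for the JobResultStore (the class and method names here are hypothetical; the returned runner names only mirror the roles described above):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: on JobManager failover, a dirty JobResultStore entry
// means the job already finished globally, so only the cleanup path is re-run.
public class FailoverRecovery {
    private final Set<String> dirtyResults = new HashSet<>();

    public void writeDirty(String jobId) {
        dirtyResults.add(jobId); // phase 3: job result persisted as dirty
    }

    /** Returns which kind of runner a failed-over JobManager would start. */
    public String recover(String jobId) {
        if (dirtyResults.contains(jobId)) {
            // Result already persisted: re-run only the retryable cleanup,
            // which re-triggers the termination steps (hence the sparse
            // ArchivedExecutionGraph being archived a second time).
            return "cleanup-only runner";
        }
        return "normal job recovery"; // job did not finish globally yet
    }
}
```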
   For the HistoryServer, that's not the case. It would try to trigger the 
archiving again but would likely find the ExecutionGraph for that job already 
archived. This results in a failure, i.e. the archiving is not idempotent even 
though it should be. I created FLINK-26976 to cover this.
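One way to make the archiving idempotent would be to treat an already-existing archive as success rather than failure. A minimal sketch (the class, method, and file layout are my own assumptions, not the HistoryServer's actual archiving code):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of idempotent archiving: a second attempt for the same
// job is a no-op instead of a failure.
public class IdempotentArchiver {
    /** Returns true if a new archive was written, false if one already existed. */
    public static boolean archiveIfAbsent(Path archiveDir, String jobId, String json) {
        try {
            Files.createDirectories(archiveDir);
            Path target = archiveDir.resolve(jobId);
            if (Files.exists(target)) {
                return false; // archived by a previous attempt: not an error
            }
            Files.writeString(target, json);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```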
   
   Another follow-up issue is making the archiving retryable. That is not the 
case yet, but it is desirable. I would suggest fixing it as a separate issue to 
avoid increasing the PR's scope. Therefore, I created FLINK-26984 to cover 
retrying the archiving.
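The retry behavior could look roughly like the following. This is a generic retry wrapper of my own, not the mechanism FLINK-26984 will actually use; it just shows the intent of retrying the archive call until it succeeds or attempts are exhausted:

```java
// Hypothetical sketch of retryable archiving: rerun the archive call on
// transient failure, rethrowing only after maxAttempts failures.
public class RetryingArchiver {
    @FunctionalInterface
    public interface Archiver {
        void archive() throws Exception;
    }

    /** Returns the number of attempts it took; throws if all attempts fail. */
    public static int archiveWithRetry(Archiver archiver, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                archiver.archive();
                return attempt; // success
            } catch (Exception e) {
                last = new RuntimeException(
                        "archiving attempt " + attempt + " failed", e);
            }
        }
        throw last; // all attempts exhausted
    }
}
```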

