zentol commented on pull request #18237: URL: https://github.com/apache/flink/pull/18237#issuecomment-1031270503
What happens if the new TM offers its slots before the loss of the old TM is noticed? The slot pool will treat the offered slots as duplicate registrations (since the AllocationIDs match) and ignore them, but that means the taskManagerLocation will be incorrect (and thus the slots will be unusable). If the JM then notices the TM loss and fails the job, all the slots belonging to the old TM will be freed. At that point the JM lacks slots, while the TM thinks they are allocated to the job (after all, the offer was accepted), as does the RM (based on slot reports from the TM). But the JM is just waiting for slots to arrive?
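
To make the ordering concrete, here is a minimal toy model of the scenario described above. It is only a sketch: `ToySlotPool`, `offerSlot`, `releaseTaskManager`, and the string IDs are hypothetical stand-ins, not Flink's actual slot pool API, and the duplicate-offer behavior is modeled purely from the description in this comment.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the race; NOT Flink's real slot pool. All names are hypothetical.
class SlotOfferRace {

    // Minimal stand-in for a slot pool that keys slots by allocation ID.
    static class ToySlotPool {
        // allocationId -> ID of the TM the pool believes hosts the slot
        private final Map<String, String> slotOwner = new HashMap<>();

        // Assumed behavior: an offer for a known allocation ID is treated as a
        // duplicate, acknowledged, and the stored location is left unchanged.
        boolean offerSlot(String allocationId, String taskManagerId) {
            slotOwner.putIfAbsent(allocationId, taskManagerId);
            return true;
        }

        // Frees every slot the pool attributes to the given TM.
        void releaseTaskManager(String taskManagerId) {
            slotOwner.values().removeIf(taskManagerId::equals);
        }

        int size() {
            return slotOwner.size();
        }
    }

    // Returns {slots known to the JM, slots the new TM believes are allocated}.
    static int[] simulate() {
        ToySlotPool pool = new ToySlotPool();
        Set<String> allocatedOnNewTm = new HashSet<>();
        String[] allocationIds = {"alloc-1", "alloc-2"};

        // 1. The old TM originally offered the slots.
        for (String id : allocationIds) {
            pool.offerSlot(id, "tm-old");
        }

        // 2. The new TM offers slots with the same allocation IDs before the
        //    loss of the old TM is noticed. The offers are acknowledged, so the
        //    TM marks the slots as allocated, but the pool still maps them to tm-old.
        for (String id : allocationIds) {
            if (pool.offerSlot(id, "tm-new")) {
                allocatedOnNewTm.add(id);
            }
        }

        // 3. The JM notices the loss of the old TM and frees all of its slots.
        pool.releaseTaskManager("tm-old");

        return new int[] {pool.size(), allocatedOnNewTm.size()};
    }

    public static void main(String[] args) {
        int[] result = simulate();
        System.out.println("slots known to JM: " + result[0]);
        System.out.println("slots the new TM considers allocated: " + result[1]);
    }
}
```

In this model the JM ends up with no slots while the new TM still considers both allocated, which is the mismatch in question. Whether the real slot pool behaves this way depends on how it handles duplicate offers and TM release.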
