[GitHub] [flink] xintongsong commented on pull request #21137: [FLINK-29234][runtime] JobMasterServiceLeadershipRunner handle leader event in a separate executor to avoid dead lock

GitBox Thu, 03 Nov 2022 22:45:03 -0700


xintongsong commented on PR #21137:
URL: https://github.com/apache/flink/pull/21137#issuecomment-1303003420


   This is indeed a fascinating discussion.
   
   I see the point that there are some calls in `cloaseAsync` that may or may 
not run under the lock. I also agree that it would be nice the lock only covers 
the calls that need to be guarded and leave the unnecessary calls outside.
   
   I think my focus is more about preventing (or at least reducing the chances 
of) similar problems. In that sense, it's probably better that the runner does 
not need to (or as least as possible) worry about another lock from the leader 
election service. To that end, I think @XComp's proposal makes more sense than 
mine. Ideally, leader election service should not call 
`contender.grant/revokeLeadership()` under a lock while the same lock can be 
accessed by the contender via `stop()`.
   
   To clarify, fixing the problem on the service side does not justify that 
calls in `cloaseAsync` may or may not run under the lock. Ideally, `A` should 
not call `B.b()` under a lock while another method `A.a()` called by `B` can 
access the same lock. That applies to both the service and the runner, and 
maybe we should fix that for both.
   
   All in all, I'd suggest:
   - Fix the runner side issue in `closeAsync` in this PR. For this purpose 
solely, we probably don't need a dedicated executor.
   - Fix the leader election service side issue with a dedicated executor, in a 
follow up ticket. That would affect all contenders and we need to carefully 
check that no existing contenders are relying on the current behavior that 
`grant/revokeLeadership` are called under lock. We should also clean up things 
like `ResourceManagerServiceImpl.handleLeaderEventExecutor`.
   
   WDYT?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [flink] xintongsong commented on pull request #21137: [FLINK-29234][runtime] JobMasterServiceLeadershipRunner handle leader event in a separate executor to avoid dead lock

Reply via email to