[ 
https://issues.apache.org/jira/browse/FLINK-30195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weijie Guo updated FLINK-30195:
-------------------------------
    Description: As discussed in 
https://github.com/apache/flink/pull/21137, the leader election service should 
not call `contender#grant/revokeLeadership` under a lock that the contender can 
also acquire, because this nested lock structure can deadlock. One proposal to 
fix this issue is to introduce a dedicated executor for these callbacks, which 
gets rid of the nested lock structure. This would affect all contenders, so we 
need to carefully check that no existing contender relies on the current 
behavior that `grant/revokeLeadership` are called under the lock. We should 
also clean up code such as 
`ResourceManagerServiceImpl.handleLeaderEventExecutor`. Other suggestions are 
also welcome.  (was: As discussed in 
https://github.com/apache/flink/pull/21137, the leader election service should 
not call `contender#grant/revokeLeadership` under a lock that the contender can 
also acquire. We can fix this issue with a dedicated executor to get rid of the 
nested lock structure. This would affect all contenders, so we need to 
carefully check that no existing contender relies on the current behavior that 
`grant/revokeLeadership` are called under the lock. We should also clean up 
code such as `ResourceManagerServiceImpl.handleLeaderEventExecutor`.)

> LeaderElectionService should avoid potential deadlock with leaderContender
> --------------------------------------------------------------------------
>
>                 Key: FLINK-30195
>                 URL: https://issues.apache.org/jira/browse/FLINK-30195
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>            Reporter: Weijie Guo
>            Priority: Major
>
> As discussed in https://github.com/apache/flink/pull/21137, the leader 
> election service should not call `contender#grant/revokeLeadership` under a 
> lock that the contender can also acquire, because this nested lock structure 
> can deadlock. One proposal to fix this issue is to introduce a dedicated 
> executor for these callbacks, which gets rid of the nested lock structure. 
> This would affect all contenders, so we need to carefully check that no 
> existing contender relies on the current behavior that 
> `grant/revokeLeadership` are called under the lock. We should also clean up 
> code such as `ResourceManagerServiceImpl.handleLeaderEventExecutor`. Other 
> suggestions are also welcome.
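
To make the dedicated-executor proposal concrete, here is a minimal sketch of the idea. It is not Flink's actual code: `ElectionService`, `Contender`, `RecordingContender`, `onGrantLeadership` and `onRevokeLeadership` are hypothetical names invented for illustration. The service still updates its own state under its lock, but it only enqueues the contender callback there; the callback itself runs on a single-threaded executor that does not hold the service lock, so a contender that takes its own lock and then calls back into the service no longer creates a lock-order inversion.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class LeaderCallbackSketch {

    /** Hypothetical stand-in for a leader contender; not Flink's interface. */
    interface Contender {
        void grantLeadership(String sessionId);
        void revokeLeadership();
    }

    /** Hypothetical election service that never runs contender code under its lock. */
    static final class ElectionService {
        private final Object lock = new Object();
        // Single thread, so callbacks run in the order they were submitted.
        private final ExecutorService callbackExecutor = Executors.newSingleThreadExecutor();
        private Contender contender;   // guarded by lock
        private boolean hasLeadership; // guarded by lock

        void start(Contender newContender) {
            synchronized (lock) { contender = newContender; }
        }

        void onGrantLeadership(String sessionId) {
            synchronized (lock) {
                hasLeadership = true;
                Contender target = contender;
                // Only enqueue here; the callback runs on the executor thread,
                // which does not hold `lock`.
                callbackExecutor.execute(() -> target.grantLeadership(sessionId));
            }
        }

        void onRevokeLeadership() {
            synchronized (lock) {
                hasLeadership = false;
                Contender target = contender;
                callbackExecutor.execute(target::revokeLeadership);
            }
        }

        /** Entry point a contender may call; it takes the same lock. */
        boolean hasLeadership() {
            synchronized (lock) { return hasLeadership; }
        }

        /** Stops accepting callbacks and waits briefly for queued ones to finish. */
        void close() {
            callbackExecutor.shutdown();
            try {
                callbackExecutor.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Contender with its own lock that calls back into the service. */
    static final class RecordingContender implements Contender {
        private final Object contenderLock = new Object();
        private final ElectionService service;
        final List<String> events = new CopyOnWriteArrayList<>();

        RecordingContender(ElectionService service) { this.service = service; }

        @Override
        public void grantLeadership(String sessionId) {
            synchronized (contenderLock) {
                service.hasLeadership(); // takes the service lock; this thread is not already under it
                events.add("granted:" + sessionId);
            }
        }

        @Override
        public void revokeLeadership() {
            synchronized (contenderLock) {
                service.hasLeadership();
                events.add("revoked");
            }
        }
    }

    static List<String> run() {
        ElectionService service = new ElectionService();
        RecordingContender contender = new RecordingContender(service);
        service.start(contender);
        service.onGrantLeadership("session-1");
        service.onRevokeLeadership();
        service.close(); // waits for the queued callbacks
        return List.copyOf(contender.events);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

One consequence worth checking per contender: because the callback now runs asynchronously, the service state may already have changed by the time `grantLeadership` executes (for example, leadership may have been revoked in between). Any contender that assumes the callback sees the state that was current under the lock would need to be adapted.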



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
