[
https://issues.apache.org/jira/browse/SENTRY-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441700#comment-16441700
]
Na Li commented on SENTRY-2203:
-------------------------------
From https://curator.apache.org/curator-recipes/leader-election.html:
"IMPORTANT: The recommended action for receiving SUSPENDED or LOST is to throw
CancelLeadershipException. This will cause the LeaderSelector instance to
attempt to interrupt and cancel the thread that is executing the takeLeadership
method. Because this is so important, you should consider extending
LeaderSelectorListenerAdapter. LeaderSelectorListenerAdapter has the
recommended handling already written for you."
I suspect that if the thread suddenly disappears, ZooKeeper will detect it and
clean up. However, if the Curator framework is closed, it likely notifies
ZooKeeper, and therefore ZooKeeper does not clean up what is left from that
thread. We need to check the code to be sure.
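The handling recommended in the Curator documentation quoted above can be
illustrated without Curator itself. The following is only a minimal sketch in
plain Java: State, CancelLeadership, stateChanged and leaderInterruptedOn are
hypothetical stand-ins invented for illustration, not the Curator classes, and
the real LeaderSelector does considerably more than this.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Plain-Java model of the recommended handling. These types are hypothetical
// stand-ins and are NOT the Curator API.
class LeadershipCancelSketch {
    enum State { CONNECTED, SUSPENDED, LOST }

    // Stand-in for the exception the documentation recommends throwing.
    static class CancelLeadership extends RuntimeException {}

    // Listener side: treat SUSPENDED and LOST as "leadership can no longer be trusted".
    static void stateChanged(State newState) {
        if (newState == State.SUSPENDED || newState == State.LOST) {
            throw new CancelLeadership();
        }
    }

    // Selector side: start a thread that "holds leadership", deliver newState,
    // and report whether the leadership thread was interrupted as a result.
    static boolean leaderInterruptedOn(State newState) {
        AtomicBoolean interrupted = new AtomicBoolean(false);
        CountDownLatch started = new CountDownLatch(1);
        Thread leader = new Thread(() -> {
            started.countDown();
            try {
                Thread.sleep(60_000); // models a long-running takeLeadership body
            } catch (InterruptedException e) {
                interrupted.set(true); // leadership is given up here
            }
        });
        leader.setDaemon(true); // do not keep the JVM alive if never interrupted
        leader.start();
        try {
            started.await();
            try {
                stateChanged(newState);
            } catch (CancelLeadership e) {
                // The cancel request is turned into an interrupt of the leader thread.
                leader.interrupt();
                leader.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return interrupted.get();
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            System.out.println(s + " -> interrupted: " + leaderInterruptedOn(s));
        }
    }
}
```

In this model only SUSPENDED and LOST interrupt the leadership thread. What the
interrupted thread can still do afterwards (for example, release its lock)
depends on whether the client is still usable at that point, which is the part
that needs checking in the real code.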
> Leader Lock is not released when Sentry service shuts down
> ----------------------------------------------------------
>
> Key: SENTRY-2203
> URL: https://issues.apache.org/jira/browse/SENTRY-2203
> Project: Sentry
> Issue Type: Bug
> Components: Sentry
> Affects Versions: 2.1.0
> Reporter: Na Li
> Assignee: Na Li
> Priority: Critical
> Attachments: SENTRY-2203.001.patch
>
>
> In our testing of Sentry HA, we found that after restarting the Sentry
> service without restarting the ZooKeeper service, it is possible that none of
> the Sentry servers is elected as leader to sync with HMS.
> What happened was:
> 1) When a leader is elected, the Sentry server host holds the leader lock.
> The lock is identified by the mutexPath. All Sentry servers in a cluster use
> the same mutexPath.
> 2) When the Sentry service was shut down, the HAContext was shut down, so its
> contained CuratorFrameworkImpl was shut down, but the leader lock was still
> held by the Sentry server host.
> 3) When the interrupt signal from shutdown caused the leader election
> thread to be interrupted, releasing the leader lock failed because
> CuratorFrameworkImpl was not in the started state.
> 4) When the Sentry server restarted, acquiring the leader lock failed because
> it had not been released. So no active Sentry server was leader.
> 5) If the leader lock is released before CuratorFrameworkImpl is shut down,
> this issue does not happen. If ZooKeeper is restarted after the Sentry
> service restarts, this issue does not happen either.
> To fix this issue,
> Sentry LeaderStatusMonitor can deactivate the leader to release the leader
> lock when it is closed, so the leader lock is guaranteed to be released
> before CuratorFrameworkImpl is shut down.
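
The ordering described in steps 1) to 5) and in the proposed fix can be
sketched as a toy model. This is not Sentry or Curator code: LockRegistry,
Client and canElectLeaderAfterRestart are hypothetical names, and the model
simply assumes, as the issue reports, that the lock outlives a client that was
closed before releasing it.

```java
// Toy model of the shutdown ordering in this issue. None of these classes are
// Sentry or Curator code; all names are hypothetical.
class ShutdownOrderSketch {
    // Stand-in for the shared state behind mutexPath: at most one holder.
    static class LockRegistry {
        private String holder; // null when the lock is free

        synchronized boolean acquire(String host) {
            if (holder != null) {
                return false;
            }
            holder = host;
            return true;
        }

        synchronized void release() {
            holder = null;
        }
    }

    // Stand-in for the client: every operation fails once it is closed.
    static class Client {
        private final LockRegistry registry;
        private boolean started = true;

        Client(LockRegistry registry) {
            this.registry = registry;
        }

        void close() {
            started = false;
        }

        boolean acquire(String host) {
            checkStarted();
            return registry.acquire(host);
        }

        void release() {
            checkStarted();
            registry.release();
        }

        private void checkStarted() {
            if (!started) {
                throw new IllegalStateException("client is not started");
            }
        }
    }

    // Simulates one shutdown followed by a restart and returns whether the
    // restarted server can acquire the leader lock.
    static boolean canElectLeaderAfterRestart(boolean releaseBeforeClose) {
        LockRegistry registry = new LockRegistry();
        Client client = new Client(registry);
        client.acquire("sentry-host");      // step 1: leader holds the lock

        if (releaseBeforeClose) {
            client.release();               // proposed fix: deactivate the leader first
            client.close();
        } else {
            client.close();                 // step 2: client is shut down first
            try {
                client.release();           // step 3: release fails, client not started
            } catch (IllegalStateException e) {
                // The lock stays held in the registry.
            }
        }

        // Step 4: the restarted server uses a fresh client against the same registry.
        return new Client(registry).acquire("sentry-host");
    }

    public static void main(String[] args) {
        System.out.println("close then release: " + canElectLeaderAfterRestart(false));
        System.out.println("release then close: " + canElectLeaderAfterRestart(true));
    }
}
```

In this model only the release-then-close order lets the restarted server
become leader, which is the guarantee the proposed change to
LeaderStatusMonitor is meant to provide.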
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)