[ 
https://issues.apache.org/jira/browse/SENTRY-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444340#comment-16444340
 ] 

Na Li edited comment on SENTRY-2203 at 4/19/18 4:29 PM:
--------------------------------------------------------

[~akolb]
1) I found a bug in my test code. That bug is why I saw no leader elected in 
the test and thought I had reproduced the issue where sometimes no leader is 
elected. After I fixed the test, it passed even without my patch. So the issue 
is not caused by Sentry. I now believe it is a bug in ZooKeeper.

2) The leader election algorithm is described in 
http://zookeeper.apache.org/doc/r3.1.2/recipes.html#Shared+Locks. According to 
that description, when a host's session to ZooKeeper goes down, its znode 
should be removed. But it looks like the znode was not removed under some 
conditions.

{code}
Leader Election

A simple way of doing leader election with ZooKeeper is to use the 
SEQUENCE|EPHEMERAL flags when creating znodes that represent "proposals" of 
clients. The idea is to have a znode, say "/election", such that each client 
creates a child znode "/election/n_" with both flags SEQUENCE|EPHEMERAL. With 
the sequence flag, ZooKeeper automatically appends a sequence number that is 
greater than any one previously appended to a child of "/election". The process 
that created the znode with the smallest appended sequence number is the leader.

That's not all, though. It is important to watch for failures of the leader, so 
that a new client arises as the new leader in the case the current leader 
fails. A trivial solution is to have all application processes watching upon 
the current smallest znode, and checking if they are the new leader when the 
smallest znode goes away (note that the smallest znode will go away if the 
leader fails because the node is ephemeral). But this causes a herd effect: 
upon failure of the current leader, all other processes receive a 
notification, and execute getChildren on "/election" to obtain the current list 
of children of "/election". If the number of clients is large, it causes a 
spike on the number of operations that ZooKeeper servers have to process. To 
avoid the herd effect, it is sufficient to watch for the next znode down on the 
sequence of znodes. If a client receives a notification that the znode it is 
watching is gone, then it becomes the new leader in the case that there is no 
smaller znode. Note that this avoids the herd effect by not having all clients 
watching the same znode.

Here's the pseudo code:

Let ELECTION be a path of choice of the application. To volunteer to be a 
leader:

    Create znode z with path "ELECTION/n_" with both SEQUENCE and EPHEMERAL
    flags;

    Let C be the children of "ELECTION", and i be the sequence number of z;

    Watch for changes on "ELECTION/n_j", where j is the largest sequence
    number such that j < i and n_j is a znode in C;

Upon receiving a notification of znode deletion:

    Let C be the new set of children of ELECTION;

    If z is the smallest node in C, then execute leader procedure;

    Otherwise, watch for changes on "ELECTION/n_j", where j is the largest
    sequence number such that j < i and n_j is a znode in C;

Note that the znode having no preceding znode on the list of children does not 
imply that the creator of this znode is aware that it is the current leader. 
Applications may consider creating a separate znode to acknowledge that the 
leader has executed the leader procedure. 
{code}
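
To make the watch-selection step of the recipe concrete, here is a minimal sketch in plain Java. It is not Curator's or Sentry's code. The class and method names are hypothetical, and it assumes the child names end in a numeric sequence suffix after the last underscore, as in "n_0000000003". Given the children of the election znode and our own znode name, it returns the znode to watch (the next one down in the sequence), or empty when our znode is the smallest and we are the leader.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class ElectionSketch {
    // Parses the sequence number that follows the last '_' in a child name
    // such as "n_0000000003" (naming scheme assumed for this sketch).
    static int sequenceOf(String childName) {
        return Integer.parseInt(childName.substring(childName.lastIndexOf('_') + 1));
    }

    // Returns the child with the largest sequence number below ours, or
    // empty when no smaller child exists, which means we are the leader.
    static Optional<String> nodeToWatch(List<String> children, String ourNode) {
        int ourSeq = sequenceOf(ourNode);
        String best = null;
        for (String child : children) {
            int seq = sequenceOf(child);
            if (seq < ourSeq && (best == null || seq > sequenceOf(best))) {
                best = child;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        List<String> children =
            Arrays.asList("n_0000000005", "n_0000000002", "n_0000000007");
        // The client that owns n_0000000007 watches n_0000000005.
        System.out.println(nodeToWatch(children, "n_0000000007"));
        // The client that owns n_0000000002 has no predecessor: it is the leader.
        System.out.println(nodeToWatch(children, "n_0000000002"));
    }
}
```

Because each client watches only its immediate predecessor, a single znode deletion wakes up a single client, which is how the recipe avoids the herd effect.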

3) The Curator code below shows that the znode is created with both the 
SEQUENCE and EPHEMERAL flags (CreateMode.EPHEMERAL_SEQUENTIAL), so ZooKeeper 
should remove that znode when the session terminates.
{code}
  // From org.apache.curator.framework.recipes.locks.StandardLockInternalsDriver,
  // as used by the Sentry server: creates the lock znode as an ephemeral
  // sequential node.
  public String createsTheLock(CuratorFramework client, String path,
                               byte[] lockNodeBytes) throws Exception {
    String ourPath;
    if (lockNodeBytes != null) {
      ourPath = client.create()
          .creatingParentContainersIfNeeded()
          .withProtection()
          .withMode(CreateMode.EPHEMERAL_SEQUENTIAL)
          .forPath(path, lockNodeBytes);
    } else {
      ourPath = client.create()
          .creatingParentContainersIfNeeded()
          .withProtection()
          .withMode(CreateMode.EPHEMERAL_SEQUENTIAL)
          .forPath(path);
    }

    return ourPath;
  }
{code}


was (Author: linaataustin):
[~akolb]
1) I found a bug in my test code. That bug is why I saw no leader elected in 
the test and thought I had reproduced the issue where sometimes no leader is 
elected. After I fixed the test, it passed even without my patch. So the smoke 
test failure, which is a c6 blocker, is not caused by Sentry. I now believe it 
is a bug in ZooKeeper.

> Leader Lock is not released when Sentry service shuts down
> ----------------------------------------------------------
>
>                 Key: SENTRY-2203
>                 URL: https://issues.apache.org/jira/browse/SENTRY-2203
>             Project: Sentry
>          Issue Type: Bug
>          Components: Sentry
>    Affects Versions: 2.1.0
>            Reporter: Na Li
>            Assignee: Na Li
>            Priority: Critical
>         Attachments: SENTRY-2203.001.patch
>
>
> In our testing of Sentry HA, we found that after restarting the Sentry 
> service without restarting the ZooKeeper service, it is possible that none 
> of the Sentry servers is elected as leader to sync with HMS.
> What happened was:
> 1) When a leader is elected, the Sentry server host holds the leader lock. 
> The lock is identified by the mutexPath. All Sentry servers in a cluster use 
> the same mutexPath.
> 2) When the Sentry service is shut down, the HAContext is shut down, so the 
> CuratorFrameworkImpl it contains is shut down too, but the leader lock is 
> still held by the Sentry server host.
> 3) When the interrupt signal from shutdown interrupts the leader election 
> thread, releasing the leader lock fails because CuratorFrameworkImpl is no 
> longer in the started state.
> 4) When the Sentry server restarts, acquiring the leader lock fails because 
> it was never released. So no active Sentry server is leader.
> 5) If the leader lock is released before CuratorFrameworkImpl is shut down, 
> this issue does not happen. If ZooKeeper is restarted after the Sentry 
> service restarts, this issue does not happen either.
> To fix this issue,
> the Sentry LeaderStatusMonitor can deactivate the leader to release the 
> leader lock when it is closed, so the leader lock is guaranteed to be 
> released before CuratorFrameworkImpl is shut down.
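
The ordering problem in steps 2) to 5) of the description can be illustrated with a toy model in plain Java. This is only a sketch of the sequence as the description states it. FakeClient and FakeLeaderLock are hypothetical stand-ins, not Curator or Sentry classes, and the model deliberately ignores ZooKeeper's cleanup of ephemeral znodes when a session ends.

```java
public class ShutdownOrderSketch {
    // Hypothetical stand-in for the client: only tracks whether it is started.
    static class FakeClient {
        private boolean started = true;
        boolean isStarted() { return started; }
        void close() { started = false; }
    }

    // Hypothetical stand-in for the leader lock: release only succeeds
    // while the client is still in the started state.
    static class FakeLeaderLock {
        private final FakeClient client;
        private boolean held = true;
        FakeLeaderLock(FakeClient client) { this.client = client; }
        boolean isHeld() { return held; }
        void release() {
            if (!client.isStarted()) {
                throw new IllegalStateException("client is not started");
            }
            held = false;
        }
    }

    // Order from the description: the client is closed first, so the later
    // release fails and the lock stays held. Returns whether it is still held.
    static boolean closeClientFirst() {
        FakeClient client = new FakeClient();
        FakeLeaderLock lock = new FakeLeaderLock(client);
        client.close();
        try {
            lock.release();
        } catch (IllegalStateException e) {
            // Release failed because the client was already closed.
        }
        return lock.isHeld();
    }

    // Proposed order: release the lock first, then close the client.
    static boolean releaseLockFirst() {
        FakeClient client = new FakeClient();
        FakeLeaderLock lock = new FakeLeaderLock(client);
        lock.release();
        client.close();
        return lock.isHeld();
    }

    public static void main(String[] args) {
        System.out.println("client closed first, lock still held: " + closeClientFirst());
        System.out.println("lock released first, lock still held: " + releaseLockFirst());
    }
}
```

In this model the lock is left held only when the client is closed before the release, which is the ordering the proposed fix is meant to rule out.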



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
