[jira] [Updated] (ZOOKEEPER-3890) Ephemeral node not deleted after session is gone, then elected as leader

Lea Morschel (Jira) Tue, 14 Jul 2020 06:20:11 -0700


     [ https://issues.apache.org/jira/browse/ZOOKEEPER-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Lea Morschel updated ZOOKEEPER-3890:
------------------------------------
    Attachment: cmdline-feedback.txt

> Ephemeral node not deleted after session is gone, then elected as leader
> ------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3890
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3890
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.5.7
>            Reporter: Lea Morschel
>            Priority: Major
>         Attachments: cmdline-feedback.txt
>
>
> When a ZooKeeper client session disappears, the associated ephemeral node 
> that is used for leader election is occasionally not deleted and persists 
> (indefinitely, it seems).
> Because the stale node is the oldest, the leader election process 
> frequently selects it as the leader, so none of the existing redundant 
> services that take action upon acquiring leadership ever does so.
> One scenario that produces such a stale ephemeral node can be triggered by 
> force-killing both the client and the ZooKeeper server ({{kill -9 <pid>}}). 
> After the server is restarted, the session is recreated on its side, even 
> though the actual client session is gone. From then on, the node persists 
> even across regular restarts. This scenario involves a single ZooKeeper 
> server, but the problem has also been observed in a cluster of three.
> When the ephemeral node is first persisted after restarting (and on every 
> restart thereafter), the following appears in the ZooKeeper server logs:
> {code:java}
> Opening datadir:/my/path snapDir:/my/path
> zookeeper.snapshot.trust.empty : true
> tickTime set to 2000
> minSessionTimeout set to 4000
> maxSessionTimeout set to 40000
> zookeeper.snapshotSizeFactor = 0.33
> Reading snapshot /my/path/version-2/snapshot.71
> Created new input stream /my/path/version-2/log.4b
> Created new input archive /my/path/version-2/log.4b
> EOF exception java.io.EOFException: Failed to read /my/path/version-2/log.4b
> Created new input stream /my/path/version-2/log.72
> Created new input archive /my/path/version-2/log.72
> Ignoring processTxn failure hdr: -1 : error: -110
> Ignoring processTxn failure hdr: -1, error: -110, path: null
> Ignoring processTxn failure hdr: -1 : error: -110
> Ignoring processTxn failure hdr: -1, error: -110, path: null
> Ignoring processTxn failure hdr: -1 : error: -110
> Ignoring processTxn failure hdr: -1, error: -110, path: null
> Ignoring processTxn failure hdr: -1 : error: -110
> Ignoring processTxn failure hdr: -1, error: -110, path: null
> Ignoring processTxn failure hdr: -1 : error: -110
> Ignoring processTxn failure hdr: -1, error: -110, path: null
> Ignoring processTxn failure hdr: -1 : error: -110
> Ignoring processTxn failure hdr: -1, error: -110, path: null
> EOF exception java.io.EOFException: Failed to read /my/path/version-2/log.72
> Snapshotting: 0x8b to /my/path/version-2/snapshot.8b
> ZKShutdownHandler is not registered, so ZooKeeper server won't take any 
> action on ERROR or SHUTDOWN server state changes
> autopurge.snapRetainCount set to 3
> autopurge.purgeInterval set to 3{code}
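
To illustrate why a stale node blocks the election described above, here is a minimal sketch in plain Java with no ZooKeeper dependency. It assumes the common election recipe in which every candidate creates an ephemeral sequential znode and the candidate with the lowest sequence number is the leader. The report does not show the client code, so the "candidate_" prefix, the class name, and the method names are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: models only the "oldest node wins" selection step,
// not the actual client used in the report.
class ElectionSketch {

    // Extracts the numeric suffix that ZooKeeper appends to sequential znodes,
    // e.g. "candidate_0000000007" -> 7. The prefix is an assumption.
    static long sequenceOf(String znodeName) {
        int idx = znodeName.lastIndexOf('_');
        return Long.parseLong(znodeName.substring(idx + 1));
    }

    // Returns the child with the lowest sequence number, i.e. the oldest node.
    static Optional<String> electLeader(List<String> children) {
        return children.stream()
                .min(Comparator.comparingLong(ElectionSketch::sequenceOf));
    }

    public static void main(String[] args) {
        // Node 3 stands in for the stale ephemeral node whose owner is gone;
        // nodes 12 and 15 belong to live candidates.
        List<String> children = List.of(
                "candidate_0000000012",
                "candidate_0000000003",
                "candidate_0000000015");

        // The stale node is the oldest, so it is selected on every election
        // round and no live candidate ever becomes leader.
        System.out.println(electLeader(children).orElse("none"));
    }
}
```

Under these assumptions, the live candidates 12 and 15 can only take over once node 3 is removed, which matches the symptom reported here.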



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
