[
https://issues.apache.org/jira/browse/ZOOKEEPER-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161121#comment-17161121
]
maoling commented on ZOOKEEPER-3890:
------------------------------------
1. ---> *_"One of the scenarios where such a stale ephemeral node is created
can be triggered by force-killing the ZooKeeper server ({{kill -9 <pid>}}) as
well as the client, which leads to the session being recreated after restarting
the server on its side, even though the actual client session is gone."_*
[~lemora] I cannot reproduce this issue by simply killing the server and the
client at the same time and then restarting that server. Could you please give
me more context?
2. The following log lines are suspicious:
2.1 Ignoring processTxn failure hdr: -1, error: -110, path: null
2.2 ZKShutdownHandler is not registered, so ZooKeeper server won't take any
action on ERROR or SHUTDOWN server state changes
2.3 EOF exception java.io.EOFException: Failed to read /my/path/version-2/log.72
3. Looking at the transaction logs you provided, I found some errors:
{code:java}
20-7-14 04:42:08 PM session 0x100005b2a720000 cxid 0x1 zxid 0xb9 error -110
20-7-14 04:42:30 PM session 0x100005b2a720000 cxid 0x44 zxid 0xbd error -110
20-7-14 04:42:32 PM session 0x100005b2a720000 cxid 0x4c zxid 0xc0 error -110
20-7-14 04:42:34 PM session 0x100005b2a720000 cxid 0x57 zxid 0xc2 error -110
20-7-14 04:42:35 PM session 0x100005b2a720000 cxid 0x5d zxid 0xc3 error -110
{code}
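For context (my reading, not something stated in the logs themselves): error -110 is {{KeeperException.Code.NODEEXISTS}}, i.e. the replayed creates are failing because the znode already exists. A minimal offline decoder for such log lines, with the code values copied by hand from the ZooKeeper source (so treat the table as an assumption):
{code:java}
import java.util.Map;

public class ZkErrorCodes {
    // Subset of org.apache.zookeeper.KeeperException.Code values, hard-coded
    // here so the transaction-log excerpts can be decoded without the jar.
    static final Map<Integer, String> CODES = Map.of(
        -101, "NONODE",
        -110, "NODEEXISTS",
        -112, "SESSIONEXPIRED");

    static String decode(int code) {
        return CODES.getOrDefault(code, "UNKNOWN(" + code + ")");
    }

    public static void main(String[] args) {
        // Every error in the excerpt above is -110, i.e. NODEEXISTS.
        System.out.println(decode(-110));
    }
}
{code}
If that reading is right, all five failing transactions come from the same session (0x100005b2a720000) hitting already-existing nodes during replay, which would be consistent with a stale ephemeral znode surviving the restart.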
> Ephemeral node not deleted after session is gone, then elected as leader
> ------------------------------------------------------------------------
>
> Key: ZOOKEEPER-3890
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3890
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.14, 3.5.7
> Reporter: Lea Morschel
> Priority: Major
> Attachments: cmdline-feedback.txt, zkLogsAndSnapshots.tar.xz
>
>
> When a ZooKeeper client session disappears, the associated ephemeral node
> that is used for leader election is occasionally not deleted and persists
> (indefinitely, it seems).
> A leader election process may select such a stale node to be the leader. In
> a scenario where there is a redundant service that takes action when
> acquiring leadership by means of a ZooKeeper election process, this leads to
> none of the services being active when the stale ephemeral node is elected.
> One of the scenarios where such a stale ephemeral node is created can be
> triggered by force-killing the ZooKeeper server ({{kill -9 <pid>}}) as well
> as the client, which leads to the session being recreated after restarting
> the server on its side, even though the actual client session is gone. This
> node even persists across regular restarts from then on. Unlike an active
> session, no pings are received from its owner session, yet the session never
> expires. This scenario involves a single ZooKeeper server, but the problem
> has also been observed in a cluster of three.
> When the ephemeral node is first persisted after restarting (and every
> restart thereafter), the following is observable in the ZooKeeper server
> logs. The scenario involves a local ZooKeeper server (version 3.5.7) and a
> single leader election participant.
> {code:java}
> Opening datadir:/my/path snapDir:/my/path
> zookeeper.snapshot.trust.empty : true
> tickTime set to 2000
> minSessionTimeout set to 4000
> maxSessionTimeout set to 40000
> zookeeper.snapshotSizeFactor = 0.33
> Reading snapshot /my/path/version-2/snapshot.71
> Created new input stream /my/path/version-2/log.4b
> Created new input archive /my/path/version-2/log.4b
> EOF exception java.io.EOFException: Failed to read /my/path/version-2/log.4b
> Created new input stream /my/path/version-2/log.72
> Created new input archive /my/path/version-2/log.72
> Ignoring processTxn failure hdr: -1 : error: -110
> Ignoring processTxn failure hdr: -1, error: -110, path: null
> Ignoring processTxn failure hdr: -1 : error: -110
> Ignoring processTxn failure hdr: -1, error: -110, path: null
> Ignoring processTxn failure hdr: -1 : error: -110
> Ignoring processTxn failure hdr: -1, error: -110, path: null
> Ignoring processTxn failure hdr: -1 : error: -110
> Ignoring processTxn failure hdr: -1, error: -110, path: null
> Ignoring processTxn failure hdr: -1 : error: -110
> Ignoring processTxn failure hdr: -1, error: -110, path: null
> Ignoring processTxn failure hdr: -1 : error: -110
> Ignoring processTxn failure hdr: -1, error: -110, path: null
> EOF exception java.io.EOFException: Failed to read /my/path/version-2/log.72
> Snapshotting: 0x8b to /my/path/version-2/snapshot.8b
> ZKShutdownHandler is not registered, so ZooKeeper server won't take any
> action on ERROR or SHUTDOWN server state changes
> autopurge.snapRetainCount set to 3
> autopurge.purgeInterval set to 3{code}
> Could this problem be solved by ZooKeeper checking the sessions for each
> participating node before starting a leader election?
> So far only manual intervention (removing the stale ephemeral node) seems to
> "fix" the issue temporarily.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)