Daniel C created ZOOKEEPER-3018:
-----------------------------------

             Summary: Ephemeral node not deleted after session is gone
                 Key: ZOOKEEPER-3018
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3018
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
    Affects Versions: 3.4.6
         Environment: Linux 4.1.12-112.14.10.el6uek.x86_64 #2 SMP x86_64 
GNU/Linux
            Reporter: Daniel C


We have a live Zookeeper environment (quorum size is 2) and observed a strange 
behavior:
 * Kafka created 2 ephemeral nodes /brokers/ids/822712429 and 
/brokers/ids/707577499 on 2018-03-12 03:30:36.933
 * The Kafka clients were long gone but as of today (20+ days after), the two 
ephemeral nodes are still present

 

Troubleshooting:

1) Lists the outstanding sessions and ephemeral nodes

 
{noformat}
$ echo dump | nc $SERVER1 2181
SessionTracker dump:
org.apache.zookeeper.server.quorum.LearnerSessionTracker@6d7fd863
ephemeral nodes dump:
Sessions with Ephemerals (2):
0x162183ea9f70003:
               /brokers/ids/822712429
0x162183ea9f70002:
               /brokers/ids/707577499
               /controller
{noformat}
 

 

2) stat on /brokers/ids/822712429

 
{noformat}
zk> stat /brokers/ids/822712429
czxid: 4294967344
mzxid: 4294967344
pzxid: 4294967344
ctime: 1520825436933 (2018-03-11T20:30:36.933-0700)
mtime: 1520825436933 (2018-03-11T20:30:36.933-0700)
version: 0
cversion: 0
aversion: 0
owner: 99668799174148099
datalen: 102
children: 0
{noformat}
 

 

3) List full connection/session details for all clients connected

 
{noformat}
$ echo cons | nc $SERVER1 2181
 /10.247.114.70:30401[0](queued=0,recved=1,sent=0)
 
/10.248.88.235:40430[1](queued=0,recved=345,sent=345,sid=0x162183ea9f70c22,lop=PING,est=1522713395028,to=40000,lcxid=0x12,lzxid=0xffffffffffffffff,lresp=1522717802117,llat=0,minlat=0,avglat=0,maxlat=31)
{noformat}
 

 

 
{noformat}
$ echo cons | nc $SERVER2 2181
 /10.196.18.61:28173[0](queued=0,recved=1,sent=0)
 
/10.247.114.69:42679[1](queued=0,recved=73800,sent=73800,sid=0x262183eaa21da96,lop=PING,est=1522651352906,to=9000,lcxid=0xe49f,lzxid=0x10004683d,lresp=1522717854847,llat=0,minlat=0,avglat=0,maxlat=1235)
{noformat}
 

 

4) health

 
{noformat}
$ echo mntr | nc $SERVER1 2181
zk_version           3.4.6-1569965, built on 02/20/2014 09:09 GMT
zk_avg_latency  0
zk_max_latency 443
zk_min_latency  0
zk_packets_received       11158019
zk_packets_sent               11158244
zk_num_alive_connections           2
zk_outstanding_requests              0
zk_server_state follower
zk_znode_count               344
zk_watch_count               0
zk_ephemerals_count     3
zk_approximate_data_size          36654
zk_open_file_descriptor_count   33
zk_max_file_descriptor_count     65536
{noformat}
 

 

5) Server logs with related sessions:
{noformat}
Only found these logs from Server1 related to the sessions (0x162183ea9f70002 
and 0x162183ea9f70003):

2018-03-12 03:28:35,127 [myid:1] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted 
socket connection from /10.196.18.60:26775

2018-03-12 03:28:35,131 [myid:1] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection 
request from old client /10.196.18.60:26775; will be dropped if server is in 
r-o mode

2018-03-12 03:28:35,131 [myid:1] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client 
attempting to establish new session at /10.196.18.60:26775

2018-03-12 03:28:35,137 [myid:1] - INFO  
[CommitProcessor:1:ZooKeeperServer@617] - Established session 0x162183ea9f70002 
with negotiated timeout 9000 for client /10.196.18.60:26775

 
2018-03-12 03:30:36,415 [myid:1] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted 
socket connection from /10.247.114.70:39260

2018-03-12 03:30:36,422 [myid:1] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection 
request from old client /10.247.114.70:39260; will be dropped if server is in 
r-o mode

2018-03-12 03:30:36,423 [myid:1] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client 
attempting to establish new session at /10.247.114.70:39260

2018-03-12 03:30:36,428 [myid:1] - INFO  
[CommitProcessor:1:ZooKeeperServer@617] - Established session 0x162183ea9f70003 
with negotiated timeout 9000 for client /10.247.114.70:39260

 
2018-03-31 01:29:58,865 [myid:1] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket 
connection for client /10.247.114.70:39260 which had sessionid 
0x162183ea9f70003{noformat}
6) Txn logs on the two ephemeral nodes /brokers/ids/707577499 and 
/brokers/ids/822712429:
{noformat}
3/11/18 8:28:35 PM PDT session 0x162183ea9f70002 cxid 0x6 zxid 0x10000001b 
create '/brokers/ids,,v{s{31,s{'world,'anyone}}},F,1

3/11/18 8:28:35 PM PDT session 0x162183ea9f70002 cxid 0x2c zxid 0x100000028 
create 
'/brokers/ids/707577499,#7b226a6d785f706f7274223a31303130332c2274696d657374616d70223a2231353230383235333135363931222c22686f7374223a22736c6331336e79692e75732e6f7261636c652e636f6d222c2276657273696f6e223a312c22706f7274223a393039327d,v{s{31,s{'world,'anyone}}},T,1

3/11/18 8:30:36 PM PDT session 0x162183ea9f70003 cxid 0x14 zxid 0x100000030 
create 
'/brokers/ids/822712429,#7b226a6d785f706f7274223a31303130332c2274696d657374616d70223a2231353230383235343336393139222c22686f7374223a22736c6331336e796a2e75732e6f7261636c652e636f6d222c2276657273696f6e223a312c22706f7274223a393039327d,v{s{31,s{'world,'anyone}}},T,2{noformat}
 

7) Additional questions from [~andorm]
{noformat}
1) Why is the session closed, the client closed it or the cluster expired it?

[Daniel Chan] in this case, the client got killed and we expect the session 
would be expired by the cluster

 
2) which server was the session attached to - the first (44sec max

lat) or one of the others? Which server was the leader?

[Daniel Chan] The sessions creating the ephemeral nodes were attached to 
Server1 (443 max latency) while Server2 is the leader

 
3) the znode exists on all 4 servers, is that right?

[Daniel Chan] The cluster has 2 members not 4, and the ephemeral nodes are 
present on both servers

 {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to