[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438140#comment-16438140
 ] 

Daniel C commented on ZOOKEEPER-3018:
-------------------------------------

We have another incident of same issue reported.

This session owns this ephemeral node:

0x162a60381242e0c:
        
/consumers/cosStepExecutors/ids/cosStepExecutors_1523203806120_den01fkd.us.oracle.com-1523203806187-7327e8bb

 

Doing "stat 
/consumers/cosStepExecutors/ids/cosStepExecutors_1523203806120_den01fkd.us.oracle.com-1523203806187-7327e8bb":

czxid: 40609
mzxid: 40609
pzxid: 40609
ctime: 1523536225711 (2018-04-12T05:30:25.711-0700)
mtime: 1523536225711 (2018-04-12T05:30:25.711-0700)
version: 0
cversion: 0
aversion: 0
owner: 99824675737316876
datalen: 123
children: 0

 

"cons" indicated the session is still there:

echo cons | nc den01fkd 2181 | grep 162a60381242e0c
 
/10.196.42.142:22166[1](queued=0,recved=549,sent=549,sid=0x162a60381242e0c,lop=PING,est=1523536225314,to=9000,lcxid=0x8f8c,lzxid=0xd8ed,lresp=1523665779797,llat=0,minlat=0,avglat=0,maxlat=1)

 

However, SessionTracker "dump" indicated the session is already expired:

10 expire at Sat Apr 14 00:15:08 UTC 2018:
        .....
        .....
        0x162a60381242e0c

 

It seems session states are not consistent in Zookeeper causing the issue.

[~andorm], could you please have someone to take a look as zookeeper breaks the 
ephemeral node behavior?

 

 

> Ephemeral node not deleted after session is gone
> ------------------------------------------------
>
>                 Key: ZOOKEEPER-3018
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3018
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.6
>         Environment: Linux 4.1.12-112.14.10.el6uek.x86_64 #2 SMP x86_64 
> GNU/Linux
>            Reporter: Daniel C
>            Priority: Major
>
> We have a live Zookeeper environment (quorum size is 2) and observed a 
> strange behavior:
>  * Kafka created 2 ephemeral nodes /brokers/ids/822712429 and 
> /brokers/ids/707577499 on 2018-03-12 03:30:36.933
>  * The Kafka clients were long gone but as of today (20+ days after), the two 
> ephemeral nodes are still present
>  
> Troubleshooting:
> 1) Lists the outstanding sessions and ephemeral nodes
>  
> {noformat}
> $ echo dump | nc $SERVER1 2181
> SessionTracker dump:
> org.apache.zookeeper.server.quorum.LearnerSessionTracker@6d7fd863
> ephemeral nodes dump:
> Sessions with Ephemerals (2):
> 0x162183ea9f70003:
>                /brokers/ids/822712429
> 0x162183ea9f70002:
>                /brokers/ids/707577499
>                /controller
> {noformat}
>  
>  
> 2) stat on /brokers/ids/822712429
>  
> {noformat}
> zk> stat /brokers/ids/822712429
> czxid: 4294967344
> mzxid: 4294967344
> pzxid: 4294967344
> ctime: 1520825436933 (2018-03-11T20:30:36.933-0700)
> mtime: 1520825436933 (2018-03-11T20:30:36.933-0700)
> version: 0
> cversion: 0
> aversion: 0
> owner: 99668799174148099
> datalen: 102
> children: 0
> {noformat}
>  
>  
> 3) List full connection/session details for all clients connected
>  
> {noformat}
> $ echo cons | nc $SERVER1 2181
>  /10.247.114.70:30401[0](queued=0,recved=1,sent=0)
>  
> /10.248.88.235:40430[1](queued=0,recved=345,sent=345,sid=0x162183ea9f70c22,lop=PING,est=1522713395028,to=40000,lcxid=0x12,lzxid=0xffffffffffffffff,lresp=1522717802117,llat=0,minlat=0,avglat=0,maxlat=31)
> {noformat}
>  
>  
>  
> {noformat}
> $ echo cons | nc $SERVER2 2181
>  /10.196.18.61:28173[0](queued=0,recved=1,sent=0)
>  
> /10.247.114.69:42679[1](queued=0,recved=73800,sent=73800,sid=0x262183eaa21da96,lop=PING,est=1522651352906,to=9000,lcxid=0xe49f,lzxid=0x10004683d,lresp=1522717854847,llat=0,minlat=0,avglat=0,maxlat=1235)
> {noformat}
>  
>  
> 4) health
>  
> {noformat}
> $ echo mntr | nc $SERVER1 2181
> zk_version           3.4.6-1569965, built on 02/20/2014 09:09 GMT
> zk_avg_latency  0
> zk_max_latency 443
> zk_min_latency  0
> zk_packets_received       11158019
> zk_packets_sent               11158244
> zk_num_alive_connections           2
> zk_outstanding_requests              0
> zk_server_state follower
> zk_znode_count               344
> zk_watch_count               0
> zk_ephemerals_count     3
> zk_approximate_data_size          36654
> zk_open_file_descriptor_count   33
> zk_max_file_descriptor_count     65536
> {noformat}
>  
>  
> 5) Server logs with related sessions:
> {noformat}
> Only found these logs from Server1 related to the sessions (0x162183ea9f70002 
> and 0x162183ea9f70003):
> 2018-03-12 03:28:35,127 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - 
> Accepted socket connection from /10.196.18.60:26775
> 2018-03-12 03:28:35,131 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection 
> request from old client /10.196.18.60:26775; will be dropped if server is in 
> r-o mode
> 2018-03-12 03:28:35,131 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client 
> attempting to establish new session at /10.196.18.60:26775
> 2018-03-12 03:28:35,137 [myid:1] - INFO  
> [CommitProcessor:1:ZooKeeperServer@617] - Established session 
> 0x162183ea9f70002 with negotiated timeout 9000 for client /10.196.18.60:26775
>  
> 2018-03-12 03:30:36,415 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - 
> Accepted socket connection from /10.247.114.70:39260
> 2018-03-12 03:30:36,422 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection 
> request from old client /10.247.114.70:39260; will be dropped if server is in 
> r-o mode
> 2018-03-12 03:30:36,423 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client 
> attempting to establish new session at /10.247.114.70:39260
> 2018-03-12 03:30:36,428 [myid:1] - INFO  
> [CommitProcessor:1:ZooKeeperServer@617] - Established session 
> 0x162183ea9f70003 with negotiated timeout 9000 for client /10.247.114.70:39260
>  
> 2018-03-31 01:29:58,865 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed 
> socket connection for client /10.247.114.70:39260 which had sessionid 
> 0x162183ea9f70003{noformat}
> 6) Txn logs on the two ephemeral nodes /brokers/ids/707577499 and 
> /brokers/ids/822712429:
> {noformat}
> 3/11/18 8:28:35 PM PDT session 0x162183ea9f70002 cxid 0x6 zxid 0x10000001b 
> create '/brokers/ids,,v{s{31,s{'world,'anyone}}},F,1
> 3/11/18 8:28:35 PM PDT session 0x162183ea9f70002 cxid 0x2c zxid 0x100000028 
> create 
> '/brokers/ids/707577499,#7b226a6d785f706f7274223a31303130332c2274696d657374616d70223a2231353230383235333135363931222c22686f7374223a22736c6331336e79692e75732e6f7261636c652e636f6d222c2276657273696f6e223a312c22706f7274223a393039327d,v{s{31,s{'world,'anyone}}},T,1
> 3/11/18 8:30:36 PM PDT session 0x162183ea9f70003 cxid 0x14 zxid 0x100000030 
> create 
> '/brokers/ids/822712429,#7b226a6d785f706f7274223a31303130332c2274696d657374616d70223a2231353230383235343336393139222c22686f7374223a22736c6331336e796a2e75732e6f7261636c652e636f6d222c2276657273696f6e223a312c22706f7274223a393039327d,v{s{31,s{'world,'anyone}}},T,2{noformat}
>  
> 7) Additional questions from [~andorm]
> {noformat}
> 1) Why is the session closed, the client closed it or the cluster expired it?
> [Daniel Chan] in this case, the client got killed and we expect the session 
> would be expired by the cluster
>  
> 2) which server was the session attached to - the first (44sec max
> lat) or one of the others? Which server was the leader?
> [Daniel Chan] The sessions creating the ephemeral nodes were attached to 
> Server1 (443 max latency) while Server2 is the leader
>  
> 3) the znode exists on all 4 servers, is that right?
> [Daniel Chan] The cluster has 2 members not 4, and the ephemeral nodes are 
> present on both servers
>  {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to