[ https://issues.apache.org/jira/browse/KAFKA-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15823181#comment-15823181 ]
Maharajan Shunmuga Sundaram commented on KAFKA-4634: ---------------------------------------------------- Here is the history of node 211 (ez6) connected to zookeeper zk.log.1:[2017-01-13 00:23:03,076] INFO Session establishment complete on server cz1/10.254.2.19:2181, sessionid = 0x159963f352b0004, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn) zk.log.1:[2017-01-13 00:28:32,492] INFO Session establishment complete on server cz1/10.254.2.19:2181, sessionid = 0x159963f352b0004, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn) zk.log.1:[2017-01-13 00:29:15,234] INFO Session establishment complete on server cz2/10.254.2.29:2181, sessionid = 0x159963f352b0004, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn) zk.log.1:[2017-01-13 00:29:54,491] INFO Session establishment complete on server cz/10.254.2.1:2182, sessionid = 0x159963f352b0004, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn) zk.log.1:[2017-01-13 00:52:24,093] INFO Session establishment complete on server cz1/10.254.2.19:2181, sessionid = 0x159964e1e060013, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn) zk.log.1:[2017-01-13 00:57:31,991] INFO Session establishment complete on server cz1/10.254.2.19:2181, sessionid = 0x159966708470000, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn) zk.log.1:[2017-01-13 00:57:43,908] INFO Session establishment complete on server cz/10.254.2.1:2182, sessionid = 0x159966708470000, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn) zk.log.1:[2017-01-13 01:05:21,130] INFO Session establishment complete on server cz2/10.254.2.29:2181, sessionid = 0x159966708470000, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn) zk.log.1:[2017-01-13 01:05:28,513] INFO Session establishment complete on server cz1/10.254.2.19:2181, sessionid = 0x159966e47760000, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn) zk.log.1:[2017-01-13 01:35:52,068] INFO Session establishment complete on server cz/10.254.2.1:2182, sessionid = 0x159966e47760000, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn) zk.log.1:[2017-01-13 01:40:22,562] INFO Session establishment complete on server cz2/10.254.2.29:2181, sessionid = 0x159966e47760000, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn) For the last session 0x159966e47760000, ez6 (Node: 211) has connected three times with different zookeepers. If I see stats on zookeeper, I do see stats for connected 211. /10.254.2.211:45260[1](queued=0,recved=186033,sent=186033) In CZ2, >>> netstat -tulpan | grep 45260 tcp6 0 0 10.254.2.29:2181 10.254.2.211:45260 ESTABLISHED 3585/java In EZ6, >>> netstat -tulpan | grep 45260 tcp6 0 0 10.254.2.211:45260 10.254.2.29:2181 ESTABLISHED 3684/java ------- If I see connection /10.254.2.211:45260[1](queued=0,recved=186457,sent=186457,sid=0x159966e47760000,lop=PING,est=1484289622561,to=6000,lcxid=0x14740,lzxid=0x70001440d,lresp=1484496803926,llat=1,minlat=0,avglat=0,maxlat=32) It lists session established time which matches broker log. 1484289622561 - Friday, January 13, 2017 1:40:22 AM Last response time matches recent time 1484496803926 - Sunday, January 15, 2017 11:13:23 AM But the session doesn't show up with broker id 211. Not sure what is the issue. Any help is appreciated. ls /brokers/ids [212, 210, 112, 113, 213, 19, 110, 111, 29] >> echo dump | nc cz2 2181 SessionTracker dump: Session Sets (4): 0 expire at Sun Jan 15 11:22:40 EST 2017: 0 expire at Sun Jan 15 11:22:42 EST 2017: 8 expire at Sun Jan 15 11:22:44 EST 2017: 0x259968e41e30000 0x35996670d5d0000 0x159968e41dd0000 0x159966708470004 0x159966e47760000 0x159966708470003 0x2599672df260000 0x259966708550000 2 expire at Sun Jan 15 11:22:46 EST 2017: 0x35996670d5d0001 0x259966708550001 ephemeral nodes dump: Sessions with Ephemerals (9): 0x259966708550000: /brokers/ids/112 0x259968e41e30000: /brokers/ids/213 0x159968e41dd0000: /brokers/ids/19 0x159966708470003: /brokers/ids/110 0x35996670d5d0000: /brokers/ids/113 /controller 0x259966708550001: /brokers/ids/111 0x159966708470004: /brokers/ids/212 0x2599672df260000: /brokers/ids/29 0x35996670d5d0001: /brokers/ids/210 > Issue of one kafka brokers not listed in zookeeper > -------------------------------------------------- > > Key: KAFKA-4634 > URL: https://issues.apache.org/jira/browse/KAFKA-4634 > Project: Kafka > Issue Type: Bug > Components: core > Affects Versions: 0.8.2.1 > Environment: Zookeeper version: 3.4.6-1569965, built on 02/20/2014 > 09:09 GMT > kafka_2.10-0.8.2.1 > Reporter: Maharajan Shunmuga Sundaram > > Hi, > We have incident that one of the 10 brokers not listed in brokers list of > zookeeper. > This is verified by running following command > >> echo dump | nc cz2 2181 > SessionTracker dump: > Session Sets (4): > 0 expire at Fri Jan 13 22:32:14 EST 2017: > 0 expire at Fri Jan 13 22:32:16 EST 2017: > 7 expire at Fri Jan 13 22:32:18 EST 2017: > 0x259968e41e30000 > 0x35996670d5d0001 > 0x35996670d5d0000 > 0x159966708470004 > 0x159966e47760000 > 0x159966708470003 > 0x2599672df260000 > 3 expire at Fri Jan 13 22:32:20 EST 2017: > 0x159968e41dd0000 > 0x259966708550001 > 0x259966708550000 > ephemeral nodes dump: > Sessions with Ephemerals (9): > 0x259966708550000: > /brokers/ids/112 > 0x259968e41e30000: > /brokers/ids/213 > 0x159968e41dd0000: > /brokers/ids/19 > 0x159966708470003: > /brokers/ids/110 > 0x35996670d5d0000: > /brokers/ids/113 > /controller > 0x259966708550001: > /brokers/ids/111 > 0x159966708470004: > /brokers/ids/212 > 0x2599672df260000: > /brokers/ids/29 > 0x35996670d5d0001: > /brokers/ids/210 > ------ > There are 10 sessions, but only 9 sessions are listed with brokers. > Broker with id 211 is not listed. Session 0x159966e47760000 is not shown with > broker id 211. > In the broker side log, I do see it is connected > >> zgrep "0x159966e47760000" *log* > > zk.log:[2017-01-13 01:05:28,513] INFO Session establishment complete on > server cz1/10.254.2.19:2181, sessionid = 0x159966e47760000, negotiated > timeout = 6000 (org.apache.zookeeper.ClientCnxn) > zk.log:[2017-01-13 01:35:38,163] INFO Unable to read additional data from > server sessionid 0x159966e47760000, likely server has closed socket, closing > socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) > zk.log:[2017-01-13 01:35:39,101] WARN Session 0x159966e47760000 for server > null, unexpected error, closing socket connection and attempting reconnect > (org.apache.zookeeper.ClientCnxn) > zk.log:[2017-01-13 01:35:40,121] INFO Unable to read additional data from > server sessionid 0x159966e47760000, likely server has closed socket, closing > socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) > zk.log:[2017-01-13 01:35:41,770] WARN Session 0x159966e47760000 for server > null, unexpected error, closing socket connection and attempting reconnect > (org.apache.zookeeper.ClientCnxn) > zk.log:[2017-01-13 01:35:42,439] WARN Session 0x159966e47760000 for server > null, unexpected error, closing socket connection and attempting reconnect > (org.apache.zookeeper.ClientCnxn) > zk.log:[2017-01-13 01:35:43,235] INFO Unable to read additional data from > server sessionid 0x159966e47760000, likely server has closed socket, closing > socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) > zk.log:[2017-01-13 01:35:44,950] WARN Session 0x159966e47760000 for server > null, unexpected error, closing socket connection and attempting reconnect > (org.apache.zookeeper.ClientCnxn) > zk.log:[2017-01-13 01:35:45,837] INFO Unable to read additional data from > server sessionid 0x159966e47760000, likely server has closed socket, closing > socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) > . > . > . > . > zk.log:[2017-01-13 01:40:14,818] INFO Unable to read additional data from > server sessionid 0x159966e47760000, likely server has closed socket, closing > socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) > zk.log:[2017-01-13 01:40:15,916] WARN Session 0x159966e47760000 for server > null, unexpected error, closing socket connection and attempting reconnect > (org.apache.zookeeper.ClientCnxn) > zk.log:[2017-01-13 01:40:19,692] INFO Client session timed out, have not > heard from server in 3676ms for sessionid 0x159966e47760000, closing socket > connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) > zk.log:[2017-01-13 01:40:20,632] INFO Unable to read additional data from > server sessionid 0x159966e47760000, likely server has closed socket, closing > socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) > zk.log:[2017-01-13 01:40:20,814] WARN Session 0x159966e47760000 for server > null, unexpected error, closing socket connection and attempting reconnect > (org.apache.zookeeper.ClientCnxn) > zk.log:[2017-01-13 01:40:22,089] WARN Session 0x159966e47760000 for server > null, unexpected error, closing socket connection and attempting reconnect > (org.apache.zookeeper.ClientCnxn) > zk.log:[2017-01-13 01:40:22,562] INFO Session establishment complete on > server cz2/10.254.2.29:2181, sessionid = 0x159966e47760000, negotiated > timeout = 6000 (org.apache.zookeeper.ClientCnxn) > After several tries, broker connected with zookeeper cz2:2181. I am not sure > how to debug this issue. It would be helpful if someone provides way to debug > this issue. > Regards, > Maharajan S -- This message was sent by Atlassian JIRA (v6.3.4#6332)