Hello Team,
We are using a three-node ZooKeeper cluster to serve HBase. We found that
another client and a regionserver shared the same session ID, which caused
the regionserver to crash.
ZooKeeper version: 3.4.5


Could you please help us to debug this issue?


Follower's log
2023-07-05 02:25:08,990 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /10.11.1.10:51432
2023-07-05 02:25:08,991 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0xff88d1721b6c53bc with negotiated timeout 180000 for client /10.11.1.10:51432
2023-07-05 02:25:35,880 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /10.11.1.10:51432 which had sessionid 0xff88d1721b6c53bc

On the follower, session 0xff88d1721b6c53bc belongs to the client
10.11.1.10:51432. The server closed this connection at 2023-07-05
02:25:35,880, apparently because it considered the session timed out.
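
In case it helps, here is a minimal sketch of how the session ID could be
split into its parts. It assumes the layout we understand ZooKeeper's
SessionTrackerImpl to use (top 8 bits = server id from myid, next 40 bits
derived from a millisecond timestamp, low 16 bits = counter); we have not
checked this against the 3.4.5 source, and the helper name
decode_session_id is our own.

```python
def decode_session_id(session_id: int) -> dict:
    """Split a 64-bit session ID into the three fields of the ASSUMED layout."""
    return {
        # Assumption: the top 8 bits carry the server id (myid).
        "server_id": (session_id >> 56) & 0xFF,
        # Assumption: the next 40 bits come from a millisecond timestamp.
        "timestamp_bits": (session_id >> 16) & 0xFFFFFFFFFF,
        # Assumption: the low 16 bits are a per-server counter.
        "counter": session_id & 0xFFFF,
    }


parts = decode_session_id(0xff88d1721b6c53bc)
print(parts["server_id"])  # 255
```

If that layout holds, the server id field of this session is 255, which
would be worth comparing against the myid values configured on our three
nodes.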


Master's log


2023-07-05 02:25:35,880 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0xff88d1721b6c53bc
2023-07-05 02:25:35,880 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /10.11.8.18:41402 which had sessionid 0xff88d1721b6c53bc

On the master, the client shown for session 0xff88d1721b6c53bc is
10.11.8.18:41402. 10.11.8.18 is our regionserver, which restarted because
its session was disconnected.
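
To look for other affected sessions, the comparison above can be automated.
The following is a rough sketch, not a tested tool: it scans ZooKeeper
server log lines (one entry per line, taken from all nodes) and reports
every session ID seen with more than one client IP. The function names and
regular expressions are our own and only cover the two message types quoted
in this mail.

```python
import re
from collections import defaultdict

# Patterns for the two server messages quoted in this mail.
ESTABLISHED = re.compile(
    r"Established session (0x[0-9a-f]+) with negotiated timeout \d+ "
    r"for client\s+/([\d.]+):(\d+)"
)
CLOSED = re.compile(
    r"Closed socket connection for client /([\d.]+):(\d+) "
    r"which had sessionid (0x[0-9a-f]+)"
)


def clients_by_session(lines):
    """Map each session ID to the set of client IPs seen with it."""
    seen = defaultdict(set)
    for line in lines:
        m = ESTABLISHED.search(line)
        if m:
            seen[m.group(1)].add(m.group(2))
            continue
        m = CLOSED.search(line)
        if m:
            seen[m.group(3)].add(m.group(1))
    return seen


def shared_sessions(lines):
    """Return only the sessions that appear with more than one client IP."""
    return {
        sid: sorted(ips)
        for sid, ips in clients_by_session(lines).items()
        if len(ips) > 1
    }


# Log lines from the follower and the master, as quoted in this mail.
sample = [
    "2023-07-05 02:25:08,991 INFO org.apache.zookeeper.server.ZooKeeperServer: "
    "Established session 0xff88d1721b6c53bc with negotiated timeout 180000 "
    "for client /10.11.1.10:51432",
    "2023-07-05 02:25:35,880 INFO org.apache.zookeeper.server.NIOServerCnxn: "
    "Closed socket connection for client /10.11.8.18:41402 which had "
    "sessionid 0xff88d1721b6c53bc",
]
result = shared_sessions(sample)
print(result)
```

Feeding it the concatenated logs of all three nodes should list each
session ID that was used from more than one host.
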

Regionserver's log
2023-07-05 02:25:35,213 INFO org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL: Slow sync cost: 272 ms, current pipeline: [DatanodeInfoWithStorage[10.11.8.18:9866,DS-a43ef0f9-d9e8-4fbd-a69b-1f2aec83cb8d,DISK], DatanodeInfoWithStorage[10.11.1.11:9866,DS-3a9a9b9d-a405-4a1b-af87-9c0d38eb7fc6,DISK], DatanodeInfoWithStorage[10.11.8.19:9866,DS-8466a61b-419c-44e7-85e2-30d28ff16c0f,DISK]]
2023-07-05 02:25:35,881 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0xff88d1721b6c53bc, likely server has closed socket, closing socket connection and attempting reconnect
2023-07-05 02:25:36,573 INFO org.apache.hadoop.hbase.regionserver.throttle.PressureAwareThroughputController: a40720868195bb1f851f94e01162801b#cf#compaction#13639 average throughput is 29.28 MB/second, slept 0 time(s) and total slept time is 0 ms. 1 active operations remaining, total limit is 69.23 MB/second
2023-07-05 02:25:36,591 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server dx-hbaseservice01.dx/10.11.39.10:2181. Will not attempt to authenticate using SASL (unknown error)
2023-07-05 02:25:36,592 INFO org.apache.zookeeper.ClientCnxn: Socket connection established, initiating session, client: /10.11.8.18:29508, server: dx-hbaseservice01.dx/10.11.39.10:2181
2023-07-05 02:25:36,597 WARN org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0xff88d1721b6c53bc has expired
2023-07-05 02:25:36,597 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down for session: 0xff88d1721b6c53bc

Thanks,
Miao Wang
