Our ZKDashboard has failed to read the leader's status for the past few weekends, each time after a cluster reboot. The cause seems to be that our reboot schedule does not distribute sessions evenly on restart, so the leader ends up with a huge number of sessions. That in turn seems to make the stat command truncate its output when it is run from a different data center than the leader. E.g., normally it should look like:

    stat
    Zookeeper version blah blah
    Clients:
     /...
     etc etc
     /...
    Latency min/avg/max:
    Received
    Sent
    Outstanding
    Zxid
    Mode
    Node count
    Connection closed

But instead we see:

    stat
    Zookeeper version blah blah
    Clients:
     /..
     ....
     ...
     /..
    Connection closed

I'm at a bit of a loss for how to debug this: whether it is a ZK command problem, a machine config problem, or something else. Anyone have any ideas?

Thanks,
C
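
P.S. In case it helps anyone reproduce this, below is a minimal Python sketch of how the truncation could be checked independently of the dashboard. It sends stat over a plain TCP socket, reads until the server closes the connection, and reports which of the summary fields listed above are missing. The function names, the default port 2181, and the 30-second timeout are assumptions for illustration, not anything taken from ZKDashboard or from our setup.

```python
import socket

# Summary fields that, per the output above, follow the client list
# in a complete stat reply.
EXPECTED_FIELDS = ("Latency min/avg/max", "Received", "Sent",
                   "Outstanding", "Zxid", "Mode", "Node count")


def four_letter(host, port=2181, cmd=b"stat", timeout=30.0):
    """Send a four-letter command and read until the server closes the socket.

    The port and timeout defaults are assumptions; adjust them for your cluster.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd)
        while True:
            data = sock.recv(4096)
            if not data:  # an empty read means the peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def missing_fields(output):
    """Return the expected summary fields that do not start any line of output."""
    lines = [line.strip() for line in output.splitlines()]
    return [field for field in EXPECTED_FIELDS
            if not any(line.startswith(field) for line in lines)]
```

Running something like print(missing_fields(four_letter("zk-leader.example.com"))) (the hostname is a placeholder) from both the local and the remote data center should show whether the reply is cut short on the wire or only as seen by the dashboard. An empty list would mean all the summary fields arrived.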
