JiangJiafu created ZOOKEEPER-2701:
-------------------------------------
Summary: Timeout for RecvWorker is too long
Key: ZOOKEEPER-2701
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2701
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.4.8
Environment: Centos6.5
ZooKeeper 3.4.8
Reporter: JiangJiafu
Priority: Minor
Environment:
I deploy ZooKeeper in a cluster of three nodes. Each node has three network
interfaces(eth0, eth1, eth2).
Hostname is used instead of IP address in zoo.cfg, and quorumListenOnAllIPs=true
Probleam:
I start three ZooKeeper servers( node A, node B, and node C) one by one,
when the leader election finishes, node B is the leader.
Then I shutdown one network interface of node A by command "ifdown eth0". The
ZooKeeper server on node A will lost connection to node B and node C. In my
test, I will take about 20 minites that the ZooKeepr server of node A realizes
the event and try to call the QuorumServer.recreateSocketAddress the resolve
the hostname.
I try to read the source code, and I find the code in
{code:title=QuorumCnxManager.java:|borderStyle=solid}
class RecvWorker extends ZooKeeperThread {
Long sid;
Socket sock;
volatile boolean running = true;
final DataInputStream din;
final SendWorker sw;
RecvWorker(Socket sock, DataInputStream din, Long sid, SendWorker sw) {
super("RecvWorker:" + sid);
this.sid = sid;
this.sock = sock;
this.sw = sw;
this.din = din;
try {
// OK to wait until socket disconnects while reading.
sock.setSoTimeout(0);
} catch (IOException e) {
LOG.error("Error while accessing socket for " + sid, e);
closeSocket(sock);
running = false;
}
}
...
}
{code}
I notice that the soTime is set to 0 in RecvWorker constructor. I think this is
reasonable when the IP address of a ZooKeeper server never change, but
considering that the IP address of each ZooKeeper server may change, maybe we
should better set a timeout here.
I am not pretty sure this is really a problem.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)