Explicit timeout for ipc.Client
-------------------------------

                 Key: HADOOP-642
                 URL: http://issues.apache.org/jira/browse/HADOOP-642
             Project: Hadoop
          Issue Type: Bug
    Affects Versions: 0.7.2
            Reporter: Konstantin Shvachko

This bug contributed to the crash discussed in HADOOP-572.

ipc.Client tries to establish a connection to its server with an infinite timeout. For a reason unknown to me, "infinite" amounts to 3 minutes in this case. I guess it is configured somewhere in the native socket implementation. With this timeout, data-nodes had only 3 chances to send a heartbeat during the 10 minute expiration interval, and with a very busy name-node their chances of being accepted were close to 0.

I included an explicit call of Socket.connect() with a timeout set to 1 min, which is our default for all connections.
Modified a log message to include information that turned out to be useful for debugging.
Removed unnecessary imports.
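
For illustration only, here is a minimal sketch of the kind of change described above. It is not the actual patch: the class name ConnectWithTimeout, the helper names connect and maxAttempts, and the constant CONNECT_TIMEOUT_MS are hypothetical. It creates an unconnected socket and calls Socket.connect() with an explicit timeout, and it works out how many connection attempts fit into the expiration interval for a given timeout.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class ConnectWithTimeout {
    // Hypothetical constant: the 1 minute default mentioned in the report.
    static final int CONNECT_TIMEOUT_MS = 60_000;

    // Open a connection, giving up after timeoutMs instead of relying on
    // whatever limit the underlying socket implementation applies.
    static Socket connect(String host, int port, int timeoutMs) throws IOException {
        Socket socket = new Socket(); // unconnected socket
        try {
            // Throws SocketTimeoutException if the timeout expires first.
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return socket;
        } catch (IOException e) {
            socket.close(); // do not leak the descriptor on failure
            throw e;
        }
    }

    // Number of full connection attempts that fit into the expiration window.
    static int maxAttempts(long windowMs, long timeoutMs) {
        return (int) (windowMs / timeoutMs);
    }

    public static void main(String[] args) {
        long window = 10 * 60_000L; // 10 minute expiration interval
        System.out.println(maxAttempts(window, 3 * 60_000L));        // 3 minute timeout
        System.out.println(maxAttempts(window, CONNECT_TIMEOUT_MS)); // 1 minute timeout
    }
}
```

A caller would use connect(host, port, ConnectWithTimeout.CONNECT_TIMEOUT_MS) and treat a SocketTimeoutException as a failed attempt to be retried. Note that Socket.connect() interprets a timeout of 0 as infinite, so the value passed must be positive for the explicit limit to apply.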