sidharta seethana created MAPREDUCE-6156:
--------------------------------------------

             Summary: Fetcher - connect() doesn't handle connection refused correctly
                 Key: MAPREDUCE-6156
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6156
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: sidharta seethana
            Assignee: Junping Du
            Priority: Critical


The connect() function in the fetcher assumes that whenever an IOException is thrown, a full timeout interval has elapsed: the catch block subtracts "unit" from "connectionTimeout" regardless of how long the attempt actually took (see code snippet below). This is incorrect. For example, when the NM is down, a ConnectException is thrown immediately, yet the catch block assumes a minute has passed when that is not the case.

{code}
    if (connectionTimeout < 0) {
      throw new IOException("Invalid timeout "
                            + "[timeout = " + connectionTimeout + " ms]");
    } else if (connectionTimeout > 0) {
      unit = Math.min(UNIT_CONNECT_TIMEOUT, connectionTimeout);
    }
    // set the connect timeout to the unit-connect-timeout
    connection.setConnectTimeout(unit);
    while (true) {
      try {
        connection.connect();
        break;
      } catch (IOException ioe) {
        // update the total remaining connect-timeout
        connectionTimeout -= unit;

        // throw an exception if we have waited for timeout amount of time
        // note that the updated value of timeout is used here
        if (connectionTimeout == 0) {
          throw ioe;
        }

        // reset the connect timeout for the last try
        if (connectionTimeout < unit) {
          unit = connectionTimeout;
          // reset the connect time out for the final connect
          connection.setConnectTimeout(unit);
        }
      }
    }
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
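
One possible direction, as a minimal sketch and not the project's actual patch: measure the time that really elapsed with a monotonic clock and compute the remaining budget from that, instead of subtracting "unit" on every failure. The class name ConnectRetrySketch, the Connector interface, and the injected clock are all hypothetical and exist only so the loop can be exercised without a network; the value of UNIT_CONNECT_TIMEOUT is assumed from the "a minute" remark above.

```java
import java.io.IOException;
import java.util.function.LongSupplier;

/** Illustrative sketch only; names here are hypothetical, not Hadoop APIs. */
final class ConnectRetrySketch {

    /** Assumed value (one minute), based on the description in the report. */
    static final int UNIT_CONNECT_TIMEOUT = 60 * 1000;

    /** Hypothetical stand-in for setConnectTimeout(timeoutMs) + connect(). */
    interface Connector {
        void connect(int timeoutMs) throws IOException;
    }

    /**
     * Retries connector.connect() until it succeeds or connectionTimeout
     * milliseconds have actually elapsed according to clockMs.
     */
    static void connect(Connector connector, int connectionTimeout,
                        LongSupplier clockMs) throws IOException {
        if (connectionTimeout < 0) {
            throw new IOException("Invalid timeout "
                + "[timeout = " + connectionTimeout + " ms]");
        }
        int unit = 0;
        if (connectionTimeout > 0) {
            unit = Math.min(UNIT_CONNECT_TIMEOUT, connectionTimeout);
        }

        final long start = clockMs.getAsLong();
        while (true) {
            try {
                connector.connect(unit);
                return;
            } catch (IOException ioe) {
                // Use the measured elapsed time, not an assumed "unit".
                long elapsed = clockMs.getAsLong() - start;
                long remaining = connectionTimeout - elapsed;

                // Give up only once the real budget is exhausted.
                if (remaining <= 0) {
                    throw ioe;
                }

                // Shrink the per-attempt timeout for the final tries.
                unit = (int) Math.min(unit, remaining);
            }
        }
    }

    private ConnectRetrySketch() { }
}
```

With a real connection, the connector would set the connect timeout to the given value and then call connect(), and the clock could be derived from System.nanoTime(). Note that this sketch retries immediately after a fast failure such as connection refused; a real fix would probably also want to pause between attempts so that a down NM is not hammered in a tight loop.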