[ https://issues.apache.org/jira/browse/MAPREDUCE-6156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sidharta seethana updated MAPREDUCE-6156:
-----------------------------------------
    Description: 
The connect() function in the fetcher assumes that whenever an IOException is thrown, the amount of time passed equals "connectionTimeout" (see code snippet below). This is incorrect. For example, in case the NM is down, a ConnectException is thrown immediately - and the catch block assumes a minute has passed when it is not the case.

{code}
    if (connectionTimeout < 0) {
      throw new IOException("Invalid timeout "
                            + "[timeout = " + connectionTimeout + " ms]");
    } else if (connectionTimeout > 0) {
      unit = Math.min(UNIT_CONNECT_TIMEOUT, connectionTimeout);
    }
    // set the connect timeout to the unit-connect-timeout
    connection.setConnectTimeout(unit);
    while (true) {
      try {
        connection.connect();
        break;
      } catch (IOException ioe) {
        // update the total remaining connect-timeout
        connectionTimeout -= unit;

        // throw an exception if we have waited for timeout amount of time
        // note that the updated value if timeout is used here
        if (connectionTimeout == 0) {
          throw ioe;
        }

        // reset the connect timeout for the last try
        if (connectionTimeout < unit) {
          unit = connectionTimeout;
          // reset the connect time out for the final connect
          connection.setConnectTimeout(unit);
        }
      }
    }
{code}

  was:
The connect() function in the fetcher assumes that whenever an IOException is thrown, the amount of time passed equals "connectionTimeout" (see code snippet below). This is incorrect. For example, in case the NM is down, a ConnectException is thrown immediately - and the catch block assumes a minute has passed when it is not the case.

{code}
    if (connectionTimeout < 0) {
      throw new IOException("Invalid timeout "
                            + "[timeout = " + connectionTimeout + " ms]");
    } else if (connectionTimeout > 0) {
      unit = Math.min(UNIT_CONNECT_TIMEOUT, connectionTimeout);
    }
    // set the connect timeout to the unit-connect-timeout
    connection.setConnectTimeout(unit);
    while (true) {
      try {
        connection.connect();
        break;
      } catch (IOException ioe) {
        // update the total remaining connect-timeout
        connectionTimeout -= unit;

        // throw an exception if we have waited for timeout amount of time
        // note that the updated value if timeout is used here
        if (connectionTimeout == 0) {
          throw ioe;
        }

        // reset the connect timeout for the last try
        if (connectionTimeout < unit) {
          unit = connectionTimeout;
          // reset the connect time out for the final connect
          connection.setConnectTimeout(unit);
        }
      }
    }
{code]


> Fetcher - connect() doesn't handle connection refused correctly
> ----------------------------------------------------------------
>
>                 Key: MAPREDUCE-6156
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6156
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: sidharta seethana
>            Assignee: Junping Du
>            Priority: Critical
>
> The connect() function in the fetcher assumes that whenever an IOException is
> thrown, the amount of time passed equals "connectionTimeout" (see code
> snippet below). This is incorrect. For example, in case the NM is down, a
> ConnectException is thrown immediately - and the catch block assumes a minute
> has passed when it is not the case.
>
> {code}
>     if (connectionTimeout < 0) {
>       throw new IOException("Invalid timeout "
>                             + "[timeout = " + connectionTimeout + " ms]");
>     } else if (connectionTimeout > 0) {
>       unit = Math.min(UNIT_CONNECT_TIMEOUT, connectionTimeout);
>     }
>     // set the connect timeout to the unit-connect-timeout
>     connection.setConnectTimeout(unit);
>     while (true) {
>       try {
>         connection.connect();
>         break;
>       } catch (IOException ioe) {
>         // update the total remaining connect-timeout
>         connectionTimeout -= unit;
>
>         // throw an exception if we have waited for timeout amount of time
>         // note that the updated value if timeout is used here
>         if (connectionTimeout == 0) {
>           throw ioe;
>         }
>
>         // reset the connect timeout for the last try
>         if (connectionTimeout < unit) {
>           unit = connectionTimeout;
>           // reset the connect time out for the final connect
>           connection.setConnectTimeout(unit);
>         }
>       }
>     }
> {code}
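
The snippet in the description subtracts "unit" from the remaining budget on every IOException, whether or not that much time actually passed. One possible way to avoid the assumption is to measure elapsed time against a deadline and to pause briefly after a fast failure such as a refused connection. The following is only a minimal sketch of that idea, not the patch attached to this issue: the class ConnectRetrySketch, the Attempt and Sleeper interfaces, the method connectWithDeadline and the backoffMs parameter are all hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.function.LongSupplier;

final class ConnectRetrySketch {

  /** Hypothetical stand-in for connection.setConnectTimeout(t) + connection.connect(). */
  interface Attempt {
    void connect(int timeoutMs) throws IOException;
  }

  /** Hypothetical stand-in for Thread.sleep, injectable so tests need not wait. */
  interface Sleeper {
    void sleep(long ms) throws InterruptedException;
  }

  /**
   * Retries until an attempt succeeds or connectionTimeoutMs of measured time
   * has elapsed. Returns the number of attempts made.
   */
  static int connectWithDeadline(Attempt attempt, int connectionTimeoutMs,
      int unitMs, long backoffMs, LongSupplier clockMs, Sleeper sleeper)
      throws IOException {
    if (connectionTimeoutMs < 0) {
      throw new IOException("Invalid timeout "
          + "[timeout = " + connectionTimeoutMs + " ms]");
    }
    final long deadline = clockMs.getAsLong() + connectionTimeoutMs;
    IOException last = null;
    int attempts = 0;
    while (true) {
      long remaining = deadline - clockMs.getAsLong();
      if (last != null && remaining <= 0) {
        throw last; // the whole budget has really been spent
      }
      // Cap each attempt at the time actually left. A total timeout of 0
      // yields a unit of 0, which URLConnection treats as "no timeout".
      int unit = (int) Math.max(0, Math.min(unitMs, remaining));
      attempts++;
      try {
        attempt.connect(unit);
        return attempts;
      } catch (IOException ioe) {
        last = ioe;
        // Charge only the measured time; pause so that an immediate
        // failure (e.g. connection refused) does not become a busy loop.
        long left = deadline - clockMs.getAsLong();
        if (left > 0 && backoffMs > 0) {
          try {
            sleeper.sleep(Math.min(backoffMs, left));
          } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw ioe;
          }
        }
      }
    }
  }

  public static void main(String[] args) {
    // Simulate a node that is down: every attempt fails immediately.
    Attempt refused = t -> {
      throw new ConnectException("Connection refused");
    };
    long start = System.nanoTime();
    try {
      connectWithDeadline(refused, 300, 100, 50,
          () -> System.nanoTime() / 1_000_000, Thread::sleep);
    } catch (IOException e) {
      long elapsedMs = (System.nanoTime() - start) / 1_000_000;
      System.out.println("gave up after ~" + elapsedMs + " ms: " + e.getMessage());
    }
  }
}
```

With a real connection, the Attempt could be written as t -> { connection.setConnectTimeout(t); connection.connect(); }. Whether retrying with a back-off or failing fast on ConnectException is the better behavior for the fetcher is a design decision that this sketch does not settle.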