[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208162#comment-14208162
 ] 

Jason Lowe commented on MAPREDUCE-6156:
---------------------------------------

Thanks for the patch, Junping!  Sorry for missing this in the review of 
MAPREDUCE-5891.

Rather than throwing the original IOException when we stop retrying, it might 
be more informative to wrap it with a message stating how many attempts and how 
long we waited total to try to connect.  Otherwise we could throw a socket 
timeout exception indicated we only waited 1ms depending upon how the attempts 
lined up with the retries.

Speaking of trying up to the retry timeout, we're computing a potentially lower 
connection timeout value based on the amount of time left in total retry time 
but then we could end up sleeping most of that time away just below.  Arguably 
we should be computing the socket timeout time just before we try to connect 
rather than just after we received the error.

I don't think we should always ignore InterruptedException.  That's probably 
what we'll get when the Fetcher threads are trying to be shut down, so I think 
we should return if {{stopped==true}} when that's received.

> Fetcher - connect() doesn't handle connection refused correctly 
> ----------------------------------------------------------------
>
>                 Key: MAPREDUCE-6156
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6156
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: sidharta seethana
>            Assignee: Junping Du
>            Priority: Blocker
>         Attachments: MAPREDUCE-6156.patch
>
>
> The connect() function in the fetcher assumes that whenever an IOException is 
> thrown, the amount of time passed equals "connectionTimeout" ( see code 
> snippet below ). This is incorrect. For example, in case the NM is down, an 
> ConnectException is thrown immediately - and the catch block assumes a minute 
> has passed when it is not the case.
> {code}
>   if (connectionTimeout < 0) {
>       throw new IOException("Invalid timeout "
>                             + "[timeout = " + connectionTimeout + " ms]");
>     } else if (connectionTimeout > 0) {
>       unit = Math.min(UNIT_CONNECT_TIMEOUT, connectionTimeout);
>     }
>     // set the connect timeout to the unit-connect-timeout
>     connection.setConnectTimeout(unit);
>     while (true) {
>       try {
>         connection.connect();
>         break;
>       } catch (IOException ioe) {
>         // update the total remaining connect-timeout
>         connectionTimeout -= unit;
>         // throw an exception if we have waited for timeout amount of time
>         // note that the updated value if timeout is used here
>         if (connectionTimeout == 0) {
>           throw ioe;
>         }
>         // reset the connect timeout for the last try
>         if (connectionTimeout < unit) {
>           unit = connectionTimeout;
>           // reset the connect time out for the final connect
>           connection.setConnectTimeout(unit);
>         }
>       }
>     }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to