[jira] Commented: (HADOOP-3130) Shuffling takes too long to get the last map output.

Amar Kamat (JIRA) Mon, 14 Apr 2008 21:50:12 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588898#action_12588898 ]


Amar Kamat commented on HADOOP-3130:
------------------------------------

bq. A minor point. Since UNIT_CONNECT_TIMEOUT is private final, the following 
code segment seems redundant: ...
The check is there because a _unit-connect-timeout_ of 0 combined with a 
_total-timeout_ > 0 would result in an infinite loop. Since users can change 
unit-connect-timeout (and recompile), I think it's worth guarding against such 
cases and failing early.
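A minimal sketch of the kind of loop this check protects, written in Python for brevity (Hadoop itself is Java, and connect_with_retry, unit_timeout, total_timeout and TimeoutConfigError are hypothetical names, not the patch's code). Each failed attempt is charged one unit against the total budget, so with a unit of 0 the budget never shrinks and the loop would never end; the early check turns that into an immediate error.

```python
class TimeoutConfigError(ValueError):
    """Raised when the timeouts would let the retry loop spin forever."""


def connect_with_retry(connect, unit_timeout, total_timeout):
    """Call connect(timeout) until it succeeds or the total budget is spent.

    connect is a hypothetical callable that raises OSError on failure.
    Returns the number of attempts made; re-raises the last error once
    the budget is exhausted.
    """
    # Fail early: if unit_timeout were 0, each failure would charge nothing,
    # 'remaining' would never reach 0, and an unreachable host would hang us.
    if unit_timeout <= 0 or total_timeout <= 0:
        raise TimeoutConfigError(
            f"need positive timeouts, got unit={unit_timeout}, total={total_timeout}")
    remaining = total_timeout
    attempts = 0
    while True:
        attempts += 1
        try:
            # The last attempt only waits for whatever budget is left.
            connect(min(unit_timeout, remaining))
            return attempts
        except OSError:
            # Charge one unit per failed attempt, treating every failure
            # like a connect timeout.
            remaining -= unit_timeout
            if remaining <= 0:
                raise
```

For example, with unit_timeout=100 and total_timeout=250, a permanently unreachable host is tried three times (waiting 100, 100 and 50) before the error propagates.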
bq. Also, you need to test whether the ioe is due to connection timeout. ...
What should the behaviour be for exceptions other than connection timeouts? 
Retrying without any penalty is clearly not a good option, since that could 
lead to longer (possibly infinite) waits.
- One way would be to decrement the total time left (so that loop termination 
is guaranteed) and LOG the type of exception encountered, that is, treat it 
like a connection-timeout exception.
- A slightly more complex way would be to vary the penalty by exception type. 
For example, decrement _unit-connect-timeout/2_ for non-connect-timeout 
exceptions and _unit-connect-timeout_ otherwise.
- An even more complex way would be to tolerate a few non-connect-timeout 
failures (without penalty).
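The three options above can be sketched as interchangeable penalty policies. This is hypothetical Python (PenaltyPolicy, charge and free_failures are made-up names), and it assumes connect timeouts surface as socket.timeout/TimeoutError while other failures arrive as other OSError subclasses.

```python
import logging
import socket

LOG = logging.getLogger(__name__)

# Exception types treated as "connect timeout" (an assumption of this sketch).
CONNECT_TIMEOUTS = (socket.timeout, TimeoutError)


class PenaltyPolicy:
    """Decides how much of the total budget one failed connect attempt costs."""

    def __init__(self, strategy, unit_timeout, free_failures=0):
        self.strategy = strategy
        self.unit_timeout = unit_timeout
        self.free_failures = free_failures  # only used by "tolerate"

    def charge(self, exc):
        is_timeout = isinstance(exc, CONNECT_TIMEOUTS)
        # Record what kind of failure occurred, whatever the strategy.
        LOG.info("connect attempt failed with %s", type(exc).__name__)
        if self.strategy == "uniform":
            # Option 1: every failure costs a full unit.
            return self.unit_timeout
        if self.strategy == "discriminate":
            # Option 2: non-timeout failures cost half a unit.
            return self.unit_timeout if is_timeout else self.unit_timeout / 2
        if self.strategy == "tolerate":
            # Option 3: the first few non-timeout failures are free.
            if not is_timeout and self.free_failures > 0:
                self.free_failures -= 1
                return 0
            return self.unit_timeout
        raise ValueError(f"unknown strategy: {self.strategy!r}")
```

Every policy keeps termination guaranteed: the first two always charge a positive amount, and the third waives only a bounded number of failures. Swapping charge(exc) in for a fixed per-failure decrement is the only change a retry loop would need.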
----
For now I think it's okay to keep it simple. Note that the reducer will not get 
killed if one meta-connect attempt fails; that takes a number of failed attempts.

> Shuffling takes too long to get the last map output.
> ----------------------------------------------------
>
>                 Key: HADOOP-3130
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3130
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>         Attachments: HADOOP-3130-v2.patch, HADOOP-3130-v2.patch, 
> HADOOP-3130-v3.1.patch, HADOOP-3130-v3.patch, HADOOP-3130.patch, shuffling.log
>
>
> I noticed that towards the end of shuffling, the map output fetcher of the 
> reducer backs off too aggressively.
> I attach part of one reduce log from my job.
> Note that the last map output was not fetched for 2 minutes.

-- 
This message is automatically generated by JIRA.