[
https://issues.apache.org/jira/browse/HADOOP-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588898#action_12588898
]
Amar Kamat commented on HADOOP-3130:
------------------------------------
bq. A minor point. Since UNIT_CONNECT_TIMEOUT is private final, the following
code segment seems redundant: ...
The reason for the check is that _unit-connect-timeout_ = 0 with
_total-timeout_ > 0 results in an infinite loop, because the time left is
never reduced. Since users can change unit-connect-timeout (and recompile), I
think it's safe to guard against such cases and fail early.
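A minimal sketch of such a fail-early guard, not the code from the patch. The
class name, method name, parameter names and the error message are hypothetical
and chosen only for illustration:

```java
import java.io.IOException;

class ConnectTimeoutGuard {
    /**
     * Rejects timeout settings that could never exhaust the total budget.
     * Both parameters are hypothetical stand-ins for the values discussed
     * above (unit-connect-timeout and total-timeout), in milliseconds.
     */
    static void checkTimeouts(int unitConnectTimeoutMs, int totalTimeoutMs)
            throws IOException {
        // With a positive total and a non-positive unit, a loop that subtracts
        // the unit from the total on every failed attempt would never end.
        if (totalTimeoutMs > 0 && unitConnectTimeoutMs <= 0) {
            throw new IOException("Invalid timeout [unit=" + unitConnectTimeoutMs
                    + ", total=" + totalTimeoutMs + "]");
        }
    }
}
```

Calling the check once before entering the connect loop keeps the loop body
free of special cases.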
bq. Also, you need to test whether the ioe is due to connection timeout. ...
What should be the right behaviour for exceptions other than connection
timeouts? Surely retrying (w/o any penalty) is not a good option since that
will lead to longer waits (possibly infinite).
- One way would be to decrement the total-time left (so that the loop
termination is guaranteed) and LOG the type of exception encountered. That is,
treat it like a connection-timeout exception.
- A bit more complex way would be to discriminate the penalty incurred in each
case. For example, decrement by _unit-connect-timeout/2_ in case of
non-connect-timeout exceptions and by _unit-connect-timeout_ otherwise.
- Another more complex way would be to tolerate some failures (w/o penalty) for
the non-connect-timeout exceptions.
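The first two options above can be sketched as one loop with a configurable
penalty. This is an illustration under assumptions, not the patch: the
{{Attempt}} interface, the method and all parameter names are hypothetical, a
{{SocketTimeoutException}} is assumed to signal a connection timeout, and
logging goes to stderr instead of the real LOG.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

class PenalizedConnect {
    /** Hypothetical stand-in for a single connection attempt. */
    interface Attempt {
        void connect(int timeoutMs) throws IOException;
    }

    /**
     * Retries until the attempt succeeds or the total budget is used up.
     * Pass nonTimeoutPenaltyMs == unitTimeoutMs for the first option, or
     * unitTimeoutMs / 2 for the second. Returns the number of attempts made.
     */
    static int connectWithBudget(Attempt attempt, int unitTimeoutMs,
                                 int totalTimeoutMs, int nonTimeoutPenaltyMs)
            throws IOException {
        // Every penalty must be positive, otherwise termination is not guaranteed.
        if (unitTimeoutMs <= 0 || totalTimeoutMs <= 0 || nonTimeoutPenaltyMs <= 0) {
            throw new IllegalArgumentException("timeouts and penalty must be positive");
        }
        int remaining = totalTimeoutMs;
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                attempt.connect(Math.min(unitTimeoutMs, remaining));
                return attempts;
            } catch (IOException ioe) {
                boolean timedOut = ioe instanceof SocketTimeoutException;
                // Charge the budget on every failure, whatever its type.
                remaining -= timedOut ? unitTimeoutMs : nonTimeoutPenaltyMs;
                System.err.println("connect failed (" + ioe.getClass().getSimpleName()
                        + "), " + Math.max(remaining, 0) + " ms left");
                if (remaining <= 0) {
                    throw ioe; // budget exhausted: report the last failure
                }
            }
        }
    }
}
```

Because each failure subtracts a positive amount, the loop makes at most
total / min(unit, penalty) attempts, rounded up.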
----
For now I think it's okay to keep it simple. Note that the reducer will not get
killed if one meta-connect attempt fails; it takes a bunch of them.
> Shuffling takes too long to get the last map output.
> ----------------------------------------------------
>
> Key: HADOOP-3130
> URL: https://issues.apache.org/jira/browse/HADOOP-3130
> Project: Hadoop Core
> Issue Type: Bug
> Reporter: Runping Qi
> Assignee: Amar Kamat
> Attachments: HADOOP-3130-v2.patch, HADOOP-3130-v2.patch,
> HADOOP-3130-v3.1.patch, HADOOP-3130-v3.patch, HADOOP-3130.patch, shuffling.log
>
>
> I noticed that towards the end of shuffling, the map output fetcher of the
> reducer backs off too aggressively.
> I have attached a fraction of one reduce log of my job.
> Note that the last map output was not fetched in 2 minutes.