[ https://issues.apache.org/jira/browse/HADOOP-3956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622778#action_12622778 ]
Amir Youssefi commented on HADOOP-3956:
---------------------------------------
Sample issues we detect from task/job logs:
- Shuffle
- Map Spill
- Lagging single reducer (uneven distribution in terms of run time or row
count)
and more...
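
As an illustration of the "lagging single reducer" check, here is a minimal
sketch of what such a detection could look like. It is not the Mr Doctor
implementation. It assumes per-reducer run time and input row count have
already been extracted from the task logs or counters; the function name, the
dictionary fields, the task ids and the 2x-median threshold are all
hypothetical.

```python
from statistics import median


def find_lagging_reducers(tasks, ratio=2.0):
    """Return ids of reducers whose run time or row count exceeds
    `ratio` times the median over all reducers of the job.

    `tasks` is a list of dicts with hypothetical keys "id", "seconds"
    and "rows"; the 2x default threshold is an assumption, not a tuned value.
    """
    if not tasks:
        return []
    med_secs = median(t["seconds"] for t in tasks)
    med_rows = median(t["rows"] for t in tasks)
    lagging = []
    for t in tasks:
        # Uneven distribution from the time point of view.
        slow = med_secs > 0 and t["seconds"] > ratio * med_secs
        # Uneven distribution from the row count point of view.
        heavy = med_rows > 0 and t["rows"] > ratio * med_rows
        if slow or heavy:
            lagging.append(t["id"])
    return lagging


# Made-up sample data: three balanced reducers and one straggler.
reducers = [
    {"id": "r_000000", "seconds": 120, "rows": 1_000_000},
    {"id": "r_000001", "seconds": 130, "rows": 1_100_000},
    {"id": "r_000002", "seconds": 125, "rows": 950_000},
    {"id": "r_000003", "seconds": 900, "rows": 9_000_000},
]
```

With this sample data, find_lagging_reducers(reducers) flags only r_000003.
The median is used instead of the mean so that the single straggler does not
pull the baseline up.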
Runping brought up a good point about the availability of data while a job is
running. In some installations, logs are gathered by HOD only after the job
has finished, so the user has to wait for the job to finish to see all the
logs. We can change the logging process to some extent to make more items
available in the live log while the job is in progress. Task counters become
available as each task finishes.
BTW, the diagram in item 1 above refers to a progress diagram developed by
Owen O'Malley.
> map-reduce doctor (Mr Doctor)
> -----------------------------
>
> Key: HADOOP-3956
> URL: https://issues.apache.org/jira/browse/HADOOP-3956
> Project: Hadoop Core
> Issue Type: New Feature
> Reporter: Amir Youssefi
>
> Problem Description:
> Users typically submit jobs with sub-optimal parameters resulting in
> under-utilization, black-listed task-trackers, time-outs, re-tries etc.
> Issue can be mitigated by submitting job with custom Hadoop parameters.