[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995149#comment-13995149 ]

Jason Lowe commented on MAPREDUCE-5465:
---------------------------------------

bq. 5. (t3-t1) can also impact job latency. Notifying task/job earlier won't 
help to improve (t3-t1).

It can, assuming the cluster has sufficient capacity.  t3 depends on when the 
AM asks for the containers, and the sooner the AM knows a task has completed, 
the sooner it can ask for new ones (e.g., map task completions triggering the 
launch of reduce tasks).  Job completion time also improves when reduce tasks 
that are already running are waiting on the final map task.  In that case we 
should notify the reduce tasks of the map task completion event as soon as the 
completion message arrives across the umbilical from the map task, rather than 
waiting until we receive the container completion from the RM.  That delay 
translates directly into longer job times.
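
To make the second scenario concrete, here is a rough sketch of the timing 
difference between the two notification points.  It is only a model: the class, 
the policy names, and the timestamps are all made up for illustration and are 
not Hadoop APIs or measured values.

```java
/** Hypothetical timing model; none of these names are Hadoop APIs. */
final class NotifyTiming {
    /** The two points at which the AM could tell reducers a map finished. */
    enum Policy { ON_UMBILICAL_DONE, ON_CONTAINER_COMPLETION }

    /** Time at which the waiting reducers learn that the map task finished. */
    static long reducerNotifiedAtMs(Policy policy,
                                    long umbilicalDoneMs,
                                    long containerCompletedMs) {
        return policy == Policy.ON_UMBILICAL_DONE
                ? umbilicalDoneMs
                : containerCompletedMs;
    }

    /** Extra latency the reducers see if the AM waits for the RM report. */
    static long addedDelayMs(long umbilicalDoneMs, long containerCompletedMs) {
        return reducerNotifiedAtMs(Policy.ON_CONTAINER_COMPLETION,
                                   umbilicalDoneMs, containerCompletedMs)
             - reducerNotifiedAtMs(Policy.ON_UMBILICAL_DONE,
                                   umbilicalDoneMs, containerCompletedMs);
    }

    public static void main(String[] args) {
        long umbilicalDoneMs = 1_000;      // made-up: "done" arrives over the umbilical
        long containerCompletedMs = 4_000; // made-up: RM reports container completion
        System.out.println("added delay (ms): "
                + addedDelayMs(umbilicalDoneMs, containerCompletedMs));
    }
}
```

When the reducers are blocked on the final map, whatever gap exists between 
those two events is added straight onto the job's completion time.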

Regarding the out-of-band heartbeat, agreed that we should consider sending OOB 
heartbeats on container completion rather than on container kill.  Filed 
YARN-2046 to track that issue.



> Container killed before hprof dumps profile.out
> -----------------------------------------------
>
>                 Key: MAPREDUCE-5465
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5465
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mr-am, mrv2
>    Affects Versions: trunk, 2.0.3-alpha
>            Reporter: Radim Kolar
>            Assignee: Ming Ma
>         Attachments: MAPREDUCE-5465-2.patch, MAPREDUCE-5465-3.patch, 
> MAPREDUCE-5465-4.patch, MAPREDUCE-5465-5.patch, MAPREDUCE-5465-6.patch, 
> MAPREDUCE-5465.patch
>
>
> If profiling is enabled for a mapper or reducer, hprof dumps profile.out at 
> process exit. The dump happens after the task has signaled to the AM that its 
> work is finished.
> The AM kills a container whose work is finished without waiting for hprof to 
> finish dumping. If hprof is writing a larger output (e.g., depth=4, whereas 
> depth=3 works), it may not finish the dump before being killed, making the 
> entire dump unusable because the CPU and heap stats are missing.
> There needs to be a better delay before the container is killed if profiling 
> is enabled.
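
For illustration only, the kind of delay requested above could look roughly 
like the sketch below: give the finished task a bounded grace period to exit on 
its own, and force-stop it only after that.  This is not the attached patch; a 
Future stands in for the task JVM, and the class and method names are 
hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch; a Future stands in for the task's JVM process. */
final class GracefulStop {
    /**
     * Waits up to graceMs for the work to exit on its own and force-stops it
     * otherwise.  Returns true if the work exited before the deadline.
     */
    static boolean awaitExitOrKill(Future<?> work, long graceMs) {
        try {
            work.get(graceMs, TimeUnit.MILLISECONDS);
            return true;                    // exited cleanly within the grace period
        } catch (ExecutionException e) {
            return true;                    // exited on its own, though with an error
        } catch (TimeoutException e) {
            work.cancel(true);              // stands in for killing the container
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            work.cancel(true);
            return false;
        }
    }
}
```

The grace period could be made longer when profiling is enabled, so that a 
large dump has time to be written before the forced stop.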



--
This message was sent by Atlassian JIRA
(v6.2#6252)
