[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14938321#comment-14938321
 ] 

Anubhav Dhoot commented on MAPREDUCE-6302:
------------------------------------------

The patch looks mostly good.

Why does availableResourceForMap not consider assignedRequests.maps after the 
patch?

The earlier comments had some more description that would be useful to 
preserve, maybe as a heading for both sets of values describing when 
preemption kicks in. For example, the earlier description: "The threshold in 
terms of seconds after which an unsatisfied mapper request triggers reducer 
preemption to free space."

Would UNCONDITIONAL be better than FORCE, since it's not as if the other kind 
of preemption is optional once it kicks in?
Consider reverting:
  duration -> allocationDelayThresholdMs
  forcePreemptThreshold -> forcePreemptThresholdSec
  reducerPreemptionHoldMs -> reducerNoHeadroomPreemptionMs
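
For reference, a minimal standalone sketch of how I read the two thresholds 
interacting (not the patch code; apart from the three names above, every 
identifier here is made up for illustration):

class ReducerPreemptionPolicy {
  private final long allocationDelayThresholdMs; // wait when there is no headroom
  private final long forcePreemptThresholdSec;   // wait before preempting unconditionally

  ReducerPreemptionPolicy(long allocationDelayThresholdMs, long forcePreemptThresholdSec) {
    this.allocationDelayThresholdMs = allocationDelayThresholdMs;
    this.forcePreemptThresholdSec = forcePreemptThresholdSec;
  }

  // Returns true if a reducer should be preempted to make room for a map whose
  // request has been pending for mapRequestAgeMs, given whether the reported
  // headroom could fit that map.
  boolean shouldPreemptReducer(long mapRequestAgeMs, boolean headroomFitsMap) {
    // Unconditional ("force") preemption: the map has waited past the hard
    // threshold, regardless of what the (possibly stale) headroom says.
    if (mapRequestAgeMs > forcePreemptThresholdSec * 1000L) {
      return true;
    }
    // No-headroom preemption: the RM reports no room for the map and the
    // request has waited past the shorter delay threshold.
    return !headroomFitsMap && mapRequestAgeMs > allocationDelayThresholdMs;
  }
}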

resourceLimit is a weird name for the headroom in the Allocation; consider 
another jira for fixing that.


> Incorrect headroom can lead to a deadlock between map and reduce allocations 
> -----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6302
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6302
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: mai shurong
>            Assignee: Karthik Kambatla
>            Priority: Critical
>         Attachments: AM_log_head100000.txt.gz, AM_log_tail100000.txt.gz, 
> log.txt, mr-6302-1.patch, mr-6302-2.patch, mr-6302-prelim.patch, 
> queue_with_max163cores.png, queue_with_max263cores.png, 
> queue_with_max333cores.png
>
>
> I submitted a big job, with 500 maps and 350 reduces, to a queue 
> (fairscheduler) with a maximum of 300 cores. When the job's maps were at 
> 100%, the 300 reduces had occupied all 300 cores in the queue. Then a map 
> failed and was retried, waiting for a core, while the 300 reduces were 
> waiting for the failed map to finish, so a deadlock occurred. As a result, 
> the job was blocked, and later jobs in the queue could not run because 
> there were no available cores in the queue.
> I think there is a similar issue for the memory of a queue.
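
To make the reported scenario concrete, here is a tiny standalone 
illustration of the resource math (hypothetical code; only the numbers come 
from the report above):

public class HeadroomDeadlockDemo {
  public static void main(String[] args) {
    int queueMaxCores = 300;      // queue maximum from the report
    int runningReduceCores = 300; // 300 reduces each holding one core
    int pendingMapCores = 1;      // the retried map needs one core

    int headroom = queueMaxCores - runningReduceCores; // 0 cores left
    boolean mapCanStart = headroom >= pendingMapCores;

    // The reduces wait on the failed map's output while the map waits on a
    // core; unless a reduce is preempted, neither side can make progress.
    System.out.println("headroom=" + headroom + " mapCanStart=" + mapCanStart);
    if (!mapCanStart && runningReduceCores > 0) {
      System.out.println("Deadlock: a reducer must be preempted to free a core for the map.");
    }
  }
}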



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
