[jira] [Comment Edited] (MAPREDUCE-7169) Speculative attempts should not run on the same node

Bilwa S T (JIRA) Sun, 17 Mar 2019 21:55:07 -0700


    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16794711#comment-16794711
 ]


Bilwa S T edited comment on MAPREDUCE-7169 at 3/18/19 4:54 AM:
---------------------------------------------------------------

Hi [~uranus]

Same as what [~bibinchundatt] had suggested: we can skip the nodes where a 
previous attempt was launched.
{quote}
 * In TaskImpl#addAndScheduleAttempt, whenever the Avataar is SPECULATIVE, 
   update the previous attempts' nodes and racks.
 * In TaskAttemptImpl#RequestContainerTransition, when the Avataar is 
   SPECULATIVE, we can skip the previous attempts' nodes and racks. We also 
   need to keep a record of the blacklisted nodes.
 * During allocation for mappers in 
   RMContainerAllocator#assignMapsWithLocality, we have three types of 
   resource requests:
   1. Hosts - we have already updated datahosts for the task attempt.
   2. Racks - this is also taken care of in TaskAttemptImpl.
   3. Any - in this case we need the blacklisted-node record that we updated 
      earlier, so that we can check whether the node is blacklisted for the 
      task. If it is blacklisted, we skip the allocation and the request is 
      retried with another container.
{quote}
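
The steps above could be sketched roughly as follows. This is only an illustration of the idea, not the actual patch: the class SpeculativePlacementSketch and its methods (recordLaunch, filterRequestedHosts, canAssign) are hypothetical names and do not exist in Hadoop. Rack handling is left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; it is not part of Hadoop.
class SpeculativePlacementSketch {
    // Task ID -> hosts on which earlier attempts of that task were launched.
    private final Map<String, Set<String>> previousHosts = new HashMap<>();

    // Step 1: remember where an attempt of the task was launched.
    void recordLaunch(String taskId, String host) {
        previousHosts.computeIfAbsent(taskId, k -> new HashSet<>()).add(host);
    }

    // Step 2: for a speculative attempt, drop previously used hosts from the
    // host-level request. Non-speculative attempts keep all their hosts.
    List<String> filterRequestedHosts(String taskId, boolean speculative,
                                      List<String> dataHosts) {
        if (!speculative) {
            return new ArrayList<>(dataHosts);
        }
        Set<String> used = previousHosts.getOrDefault(taskId, new HashSet<>());
        List<String> kept = new ArrayList<>();
        for (String host : dataHosts) {
            if (!used.contains(host)) {
                kept.add(host);
            }
        }
        return kept;
    }

    // Step 3 ("Any" request): reject a container offered on a host that
    // already ran an attempt of this task, so the request can be retried
    // with another container.
    boolean canAssign(String taskId, boolean speculative, String offeredHost) {
        if (!speculative) {
            return true;
        }
        Set<String> used = previousHosts.get(taskId);
        return used == null || !used.contains(offeredHost);
    }
}
```

In this sketch the per-task host record plays the role of the "blacklisted node record" mentioned in point 3. The check only applies to speculative attempts, so normal locality-based assignment is unchanged.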


> Speculative attempts should not run on the same node
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-7169
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: yarn
>    Affects Versions: 2.7.2
>            Reporter: Lee chen
>            Assignee: Bilwa S T
>            Priority: Major
>         Attachments: image-2018-12-03-09-54-07-859.png
>
>
>           I found that in all versions of YARN, speculative execution may 
> place the speculative task on the same node as the original task. From what 
> I have read, it only tries to launch one more task attempt; I haven't seen 
> anything that says the attempt must not run on the same node. This is 
> unreasonable: if the node has a problem that makes task execution very slow, 
> placing the speculative task on the same node cannot help the problematic 
> task.
>          In our cluster (version 2.7.2, 2700 nodes), this happens almost 
> every day.
>  !image-2018-12-03-09-54-07-859.png! 



