[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17081201#comment-17081201
 ] 

Bilwa S T edited comment on MAPREDUCE-7169 at 4/11/20, 7:32 AM:
----------------------------------------------------------------

Hi [~ahussein] 

Sorry for the late reply. 
{quote}I see that you add the node hosting the original task to the blacklist 
of the speculative task. Wouldn't it be easier just to change the order of the 
dataLocalHosts so that the node will be picked last in the loop? In that case, 
the speculative task will run on the same node only if all other nodes cannot 
be assigned to the speculative task.
{quote}
With a change like this, there is still a chance that the speculative attempt 
gets launched on the same node where the original attempt ran, which wouldn't 
solve the problem. If we blacklist the node, then in the next container 
assignment iteration the attempt will be launched on a different node.
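To illustrate the blacklist idea discussed above, here is a minimal, self-contained sketch. The class and method names ({{SpeculationBlacklist}}, {{pickNode}}) are hypothetical and not the actual Hadoop MapReduce APIs; the point is only that a blacklisted host is skipped during assignment, with a fallback when no other node exists:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of blacklisting the original attempt's node
// for a speculative attempt; not real Hadoop code.
public class SpeculationBlacklist {
    // Nodes the speculative attempt must avoid
    // (the host running the original attempt).
    private final Set<String> blacklisted = new HashSet<>();

    public void blacklist(String host) {
        blacklisted.add(host);
    }

    // Pick the first candidate host that is not blacklisted; fall back
    // to a blacklisted host only if no other node is available.
    public String pickNode(List<String> candidates) {
        for (String host : candidates) {
            if (!blacklisted.contains(host)) {
                return host;
            }
        }
        return candidates.isEmpty() ? null : candidates.get(0);
    }

    public static void main(String[] args) {
        SpeculationBlacklist bl = new SpeculationBlacklist();
        bl.blacklist("node1"); // node running the original attempt
        System.out.println(bl.pickNode(List.of("node1", "node2", "node3")));
    }
}
```

The fallback branch mirrors the trade-off raised in the quote: only when every candidate is blacklisted does the speculative attempt land on the original node.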
{quote}My intuition is that changing the policy to pick the node for the 
speculative task will inherently change the efficiency of the speculation.
For example, picking a different node may increase the startup time of the 
speculative task. This implies change of the speculation efficiency compared to 
the legacy behavior. Thus, I suggest to give the option for the user to 
enable/disable the new policy in case she prefers to evaluate the new behavior 
and revert back to the legacy one if necessary.
{quote}

I agree with this. I will change the code accordingly; currently working on it.
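If such a toggle is added, it would presumably surface as a job configuration property. A sketch of a mapred-site.xml entry follows; the property name here is purely illustrative, not the name used in any patch:

```xml
<property>
  <!-- Hypothetical property name; the final patch may choose another. -->
  <name>mapreduce.job.speculative.avoid-original-node</name>
  <value>true</value>
  <description>When true, avoid scheduling a speculative attempt on the
  node running the original attempt by blacklisting that node for the
  speculative attempt's container request. When false, keep the legacy
  placement behavior.</description>
</property>
```

Defaulting to the legacy behavior would let users opt in and compare speculation efficiency, as suggested above.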



> Speculative attempts should not run on the same node
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-7169
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: yarn
>    Affects Versions: 2.7.2
>            Reporter: Lee chen
>            Assignee: Bilwa S T
>            Priority: Major
>         Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, 
> MAPREDUCE-7169-003.patch, image-2018-12-03-09-54-07-859.png
>
>
>           I found that in all versions of YARN, speculative execution may 
> place the speculative task on the same node as the original task. What I 
> have read is only that it will try to launch one more task attempt; I 
> haven't seen any place mentioning that it should not run on the same node. 
> This is unreasonable: if the node has some problem that makes task 
> execution very slow, then placing the speculative task on the same node 
> cannot help the problematic task.
>          In our cluster (version 2.7.2, 2700 nodes), this phenomenon 
> appears almost every day.
>  !image-2018-12-03-09-54-07-859.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
