[ https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17081201#comment-17081201 ]
Bilwa S T edited comment on MAPREDUCE-7169 at 4/11/20, 7:32 AM:
----------------------------------------------------------------

Hi [~ahussein], sorry for the late reply.

{quote}I see that you add the node hosting the original task to the blacklist of the speculative task. Wouldn't it be easier just to change the order of the dataLocalHosts so that the node will be picked last in the loop? In that case, the speculative task will run on the same node only if all other nodes cannot be assigned to the speculative task.{quote}

With a change like this, there is still a chance that the speculative attempt gets launched on the same node where the original attempt is running, which wouldn't solve the problem. If we blacklist the node, then in the next round of container assignment the attempt will be launched on a different node.

{quote}My intuition is that changing the policy to pick the node for the speculative task will inherently change the efficiency of the speculation. For example, picking a different node may increase the startup time of the speculative task. This implies change of the speculation efficiency compared to the legacy behavior. Thus, I suggest to give the option for the user to enable/disable the new policy in case she prefers to evaluate the new behavior and revert back to the legacy one if necessary.{quote}

I agree with this; I will change the code accordingly. Currently working on it.

was (Author: bilwast):
Hi [~ahussein], sorry for the late reply.

{quote}I see that you add the node hosting the original task to the blacklist of the speculative task. Wouldn't it be easier just to change the order of the dataLocalHosts so that the node will be picked last in the loop? In that case, the speculative task will run on the same node only if all other nodes cannot be assigned to the speculative task.{quote}

With a change like this, there is still a chance that the speculative attempt gets launched on the same node where the original attempt is running, which wouldn't solve the problem.
If we blacklist the node, then in the next round of container assignment the attempt will be launched on a different node.

> Speculative attempts should not run on the same node
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-7169
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: yarn
>    Affects Versions: 2.7.2
>            Reporter: Lee chen
>            Assignee: Bilwa S T
>            Priority: Major
>         Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, MAPREDUCE-7169-003.patch, image-2018-12-03-09-54-07-859.png
>
> I found that in all versions of YARN, speculative execution may place the speculative task on the node of the original task. Everything I have read only says that speculation will try to launch one more task attempt; I haven't seen any place mentioning that it must not run on the same node. This is unreasonable: if the node has problems that make task execution very slow, then placing the speculative task on the same node cannot help the slow task.
> In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears almost every day.
> !image-2018-12-03-09-54-07-859.png!

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
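To make the approach discussed in this thread concrete, here is a minimal, self-contained sketch of host selection that excludes the original attempt's node, with a flag corresponding to the proposed enable/disable option for the new policy. This is not the actual MAPREDUCE-7169 patch; the class and method names (`SpeculativeNodeChooser`, `chooseHost`) are hypothetical and only illustrate the idea.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch (not the real patch): pick a host for a speculative
// attempt while optionally excluding the node running the original attempt.
public class SpeculativeNodeChooser {
    // When true, apply the new policy (skip the original node, blacklist-style);
    // when false, fall back to the legacy behavior (any data-local host is eligible).
    private final boolean avoidOriginalNode;

    public SpeculativeNodeChooser(boolean avoidOriginalNode) {
        this.avoidOriginalNode = avoidOriginalNode;
    }

    // Returns the first eligible data-local host, or empty if none qualifies
    // (in which case a scheduler would wait for the next assignment round).
    public Optional<String> chooseHost(List<String> dataLocalHosts, String originalHost) {
        for (String host : dataLocalHosts) {
            if (avoidOriginalNode && host.equals(originalHost)) {
                continue; // exclude the potentially slow node
            }
            return Optional.of(host);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        SpeculativeNodeChooser chooser = new SpeculativeNodeChooser(true);
        List<String> hosts = Arrays.asList("nodeA", "nodeB");
        // Original attempt runs on nodeA, so the speculative attempt goes to nodeB.
        System.out.println(chooser.chooseHost(hosts, "nodeA").orElse("none"));
    }
}
```

Note the difference from merely reordering dataLocalHosts: with strict exclusion, the speculative attempt is never placed on the original node in this round, matching the blacklisting behavior described above.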