[ https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101802#comment-17101802 ]
Ahmed Hussein commented on MAPREDUCE-7169: ------------------------------------------ [~BilwaST], the speculation, taskAttempts, and allocations code set is not a straightforward module to tackle. You did a great job! I have the following points:

*Corner Case scenario:*
* Assume a new speculative attempt is created. Following the implementation, the new attempt X will have the blacklisted nodes and skipped racks relevant to the original task attempt Y.
* Assume task attempt Y is killed before attempt X gets assigned.
* The RMContainerAllocator would still assign a host to attempt X based on the dated blacklists. Is this the expected behavior, or is it supposed to clear attempt X's blacklisted nodes?

*{{TaskAttemptBlacklistManager}}*
* Should that object be synchronized? I believe more than one thread reads/writes to that object. Perhaps changing {{taskAttemptToEventMapping}} to a {{ConcurrentHashMap}} would be sufficient. What do you think?
* In {{taskAttemptToEventMapping}}, the data is only removed when the task attempt is assigned. If the task attempt is killed before being assigned, {{taskAttemptToEventMapping}} would still hold the task attempt.

*{{TaskAttemptImpl}}*
* Racks are going to be blacklisted too, not just nodes. I believe the javadoc and the description in default.xml should emphasize that enabling the flag also avoids the local rack unless no other rack is available for scheduling.

*{{TaskImpl}}*
* Why do we need {{mapTaskAttemptToAvataar}} when each task attempt already has a field called {{avataar}}?
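To make the {{ConcurrentHashMap}} suggestion and the kill-before-assignment leak concrete, here is a minimal sketch. It is illustrative only: the class and method names, and the use of plain {{String}} keys and node lists, are my own simplification, not the types from the patch (which keys on task attempt ids and stores request events). The point it shows is (a) the map itself being a {{ConcurrentHashMap}} so concurrent readers/writers don't need external locking for single-key operations, and (b) removing the entry on *both* the assignment path and the kill path so a never-assigned attempt does not linger in the map.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a blacklist manager keyed by attempt id.
public class TaskAttemptBlacklistSketch {
    // ConcurrentHashMap makes put/remove/containsKey safe under
    // concurrent access without synchronizing the whole object.
    private final Map<String, List<String>> taskAttemptToEventMapping =
            new ConcurrentHashMap<>();

    public void addBlacklist(String attemptId, List<String> nodes) {
        taskAttemptToEventMapping.put(attemptId, nodes);
    }

    // Assignment path: consume and drop the entry in one step.
    public List<String> consumeOnAssignment(String attemptId) {
        return taskAttemptToEventMapping.remove(attemptId);
    }

    // Kill path: also drop the entry, otherwise an attempt killed
    // before assignment leaks its mapping forever.
    public void removeOnKill(String attemptId) {
        taskAttemptToEventMapping.remove(attemptId);
    }

    public boolean contains(String attemptId) {
        return taskAttemptToEventMapping.containsKey(attemptId);
    }
}
```

Whether per-key atomicity is actually sufficient depends on the access pattern; if any code path reads then conditionally writes the same key, a compound operation like {{computeIfAbsent}} would be needed instead of separate get/put calls.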
*{{ContainerRequestEvent}}*
* That's a design issue. One would expect that a request event's lifetime should not survive the {{handle()}} call; therefore, the metadata should be consumed by the handlers. In the patch, {{ContainerRequestEvent.blacklistedNodes}} could be a field in the task attempt. Then you won't need the {{TaskAttemptBlacklistManager}} class.

> Speculative attempts should not run on the same node
> ----------------------------------------------------
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Components: yarn
> Affects Versions: 2.7.2
> Reporter: Lee chen
> Assignee: Bilwa S T
> Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, MAPREDUCE-7169-003.patch, MAPREDUCE-7169.004.patch, MAPREDUCE-7169.005.patch, image-2018-12-03-09-54-07-859.png
>
> I found that in all versions of YARN, speculative execution may place the speculative task on the same node as the original task. What I have read only says it will try to have one more task attempt; I haven't seen any place mentioning that it should not be on the same node. This is unreasonable: if the node has some problem that makes task execution very slow, placing the speculative task on the same node cannot help the problematic task.
> In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears almost every day.
> !image-2018-12-03-09-54-07-859.png!

--
This message was sent by Atlassian Jira (v8.3.4#803005)