jteagles commented on a change in pull request #37: TEZ-4042: Speculative attempts should avoid running on the same node
URL: https://github.com/apache/tez/pull/37#discussion_r259218826
 
 

 ##########
 File path: tez-dag/src/main/java/org/apache/tez/dag/app/rm/DagAwareYarnTaskScheduler.java
 ##########
 @@ -567,8 +568,9 @@ private void informAppAboutAssignments(List<Assignment> assignments) {
    * @param container the container assigned to the task
    */
   private void informAppAboutAssignment(TaskRequest request, Container container) {
-    if (blacklistedNodes.contains(container.getNodeId())) {
-      Object task = request.getTask();
+    Object task = request.getTask();
+    if (blacklistedNodes.contains(container.getNodeId())
+        || task instanceof TaskAttempt && ((TaskAttempt) task).getUnhealthyNodesHistory().contains(container.getNodeId())) {
 
 Review comment:
   I think informAppAboutAssignment does avoid scheduling the speculative 
attempt on the nodes already running tasks, so that is good. However, it has 
the side effect of deallocating free containers on nodes running attempts that 
have been speculated. If we knew for certain that the node was slow, we could 
treat the node as unhealthy. But choosing a task for speculation is just a 
reasonable guess with many false positives.
   
   Instead, would it work if the check were made in tryAssignReuseContainer, 
tryAssignNewContainer, and tryAssignTaskToIdleContainer? With the check made 
early, we can prevent deallocating containers.
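
   To illustrate the idea, here is a minimal, self-contained sketch of an 
early check. Every name in it (EarlyNodeCheckSketch, HeldContainer, 
SketchRequest, canAssign, tryAssign) is hypothetical and stands in for the 
real scheduler types; it is not the DagAwareYarnTaskScheduler code and only 
assumes that a request can expose the set of nodes it should avoid.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch; these types are stand-ins, not the Tez scheduler API. */
public class EarlyNodeCheckSketch {

  /** A container the scheduler currently holds, identified by its node. */
  static final class HeldContainer {
    final String nodeId;

    HeldContainer(String nodeId) {
      this.nodeId = nodeId;
    }
  }

  /** A task request plus the nodes it should avoid (assumed to be available). */
  static final class SketchRequest {
    final Set<String> nodesToAvoid;

    SketchRequest(Set<String> nodesToAvoid) {
      this.nodesToAvoid = nodesToAvoid;
    }
  }

  /** The early check: the match is rejected before any assignment happens. */
  static boolean canAssign(SketchRequest request, HeldContainer container) {
    return !request.nodesToAvoid.contains(container.nodeId);
  }

  /**
   * Picks the first idle container that passes the check and removes it from
   * the idle list. Containers that fail the check stay in the list, so they
   * remain available to other requests instead of being released.
   */
  static Optional<HeldContainer> tryAssign(SketchRequest request,
      List<HeldContainer> idle) {
    Iterator<HeldContainer> it = idle.iterator();
    while (it.hasNext()) {
      HeldContainer candidate = it.next();
      if (canAssign(request, candidate)) {
        it.remove();
        return Optional.of(candidate);
      }
    }
    return Optional.empty();
  }
}
```

   In this sketch a container on a node to avoid is simply skipped for that 
one request, so a later request that has no objection to the node can still 
use it.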
   
   In the future, I can see passing the node to avoid to the AMRMClient when 
requesting new containers, so that we never request a node that a speculative 
task attempt should avoid. It may be possible to do that now, but I have not 
checked.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
