Rajesh Balamohan created TEZ-2738:
-------------------------------------

             Summary: ContainerLauncher tries to connect to unhealthy node for large number of times
                 Key: TEZ-2738
                 URL: https://issues.apache.org/jira/browse/TEZ-2738
             Project: Apache Tez
          Issue Type: Bug
            Reporter: Rajesh Balamohan


Env: Ran a job with Tez (built from the master branch on Aug 24).

One of the nodes went down in the middle of the run, while DAGAppMaster had a 
container launch on that node. After some time, the node was declared 
unhealthy. Even though the job lasted only 7 minutes, DAGAppMaster remained 
unresponsive for more than 1.5 hours after DAG cleanup. It kept trying to 
connect to the unhealthy node. I will attach the logs to this JIRA.



