[ https://issues.apache.org/jira/browse/TEZ-3822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yingda Chen reassigned TEZ-3822:
--------------------------------

    Assignee: Ying Han

> Default threshold for blacklisting a node is too high
> -----------------------------------------------------
>
>                 Key: TEZ-3822
>                 URL: https://issues.apache.org/jira/browse/TEZ-3822
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Zhiyuan Yang
>            Assignee: Ying Han
>            Priority: Major
>
> By default, a task fails once 4 of its task attempts have failed, which in
> turn fails the vertex and the DAG. By default, a node is blacklisted only
> after 10 task attempts have failed on it. Because 10 is higher than 4, a
> single faulty node can fail 4 task attempts through shuffle errors, which
> ultimately fails the job, before that node is ever blacklisted. Even though
> we can reschedule a task after it is blamed for an input read error, we
> cannot prevent multiple tasks from going to the same bad node and continuing
> to cause shuffle errors.
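
The interaction between the two defaults can be illustrated with a small, self-contained sketch. It is not Tez code: the class name, the method `simulate`, and both parameter names are hypothetical, and the model is deliberately simplified by assuming that every failure caused by the faulty node counts against both the affected task and the node.

```java
public class BlacklistThresholdSketch {

    /** Outcome of the simplified model (hypothetical, not a Tez type). */
    enum Outcome { JOB_FAILED, NODE_BLACKLISTED }

    /**
     * Models one task whose attempts keep failing because of the same faulty node.
     * Assumption: each failure is counted once against the task and once against the node.
     *
     * @param maxFailedAttemptsPerTask failed attempts after which the task (and the job) fails
     * @param maxFailuresPerNode       failed attempts after which the node is blacklisted
     */
    static Outcome simulate(int maxFailedAttemptsPerTask, int maxFailuresPerNode) {
        int taskFailures = 0;
        int nodeFailures = 0;
        while (true) {
            taskFailures++;
            nodeFailures++;
            // The task limit is checked first: if both limits are reached by the
            // same failure, blacklisting comes too late to save the task.
            if (taskFailures >= maxFailedAttemptsPerTask) {
                return Outcome.JOB_FAILED;
            }
            if (nodeFailures >= maxFailuresPerNode) {
                return Outcome.NODE_BLACKLISTED;
            }
        }
    }

    public static void main(String[] args) {
        // Defaults described in this issue: 4 attempts per task, 10 failures per node.
        System.out.println(simulate(4, 10));
        // A node threshold below the task limit blacklists the node first.
        System.out.println(simulate(4, 3));
    }
}
```

Under these assumptions, the node is blacklisted in time only when its threshold is strictly lower than the per-task attempt limit, which is the mismatch described above.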