[ https://issues.apache.org/jira/browse/TEZ-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
László Bodor updated TEZ-4172:
------------------------------
Description:
Currently, TaskImpl doesn't consider failing a task when it has too many overall attempts. In the case of LLAP, the number of preempted task attempts, and therefore the number of overall task attempts, [can grow in a LinkedHashMap|https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java#L127]. In an edge case where an upstream application (Hive LLAP) cannot cope with a problematic query, this can also lead to OOM in the AM, due to the very high number of TaskAttemptImpl objects.
It would be beneficial to be able to limit the overall number of task attempts, regardless of whether they failed or were killed.

> Let tasks be killed after too many overall attempts
> ---------------------------------------------------
>
>                 Key: TEZ-4172
>                 URL: https://issues.apache.org/jira/browse/TEZ-4172
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>         Attachments: TEZ-4172.01.patch

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
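A minimal, hypothetical sketch of the proposed limit follows. It is not the attached TEZ-4172.01.patch and not Tez's TaskImpl API; AttemptLimitSketch, Outcome, recordAttempt and maxOverallAttempts are illustrative names only. It records every finished attempt of one task in an insertion-ordered map, counting failed and killed (e.g. preempted) attempts alike, and signals that the task should be terminated once the overall limit is used up. A limit that counted only failed attempts would never trigger for an endless stream of preemptions, which is the LLAP edge case described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Tez code): tracks every finished attempt of a
 * single task and reports when the overall attempt limit is used up,
 * regardless of whether the attempts failed or were killed.
 */
public class AttemptLimitSketch {

    /** Hypothetical terminal outcomes of a task attempt. */
    enum Outcome { FAILED, KILLED }

    private final int maxOverallAttempts;
    // Insertion-ordered attempt history; without a limit this map keeps
    // growing with every preempted (killed) attempt.
    private final Map<Integer, Outcome> attempts = new LinkedHashMap<>();
    private int failedAttempts = 0;

    AttemptLimitSketch(int maxOverallAttempts) {
        if (maxOverallAttempts <= 0) {
            throw new IllegalArgumentException("maxOverallAttempts must be positive");
        }
        this.maxOverallAttempts = maxOverallAttempts;
    }

    /**
     * Records a finished attempt. Returns true when the task has used up its
     * overall attempt budget, i.e. no further attempt should be scheduled and
     * the task should be terminated. Killed attempts count just like failed ones.
     */
    boolean recordAttempt(Outcome outcome) {
        int attemptNumber = attempts.size();
        attempts.put(attemptNumber, outcome);
        if (outcome == Outcome.FAILED) {
            failedAttempts++;
        }
        return attempts.size() >= maxOverallAttempts;
    }

    int failedAttempts() { return failedAttempts; }

    int overallAttempts() { return attempts.size(); }

    public static void main(String[] args) {
        AttemptLimitSketch task = new AttemptLimitSketch(3);
        // Preemptions alone never increase the failed-attempt count...
        System.out.println(task.recordAttempt(Outcome.KILLED)); // false
        System.out.println(task.recordAttempt(Outcome.KILLED)); // false
        // ...but the third attempt of any kind exhausts the overall limit.
        System.out.println(task.recordAttempt(Outcome.KILLED)); // true
        System.out.println(task.failedAttempts());              // 0
    }
}
```

Keeping the failed-attempt count separate shows why the two limits differ: after three preemptions the failure count is still zero, while the overall count has reached the limit.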