[ https://issues.apache.org/jira/browse/TEZ-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121332#comment-17121332 ]
Ashutosh Chauhan commented on TEZ-4172:
---------------------------------------

[~rajesh.balamohan] Can you please help review this?

> Let tasks be killed after too many overall attempts
> ---------------------------------------------------
>
>                 Key: TEZ-4172
>                 URL: https://issues.apache.org/jira/browse/TEZ-4172
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>         Attachments: TEZ-4172.01.patch
>
>
> Currently, TaskImpl doesn't consider failing a task if there are too many
> overall attempts. In the case of LLAP, the number of preempted task attempts,
> and with it the overall number of task attempts,
> [can grow in a LinkedHashMap|https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/TaskImpl.java#L127].
> In an edge case, where an upstream application (Hive LLAP) cannot cope with a
> problematic query, this can also lead to OOM in the AM, due to the very high
> number of TaskAttemptImpl objects.
> It would be beneficial to be able to limit the overall number of task
> attempts, regardless of whether they failed or were killed.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
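
For illustration only, the limit requested in the description could be sketched roughly as below. This is not the attached patch and not Tez's TaskImpl: the class AttemptLimitSketch, its methods, and the parameters maxFailedAttempts and maxTotalAttempts are all hypothetical names, and the sketch assumes the existing check counts only failed attempts, so killed (for example preempted) attempts never trip it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of an overall attempt limit; not Tez code. */
public class AttemptLimitSketch {
    public enum AttemptState { RUNNING, FAILED, KILLED }

    // Every attempt ever started stays in this map, as in the description.
    private final Map<Integer, AttemptState> attempts = new LinkedHashMap<>();
    private final int maxFailedAttempts; // assumed existing limit on FAILED attempts
    private final int maxTotalAttempts;  // assumed new limit; 0 means "no limit"

    public AttemptLimitSketch(int maxFailedAttempts, int maxTotalAttempts) {
        this.maxFailedAttempts = maxFailedAttempts;
        this.maxTotalAttempts = maxTotalAttempts;
    }

    /** Registers a new attempt and returns its id. */
    public int startAttempt() {
        int id = attempts.size();
        attempts.put(id, AttemptState.RUNNING);
        return id;
    }

    /** Records the terminal state of an attempt. */
    public void finishAttempt(int id, AttemptState state) {
        attempts.put(id, state);
    }

    /** True if the task should be failed instead of scheduling another attempt. */
    public boolean shouldFailTask() {
        long failed = attempts.values().stream()
                .filter(s -> s == AttemptState.FAILED)
                .count();
        if (failed >= maxFailedAttempts) {
            return true; // assumed existing behaviour: too many failed attempts
        }
        // Proposed behaviour: cap all attempts, whether failed or killed.
        return maxTotalAttempts > 0 && attempts.size() >= maxTotalAttempts;
    }

    public static void main(String[] args) {
        AttemptLimitSketch task = new AttemptLimitSketch(4, 6);
        // Simulate repeated preemption: every attempt ends up KILLED.
        for (int i = 0; i < 6; i++) {
            task.finishAttempt(task.startAttempt(), AttemptState.KILLED);
        }
        System.out.println("fail task: " + task.shouldFailTask());
    }
}
```

With only the failed-attempt check, the loop in main could run indefinitely and the map would keep growing. The total-attempt cap bounds both the retries and the number of retained attempt objects.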