[
https://issues.apache.org/jira/browse/TEZ-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Siddharth Seth updated TEZ-1962:
--------------------------------
Attachment: TEZ-1962.1.txt
Patch to fix this.
The root cause is an NPE in a log line that is hit when an interrupt occurs.
The exception causes TezChild.run to exit without shutting down the executor
and TaskReporter threads.
The patch fixes the NPE, adds some checks to ensure shutdown is called, and
changes LocalContainerLauncher to invoke a TezChild shutdown in case of an
error from TezChild.
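
For illustration only, the shutdown pattern described above might look roughly
like the sketch below. This is not the code in the attached patch: the class
ChildShutdownSketch and its methods run, shutdown, launch and isTerminated are
hypothetical names, and a plain ExecutorService stands in for the executor and
TaskReporter threads.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a task runner; NOT the actual TezChild code.
final class ChildShutdownSketch {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final AtomicBoolean shutdownCalled = new AtomicBoolean(false);

    // Runs the task and waits for it. The finally block guarantees that
    // shutdown() is invoked on every exit path, including exceptions.
    void run(Runnable task) {
        try {
            executor.submit(task).get();
        } catch (InterruptedException e) {
            // Null-safe log line: getMessage() may return null on interrupt.
            String msg = e.getMessage();
            System.err.println("Interrupted: " + (msg == null ? "(no message)" : msg));
            Thread.currentThread().interrupt(); // preserve the interrupt status
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            System.err.println("Task failed: " + (cause == null ? "(unknown)" : cause));
        } finally {
            shutdown();
        }
    }

    // Idempotent, so both run() and an outer launcher can safely call it.
    void shutdown() {
        if (shutdownCalled.compareAndSet(false, true)) {
            executor.shutdownNow();
        }
    }

    // Hypothetical launcher-side wrapper: on any error escaping run(),
    // explicitly shut the child down so no worker threads are left behind.
    static void launch(ChildShutdownSketch child, Runnable task) {
        try {
            child.run(task);
        } catch (Throwable t) {
            child.shutdown();
        }
    }

    // Returns true if the worker threads have exited within one second.
    boolean isTerminated() {
        try {
            return executor.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In this sketch, shutdown is made idempotent so that the finally block and the
launcher-side error handler can both call it without coordinating.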
I'm going to open a couple of follow-up JIRAs to change the way tasks are
cancelled.
Tested locally, and there are no hung threads after this.
[~hitesh] - please review.
> Running out of threads in tez local mode
> ----------------------------------------
>
> Key: TEZ-1962
> URL: https://issues.apache.org/jira/browse/TEZ-1962
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Gunther Hagleitner
> Assignee: Siddharth Seth
> Priority: Critical
> Attachments: TEZ-1962.1.txt, stack5.txt
>
>
> I've been trying to port the Hive unit tests to Tez local mode. However, local
> mode seems to leak threads, which causes tests to crash after a while (OOM).
> See the attached stack trace - there are a lot of "TezChild" threads still
> hanging around.
> ([~sseth] as discussed offline)