[ 
https://issues.apache.org/jira/browse/TEZ-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16948475#comment-16948475
 ] 

Rajesh Balamohan commented on TEZ-4087:
---------------------------------------

I will try to get more logs and share the details. Basically, interrupt() 
should have just set the interrupt status bit, and not thrown the exception, 
if the thread wasn't blocked on something. In this case, when the Referee 
thread got interrupted, it shouldn't have been on a blocking code path.
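
To illustrate the interrupt() semantics referred to above, here is a minimal 
standalone sketch. It is not Tez code: the class name InterruptDemo, its 
method names, and the thread name "hypothetical-referee" are made up for 
illustration. One case may be relevant here, though I have not confirmed it 
applies to this issue: a status bit set while the thread is not blocked stays 
pending, and the next blocking call such as sleep() throws immediately.

```java
public class InterruptDemo {

    // Case 1: interrupting a thread that is NOT blocked only sets the status bit.
    static boolean flagSetWhenNotBlocked() {
        Thread.currentThread().interrupt();   // self-interrupt; nothing is thrown here
        // Thread.interrupted() reports the status bit and clears it.
        return Thread.interrupted();
    }

    // Case 2: interrupting a thread blocked in sleep() makes sleep() throw.
    static boolean exceptionWhenBlocked() {
        final boolean[] sawException = {false};
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(60_000);         // blocking call
            } catch (InterruptedException e) {
                sawException[0] = true;       // thrown because of interrupt()
            }
        }, "hypothetical-referee");           // illustrative name, not the Tez thread
        worker.start();
        worker.interrupt();
        try {
            worker.join();                    // join() makes the worker's write visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return sawException[0];
    }

    // Case 3: a status bit set earlier is still pending, so the next
    // blocking call throws even though the interrupt arrived "outside" it.
    static boolean pendingFlagThrowsOnNextBlockingCall() {
        Thread.currentThread().interrupt();   // thread is not blocked at this point
        try {
            Thread.sleep(10);                 // sees the pending bit and throws
            return false;
        } catch (InterruptedException e) {
            return true;                      // throwing also cleared the status bit
        }
    }

    public static void main(String[] args) {
        System.out.println("flag set when not blocked: " + flagSetWhenNotBlocked());
        System.out.println("exception when blocked:    " + exceptionWhenBlocked());
        System.out.println("pending flag throws later: " + pendingFlagThrowsOnNextBlockingCall());
    }
}
```

If the logs show the exception coming from a blocking call that runs after 
the interrupt, case 3 would be one possible explanation to check.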

> Shuffle: Fix shuffle cleanup to prevent thread leaks
> ----------------------------------------------------
>
>                 Key: TEZ-4087
>                 URL: https://issues.apache.org/jira/browse/TEZ-4087
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Priority: Major
>
> In certain cases, Shuffle's cleanupIgnoreErrors() is not called. This leaves 
> 4 threads (inmem, diskmerger, Referee, ShuffleAndMergeRunner) running forever.
> When this happens in long-running processes (e.g. LLAP in Hive), they reach 
> the thread limits over time.
> Note: The root cause of why cleanupIgnoreErrors() is not invoked is not yet 
> known. I will share the details when I find out more. Creating this ticket 
> to add additional safety knobs to ensure that thread leaks do not happen.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
