[jira] [Commented] (TEZ-2186) OOM with a simple scatter gather job with re-use

Siddharth Seth (JIRA) Tue, 10 Mar 2015 12:57:13 -0700

    [ https://issues.apache.org/jira/browse/TEZ-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355575#comment-14355575 ]


Siddharth Seth commented on TEZ-2186:
-------------------------------------

Thanks for looking at this, Rajesh. Nice find on the Fetchers not having started 
before they're asked to shut down, which means they end up leaking.
Do you think we should move the UnorderedFetchers to a thread pool rather 
than trying to manage the threads explicitly? The next step after that would 
be to allow them to run on a shared thread pool. That is a more involved change, 
since the current threads end up blocking on merges, which would unnecessarily 
tie up threads in a shared pool.
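A minimal sketch of the thread-pool idea, not the Tez code: `PooledFetcherSketch`, `newFetcher` and `runAndShutdown` are hypothetical names, and the stand-in fetcher simply blocks until interrupted instead of fetching shuffle data. It illustrates why handing fetchers to an `ExecutorService` avoids the leak: the pool owns each fetcher's lifecycle, so `shutdownNow()` interrupts fetchers that are running and returns the queued ones that never started, instead of leaving a thread to start later and run forever.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch: fetchers are submitted to a bounded pool instead of
 * each being started as its own explicitly managed Thread.
 */
class PooledFetcherSketch {

    /** Counts stand-in fetchers that actually began running. */
    static final AtomicInteger started = new AtomicInteger();

    /** Stand-in for a fetcher: simulates a long fetch that ends only on interrupt. */
    static Runnable newFetcher() {
        return () -> {
            started.incrementAndGet();
            try {
                Thread.sleep(Long.MAX_VALUE); // simulated long-running fetch
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve status and exit promptly
            }
        };
    }

    /**
     * Submits numFetchers fetchers to a fixed pool of poolSize threads, then
     * shuts the pool down. Returns how many fetchers were never started.
     */
    static int runAndShutdown(int numFetchers, int poolSize) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < numFetchers; i++) {
            pool.execute(newFetcher());
        }
        // shutdownNow() interrupts running fetchers and drains the queue,
        // returning the tasks that never ran, so none of them can leak.
        List<Runnable> neverStarted = pool.shutdownNow();
        if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
            throw new IllegalStateException("fetcher threads did not terminate");
        }
        return neverStarted.size();
    }
}
```

A fixed-size pool also caps the number of concurrent fetch threads per input; moving to a shared pool would additionally require fetchers not to hold a pool thread while waiting on merges.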

> OOM with a simple scatter gather job with re-use
> ------------------------------------------------
>
>                 Key: TEZ-2186
>                 URL: https://issues.apache.org/jira/browse/TEZ-2186
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Siddharth Seth
>         Attachments: TEZ-2186.1.patch, noopexample.txt
>
>
> With a no-op scatter gather job, 20K x 2K, on a 20-node cluster with twenty 
> 2 GB containers per node, reducers end up failing with OOM errors. Haven't been 
> able to generate a heap dump yet. Will add details as they're found.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
