[jira] [Updated] (TEZ-2186) OOM with a simple scatter gather job with re-use

Rajesh Balamohan (JIRA) Thu, 12 Mar 2015 03:51:08 -0700

     [ https://issues.apache.org/jira/browse/TEZ-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Rajesh Balamohan updated TEZ-2186:
----------------------------------
    Attachment: TEZ-2186.2.patch

Yes, [~sseth]. It would be better to move to a thread-pool model rather than 
managing the threads explicitly. The earlier patch had a minor bug that caused 
42 errors, though that was far fewer than in the original run. I have fixed the 
issue in the latest patch and ran it with 50K x 50K; it completed successfully 
without issues.

> OOM with a simple scatter gather job with re-use
> ------------------------------------------------
>
>                 Key: TEZ-2186
>                 URL: https://issues.apache.org/jira/browse/TEZ-2186
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Siddharth Seth
>         Attachments: TEZ-2186.1.patch, TEZ-2186.2.patch, noopexample.txt
>
>
> With a no-op scatter-gather job (20K x 2K) on a 20-node cluster with 20 2GB 
> containers per node, reducers end up failing with OOM errors. I haven't been 
> able to generate a heap dump yet; will add details as they're found.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
