[jira] [Updated] (HDFS-13393) Improve OOM logging

Gabor Bota (JIRA) Mon, 10 Dec 2018 02:32:14 -0800


     [ https://issues.apache.org/jira/browse/HDFS-13393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Gabor Bota updated HDFS-13393:
------------------------------
    Attachment:     (was: HADOOP-15988.001.patch)

> Improve OOM logging
> -------------------
>
>                 Key: HDFS-13393
>                 URL: https://issues.apache.org/jira/browse/HDFS-13393
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover, datanode
>            Reporter: Wei-Chiu Chuang
>            Assignee: Gabor Bota
>            Priority: Major
>
> It is not uncommon to find "java.lang.OutOfMemoryError: unable to create new
> native thread" errors in an HDFS cluster. Most often this happens when the
> DataNode creates DataXceiver threads, or when the balancer creates threads to
> move blocks around.
> In most cases, the "OOM" is a symptom of the number of threads reaching the
> system limit rather than of actually running out of memory, and the current
> logging of this message is usually misleading (it suggests the cause is
> insufficient memory).
> How about catching the OOM and, if it is due to "unable to create new native
> thread", printing a more helpful message such as "bump your ulimit" or "take a
> jstack of the process"?
> Even better, surface this error to make it more visible. An in-depth
> investigation usually starts only some time after users notice that a job has
> failed, by which time the evidence (such as jstack output) may already be gone.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
