[
https://issues.apache.org/jira/browse/HDFS-16726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yuyanlei updated HDFS-16726:
----------------------------
Description:
In our cluster, the memory usage of the NameNode exceeds the Xmx setting (Xmx=280GB): the actual memory usage of the NameNode process is 479GB.
Output via pmap (sizes in KB):
Address        Perm  Offset    Device  Inode  Size       Rss        Pss        Referenced  Anonymous  Swap  Locked  Mapping
2b42f0000000   rw-p  00000000  00:00   0      294174720  293756960  293756960  293756960   293756960  0     0
01e21000       rw-p  00000000  00:00   0      195245456  195240848  195240848  195240848   195240848  0     0       [heap]
2b897c000000   rw-p  00000000  00:00   0      9246724    9246724    9246724    9246724     9246724    0     0
2b8bb0905000   rw-p  00000000  00:00   0      1781124    1754572    1754572    1754572     1754572    0     0
2b8936000000   rw-p  00000000  00:00   0      1146880    1002084    1002084    1002084     1002084    0     0
2b42db652000   rwxp  00000000  00:00   0      57792      55252      55252      55252       55252      0     0
2b42ec12a000   rw-p  00000000  00:00   0      25696      24700      24700      24700       24700      0     0
2b42ef25b000   rw-p  00000000  00:00   0      9988       8972       8972       8972        8972       0     0
2b8c1d467000   rw-p  00000000  00:00   0      9216       8204       8204       8204        8204       0     0
2b8d6f8db000   rw-p  00000000  00:00   0      7160       6228       6228       6228        6228       0     0
The first mapping corresponds to the Java heap sized by Xmx, while the [heap] segment is unusually large, so a native memory leak is suspected.
* The [heap] segment is where native malloc allocations live
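(For reference, the per-mapping breakdown above matches the smaps-style columns printed by pmap; it was presumably captured with something along the lines of the command below, where <namenode_pid> is a placeholder. The same data is also available from /proc/<namenode_pid>/smaps.)
pmap -X <namenode_pid>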
After enabling Native Memory Tracking (NMT) and sampling it with jcmd in the test environment, we found that the malloc part of the Internal section grew significantly while a client was writing a gz file (Xmx=40g in the test environment; the Internal section was about 900MB before the client started writing):
Total: reserved=47276MB, committed=47070MB
- Java Heap (reserved=40960MB, committed=40960MB)
        (mmap: reserved=40960MB, committed=40960MB)
- Class (reserved=53MB, committed=52MB)
        (classes #7423)
        (malloc=1MB #17053)
        (mmap: reserved=52MB, committed=52MB)
- Thread (reserved=2145MB, committed=2145MB)
        (thread #2129)
        (stack: reserved=2136MB, committed=2136MB)
        (malloc=7MB #10673)
        (arena=2MB #4256)
- Code (reserved=251MB, committed=45MB)
        (malloc=7MB #10661)
        (mmap: reserved=244MB, committed=38MB)
- GC (reserved=2307MB, committed=2307MB)
        (malloc=755MB #525664)
        (mmap: reserved=1552MB, committed=1552MB)
- Compiler (reserved=8MB, committed=8MB)
        (malloc=8MB #8852)
- Internal (reserved=1524MB, committed=1524MB)
        (malloc=1524MB #323482)
- Symbol (reserved=12MB, committed=12MB)
        (malloc=10MB #91715)
        (arena=2MB #1)
- Native Memory Tracking (reserved=16MB, committed=16MB)
        (tracking overhead=15MB)
It is clear that Internal malloc memory increases significantly while the client is writing and does not decrease after the client stops writing.
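(The summary above is standard NMT output; presumably the NameNode was started with -XX:NativeMemoryTracking=summary and the snapshot was taken with jcmd, for example:)
jcmd <namenode_pid> VM.native_memory summary
Taking a baseline first (jcmd <namenode_pid> VM.native_memory baseline) and later running jcmd <namenode_pid> VM.native_memory summary.diff makes the growth of the Internal section easier to see.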
Through perf, I also captured the following samples while the client was writing:
Children  Self   Command  Shared Object  Symbol
 0.05%    0.00%  java     libzip.so      [.] Java_java_util_zip_ZipFile_getEntry
 0.02%    0.00%  java     libzip.so      [.] Java_java_util_zip_Inflater_inflateBytes
Therefore, it is suspected that the client's compressed (gz) write operations trigger a native memory leak.
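(The samples above look like perf report output; presumably collected with something like:)
perf record -g -p <namenode_pid> -- sleep 60
perf report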
A thread dump (via jcmd) was used to locate the call chain that reaches Java_java_util_zip_Inflater_inflateBytes:
"ExtensionRefresher" #59 daemon prio=5 os_prio=0 tid=0x000000002419d000
nid=0x69df runnable [0x00002b319d7a0000]
java.lang.Thread.State: RUNNABLE
at java.util.zip.Inflater.inflateBytes(Native Method)
at java.util.zip.Inflater.inflate(Inflater.java:259)
- locked <0x00002b278f7b9da8> (a java.util.zip.ZStreamRef)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at
org.apache.xerces.impl.XMLEntityManager$RewindableInputStream.read(Unknown
Source)
at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.scanQName(Unknown Source)
at
org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
at
org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
Source)
at
org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2594)
at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2582)
at
org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2656)
at
org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2606)
at
org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2519)
- locked <0x00002b3114eb4a98> (a org.apache.hadoop.conf.Configuration)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:1091)
at
org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145)
at
org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1546)
at
org.apache.hadoop.util.WhiteListFileManager.refresh(WhiteListFileManager.java:176)
- locked <0x00002b2d6fe06a28> (a java.lang.Class for
org.apache.hadoop.util.WhiteListFileManager)
at
org.apache.hadoop.util.ExtensionManager$2.run(ExtensionManager.java:70)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
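Based on the stack above, the refresh thread repeatedly re-parses configuration resources, and the XML data is read through java.util.zip.Inflater (i.e. the resource is decompressed from a jar/zip). A plausible explanation for the growing Internal/malloc memory is that each Inflater holds native zlib state that is only freed promptly by Inflater.end() (normally invoked when the owning InflaterInputStream is closed); if the object is merely dropped, the native block stays allocated until finalization, which on a large-heap JVM can be arbitrarily late. The sketch below only illustrates that mechanism under this assumption; it is not NameNode code, and the class name is invented for the example:

import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Minimal sketch (illustration only, not HDFS code): native zlib memory held
// by java.util.zip.Inflater is released deterministically only by end().
public class InflaterNativeMemorySketch {

    public static void main(String[] args) throws DataFormatException {
        byte[] compressed = compress(new byte[64 * 1024]);

        // Anti-pattern: create Inflaters and never call end(). The Java
        // objects become garbage quickly, but each keeps its native zlib
        // buffers (malloc'd, visible as NMT "Internal" / [heap]) until its
        // finalizer eventually runs.
        for (int i = 0; i < 10_000; i++) {
            Inflater leaky = new Inflater();
            leaky.setInput(compressed);
            leaky.inflate(new byte[64 * 1024]);
            // missing: leaky.end();
        }

        // Correct pattern: release the native memory deterministically.
        Inflater inf = new Inflater();
        try {
            inf.setInput(compressed);
            inf.inflate(new byte[64 * 1024]);
        } finally {
            inf.end(); // frees the native zlib state immediately
        }
    }

    // Helper that produces zlib-compressed input for the demo.
    private static byte[] compress(byte[] data) {
        Deflater def = new Deflater();
        try {
            def.setInput(data);
            def.finish();
            byte[] buf = new byte[data.length + 1024];
            int n = def.deflate(buf);
            byte[] out = new byte[n];
            System.arraycopy(buf, 0, out, 0, n);
            return out;
        } finally {
            def.end();
        }
    }
}

If this is indeed the cause, ensuring that the stream obtained for each configuration resource is closed after parsing (which calls Inflater.end()) should keep the Internal section flat across refreshes.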
> There is a memory-related problem about HDFS namenode
> -----------------------------------------------------
>
> Key: HDFS-16726
> URL: https://issues.apache.org/jira/browse/HDFS-16726
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs, namenode
> Affects Versions: 2.7.2
> Reporter: yuyanlei
> Priority: Critical
> Attachments: 图片_lanxin_20220809153722.png
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]