[ https://issues.apache.org/jira/browse/HDFS-16726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yuyanlei updated HDFS-16726:
----------------------------
    Description: 
In the cluster, the NameNode's memory usage exceeds the -Xmx setting (-Xmx = 280 GB): the actual memory usage of the NameNode process is 479 GB.

Output from pmap:

       Address Perm   Offset Device Inode      Size       Rss       Pss Referenced  Anonymous Swap Locked Mapping
  2b42f0000000 rw-p 00000000  00:00     0 294174720 293756960 293756960  293756960  293756960    0      0
      01e21000 rw-p 00000000  00:00     0 195245456 195240848 195240848  195240848  195240848    0      0 [heap]
  2b897c000000 rw-p 00000000  00:00     0   9246724   9246724   9246724    9246724    9246724    0      0
  2b8bb0905000 rw-p 00000000  00:00     0   1781124   1754572   1754572    1754572    1754572    0      0
  2b8936000000 rw-p 00000000  00:00     0   1146880   1002084   1002084    1002084    1002084    0      0
  2b42db652000 rwxp 00000000  00:00     0     57792     55252     55252      55252      55252    0      0
  2b42ec12a000 rw-p 00000000  00:00     0     25696     24700     24700      24700      24700    0      0
  2b42ef25b000 rw-p 00000000  00:00     0      9988      8972      8972       8972       8972    0      0
  2b8c1d467000 rw-p 00000000  00:00     0      9216      8204      8204       8204       8204    0      0
  2b8d6f8db000 rw-p 00000000  00:00     0      7160      6228      6228       6228       6228    0      0

The first mapping should be the Java heap sized by -Xmx (≈280 GB), while the [heap] segment (≈186 GB) is unusually large, so a native memory leak is suspected!

 
 * [heap] is the segment used by glibc malloc, i.e. native allocations outside the Java heap (see the sketch below)
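As background, here is a minimal sketch (not from the original report) of why process RSS can exceed -Xmx: the JVM's native allocations come from malloc/mmap and are not counted against the Java heap. The class name, the use of Deflater, the loop count, and reading /proc/self/status are all illustrative assumptions (Linux-only):

{code:java}
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

// Sketch: compare Java-heap usage with the process RSS reported by the kernel.
// Native allocations (e.g. zlib state held by un-ended Deflater/Inflater objects)
// grow VmRSS without growing the Java heap, so -Xmx does not bound them.
public class HeapVsRss {
    public static void main(String[] args) throws Exception {
        List<Deflater> keep = new ArrayList<>();
        for (int i = 0; i < 2_000; i++) {
            keep.add(new Deflater());   // each holds a few hundred KB of native zlib state until end()
        }
        long heapUsed = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
        System.out.println("Java heap used (MB): " + heapUsed / (1024 * 1024));
        for (String line : Files.readAllLines(Paths.get("/proc/self/status"))) {
            if (line.startsWith("VmRSS")) {   // resident set size, includes native memory
                System.out.println(line);
            }
        }
        keep.forEach(Deflater::end);          // releases the native zlib state
    }
}
{code}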

After enabling Native Memory Tracking (NMT) and inspecting it with jcmd in the test environment, we found that the malloc part of the Internal section increased significantly while a client was writing a gz file (-Xmx = 40 GB in the test environment, and the Internal area was 900 MB before the client started writing):

Total: reserved=47276MB, committed=47070MB
 -                 Java Heap (reserved=40960MB, committed=40960MB)
                            (mmap: reserved=40960MB, committed=40960MB) 
 
 -                     Class (reserved=53MB, committed=52MB)
                            (classes #7423)
                            (malloc=1MB #17053) 
                            (mmap: reserved=52MB, committed=52MB) 
 
 -                    Thread (reserved=2145MB, committed=2145MB)
                            (thread #2129)
                            (stack: reserved=2136MB, committed=2136MB)
                            (malloc=7MB #10673) 
                            (arena=2MB #4256)
 
 -                      Code (reserved=251MB, committed=45MB)
                            (malloc=7MB #10661) 
                            (mmap: reserved=244MB, committed=38MB) 
 
 -                        GC (reserved=2307MB, committed=2307MB)
                            (malloc=755MB #525664) 
                            (mmap: reserved=1552MB, committed=1552MB) 
 
 -                  Compiler (reserved=8MB, committed=8MB)
                            (malloc=8MB #8852) 
 
 -                  Internal (reserved=1524MB, committed=1524MB)
                            (malloc=1524MB #323482) 
 
 -                    Symbol (reserved=12MB, committed=12MB)
                            (malloc=10MB #91715) 
                            (arena=2MB #1)
 
 -    Native Memory Tracking (reserved=16MB, committed=16MB)
                            (tracking overhead=15MB)

It is clear that Internal malloc increases significantly while the client writes, and does not decrease after the client stops writing.
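For reference, a minimal sketch of how the NMT summary above can be captured programmatically. It assumes the target JVM was started with -XX:NativeMemoryTracking=summary and that jcmd is on the PATH; the class name and output filtering are illustrative:

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;

// Sketch: dump the NMT summary of a running JVM and print only the per-area lines,
// so "Internal (reserved=..., committed=...)" can be compared before/after a client write.
public class NmtSummary {
    public static void main(String[] args) throws Exception {
        String pid = args[0]; // pid of the NameNode (or test) JVM
        Process p = new ProcessBuilder("jcmd", pid, "VM.native_memory", "summary")
                .redirectErrorStream(true)
                .start();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                if (line.contains("reserved=")) {
                    System.out.println(line.trim());
                }
            }
        }
        p.waitFor();
    }
}
{code}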

 

Profiling with perf while the client was writing, I also found zlib-related samples:

Children      Self  Comm  Shared Ob  Symbol
     0.05%     0.00%  java  libzip.so  [.] Java_java_util_zip_ZipFile_getEntry
     0.02%     0.00%  java  libzip.so  [.] Java_java_util_zip_Inflater_inflateBytes

Therefore, the client's compressed write path is suspected of leaking native memory (a sketch of the suspected pattern follows below).
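For context, a minimal sketch (an illustration, not code from Hadoop) of how java.util.zip.Inflater can produce this symptom: each Inflater holds native zlib memory that is only released by end(), so compressed streams that are read but never closed leave that native state behind. The memory sits outside the Java heap and grows process RSS regardless of -Xmx. The class name, payload, and loop count are illustrative:

{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Sketch of the suspected leak pattern: an InflaterInputStream (here GZIPInputStream)
// that is read but never closed keeps its Inflater's native zlib buffers alive until
// finalization. Deliberately leaky; run with care.
public class InflaterLeakSketch {
    public static void main(String[] args) throws Exception {
        byte[] gz = gzip("some config or data".getBytes("UTF-8"));
        for (int i = 0; i < 100_000; i++) {
            GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz));
            while (in.read() != -1) {
                // consume the stream, as an XML parser would
            }
            // BUG (deliberate, for illustration): in.close() is never called, so the
            // underlying Inflater.end() is not invoked and its native memory lingers
            // until finalization, outside the Java heap.
        }
    }

    private static byte[] gzip(byte[] data) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(data);
        }
        return bos.toByteArray();
    }
}
{code}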

 

A thread dump taken with jcmd locates the call path leading to Java_java_util_zip_Inflater_inflateBytes:

"ExtensionRefresher" #59 daemon prio=5 os_prio=0 tid=0x000000002419d000 
nid=0x69df runnable [0x00002b319d7a0000]
   java.lang.Thread.State: RUNNABLE
        at java.util.zip.Inflater.inflateBytes(Native Method)
        at java.util.zip.Inflater.inflate(Inflater.java:259)
        - locked <0x00002b278f7b9da8> (a java.util.zip.ZStreamRef)
        at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
        at java.io.FilterInputStream.read(FilterInputStream.java:133)
        at 
org.apache.xerces.impl.XMLEntityManager$RewindableInputStream.read(Unknown 
Source)
        at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
        at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
        at org.apache.xerces.impl.XMLEntityScanner.scanQName(Unknown Source)
        at 
org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
        at 
org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
 Source)
        at 
org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown 
Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
        at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2594)
        at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2582)
        at 
org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2656)
        at 
org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2606)
        at 
org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2519)
        - locked <0x00002b3114eb4a98> (a org.apache.hadoop.conf.Configuration)
        at org.apache.hadoop.conf.Configuration.get(Configuration.java:1091)
        at 
org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145)
        at 
org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1546)
        at 
org.apache.hadoop.util.WhiteListFileManager.refresh(WhiteListFileManager.java:176)
        - locked <0x00002b2d6fe06a28> (a java.lang.Class for 
org.apache.hadoop.util.WhiteListFileManager)
        at 
org.apache.hadoop.util.ExtensionManager$2.run(ExtensionManager.java:70)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
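WhiteListFileManager and ExtensionManager appear to be internal extensions rather than stock Hadoop classes, so here is a hedged reconstruction of what the trace implies: a scheduled refresher periodically re-reads a Hadoop Configuration, forcing loadResource() to re-parse the XML resources through an InflaterInputStream whenever they are packaged in a compressed jar. The class name, configuration key, and refresh period below are purely illustrative:

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;

// Hypothetical reconstruction of the refresh loop implied by the stack trace above.
// Each refresh reloads the Configuration, so getBoolean() triggers getProps() ->
// loadResources() -> DocumentBuilder.parse(), which reads the XML through an
// InflaterInputStream (and thus a native zlib Inflater) when the resource lives
// inside a compressed archive.
public class WhitelistRefresherSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        pool.scheduleAtFixedRate(() -> {
            conf.reloadConfiguration();   // drop cached props, force re-parse of resources
            boolean enabled = conf.getBoolean("whitelist.check.enabled", false); // illustrative key
            System.out.println("whitelist enabled: " + enabled);
        }, 0, 30, TimeUnit.SECONDS);
    }
}
{code}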

> There is a memory-related problem about HDFS namenode
> -----------------------------------------------------
>
>                 Key: HDFS-16726
>                 URL: https://issues.apache.org/jira/browse/HDFS-16726
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, namenode
>    Affects Versions: 2.7.2
>            Reporter: yuyanlei
>            Priority: Critical
>         Attachments: 图片_lanxin_20220809153722.png
>


