[ https://issues.apache.org/jira/browse/HDFS-12200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jiandan Yang updated HDFS-12200:
---------------------------------
    Summary: Optimize CachedDNSToSwitchMapping to avoid 100% cpu utilization  (was: Optimize CachedDNSToSwitchMapping to avoid high cpu utilization)

> Optimize CachedDNSToSwitchMapping to avoid 100% cpu utilization
> ---------------------------------------------------------------
>
>                 Key: HDFS-12200
>                 URL: https://issues.apache.org/jira/browse/HDFS-12200
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Jiandan Yang
>            Assignee: Jiandan Yang
>         Attachments: cpu_utilization.png, HDFS-12200-001.patch, HDFS-12200-002.patch, nn_thread_num.png
>
>
> 1. Background:
> Our Hadoop cluster separates storage from compute: HDFS is deployed on 600+ machines, while YARN runs on a separate machine pool.
> We found that NameNode CPU utilization sometimes reached 90% or even 100%. Worst of all, sustained 100% CPU utilization caused writes to the JournalNodes to time out, eventually making the NameNode hang. The cause: offline tasks running on a few hundred servers accessed HDFS at the same time, the NameNode resolved the rack of each client machine by forking the topology script, and this started several hundred to two thousand sub-processes:
> {code:java}
> "process reaper" #10864 daemon prio=10 os_prio=0 tid=0x00007fe270a31800 nid=0x38d93 runnable [0x00007fcdc36fc000]
>    java.lang.Thread.State: RUNNABLE
> 	at java.lang.UNIXProcess.waitForProcessExit(Native Method)
> 	at java.lang.UNIXProcess.lambda$initStreams$4(UNIXProcess.java:301)
> 	at java.lang.UNIXProcess$$Lambda$7/1447689627.run(Unknown Source)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> 	at java.lang.Thread.run(Thread.java:834)
> {code}
> Our configuration is as follows:
> {code:java}
> net.topology.node.switch.mapping.impl = ScriptBasedMapping
> net.topology.script.file.name = 'a python script'
> {code}
> 2. Optimization
> To solve these problems, we optimized CachedDNSToSwitchMapping as follows (sketches of each step appear after this description):
> (1) Add the DataNode IP list to the file configured by dfs.hosts. When the NameNode starts, it preloads the DataNodes' rack information into the cache, resolving a batch of hosts with each script invocation (the batch size is net.topology.script.number.args, default 100).
> (2) Step (1) guarantees that the cache holds every DataNode's rack, so on a cache miss the host must be a client machine, and we can return /default-rack directly.
> (3) Whenever new DataNodes are added, add their IP addresses to the file specified by dfs.hosts and run bin/hdfs dfsadmin -refreshNodes; this loads the new DataNodes' racks into the cache.
> (4) Add a new configuration item, dfs.namenode.topology.resolve-non-cache-host: setting it to false enables the behavior above, while true disables it. The default is true to preserve compatibility.
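>
> For illustration, here is a minimal sketch of steps (1) and (2), written as a standalone wrapper around an existing DNSToSwitchMapping rather than as the code in the attached patches. The class name PreloadedRackResolver and its methods are assumptions made for this sketch; the property names, the DNSToSwitchMapping type, and NetworkTopology.DEFAULT_RACK are Hadoop's own.
> {code:java}
> import java.io.IOException;
> import java.nio.file.Files;
> import java.nio.file.Paths;
> import java.util.ArrayList;
> import java.util.Collections;
> import java.util.List;
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.net.DNSToSwitchMapping;
> import org.apache.hadoop.net.NetworkTopology;
>
> /**
>  * Illustrative sketch of steps (1) and (2) -- not the attached patch.
>  * Wraps any DNSToSwitchMapping (e.g. ScriptBasedMapping) with a preloaded cache.
>  */
> public class PreloadedRackResolver {
>
>   private final DNSToSwitchMapping rawMapping;   // forks the topology script
>   private final Map<String, String> cache = new ConcurrentHashMap<>();
>   private final boolean resolveNonCacheHost;     // proposed flag from step (4)
>   private final int batchSize;                   // hosts per script invocation
>
>   public PreloadedRackResolver(DNSToSwitchMapping rawMapping, Configuration conf) {
>     this.rawMapping = rawMapping;
>     // Proposed in this JIRA; the default of true keeps today's behavior.
>     this.resolveNonCacheHost =
>         conf.getBoolean("dfs.namenode.topology.resolve-non-cache-host", true);
>     this.batchSize = conf.getInt("net.topology.script.number.args", 100);
>   }
>
>   /** Step (1): resolve every host listed in dfs.hosts, batchSize at a time. */
>   public void preload(String dfsHostsFile) throws IOException {
>     List<String> hosts = new ArrayList<>();
>     for (String line : Files.readAllLines(Paths.get(dfsHostsFile))) {
>       String host = line.trim();
>       if (!host.isEmpty()) {
>         hosts.add(host);               // dfs.hosts lists one host per line
>       }
>     }
>     for (int i = 0; i < hosts.size(); i += batchSize) {
>       List<String> batch = hosts.subList(i, Math.min(i + batchSize, hosts.size()));
>       // One script invocation resolves the whole batch; production code must
>       // also handle a null return, which ScriptBasedMapping uses for failures.
>       List<String> racks = rawMapping.resolve(new ArrayList<>(batch));
>       for (int j = 0; j < batch.size(); j++) {
>         cache.put(batch.get(j), racks.get(j));
>       }
>     }
>   }
>
>   /** Step (2): after preload, a cache miss can only be a client machine. */
>   public List<String> resolve(List<String> names) {
>     List<String> result = new ArrayList<>(names.size());
>     for (String name : names) {
>       String rack = cache.get(name);
>       if (rack == null) {
>         if (resolveNonCacheHost) {
>           // Old behavior: fork the script for the unknown host.
>           rack = rawMapping.resolve(Collections.singletonList(name)).get(0);
>           cache.put(name, rack);
>         } else {
>           // New behavior: skip the fork and answer "/default-rack".
>           rack = NetworkTopology.DEFAULT_RACK;
>         }
>       }
>       result.add(rack);
>     }
>     return result;
>   }
> }
> {code}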
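>
> Step (3) keeps the cache complete as the cluster grows. On the NameNode side this could be a small hook on the sketch above, re-run after bin/hdfs dfsadmin -refreshNodes has re-read dfs.hosts; the method name onRefreshNodes is likewise an assumption:
> {code:java}
>   /** Hypothetical hook for step (3): re-preload after -refreshNodes so
>       newly added DataNodes get cached racks before any client asks. */
>   public void onRefreshNodes(Configuration conf) throws IOException {
>     String hostsFile = conf.get("dfs.hosts");    // the include-file path
>     if (hostsFile != null && !hostsFile.isEmpty()) {
>       preload(hostsFile);   // re-resolves; existing entries are overwritten
>     }
>   }
> {code}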
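>
> Finally, a hypothetical wiring at NameNode startup under the same assumptions; ScriptBasedMapping and the dfs.hosts key are Hadoop's, everything else is illustrative:
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.net.ScriptBasedMapping;
>
> public class PreloadExample {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = new Configuration();
>     ScriptBasedMapping raw = new ScriptBasedMapping(conf); // forks the python script
>     PreloadedRackResolver resolver = new PreloadedRackResolver(raw, conf);
>     resolver.preload(conf.get("dfs.hosts"));  // step (1); assumes dfs.hosts is set
>     // With dfs.namenode.topology.resolve-non-cache-host=false, resolver.resolve()
>     // now answers /default-rack for client machines without forking at all.
>   }
> }
> {code}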