[
https://issues.apache.org/jira/browse/HDFS-12200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102989#comment-16102989
]
Jiandan Yang commented on HDFS-12200:
--------------------------------------
[~brahmareddy] Please help me review it. Thanks a lot.
> Optimize CachedDNSToSwitchMapping to avoid 100% cpu utilization
> ---------------------------------------------------------------
>
> Key: HDFS-12200
> URL: https://issues.apache.org/jira/browse/HDFS-12200
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Jiandan Yang
> Assignee: Jiandan Yang
> Attachments: cpu_ utilization.png, HDFS-12200-001.patch,
> HDFS-12200-002.patch, HDFS-12200-003.patch, nn_thread_num.png
>
>
> 1. Background:
> Our Hadoop cluster separates storage from compute: HDFS is deployed on
> 600+ machines, and YARN is deployed on a separate machine pool.
> We found that NameNode CPU utilization sometimes reaches 90% or even 100%.
> In the worst case, CPU utilization stays at 100% for a long time, which
> causes writes to the JournalNodes to time out and eventually makes the
> NameNode hang. The cause is that offline tasks running on a few hundred
> servers access HDFS at the same time; the NameNode resolves the rack of
> each client machine and, in doing so, starts several hundred to two
> thousand sub-processes.
> {code:java}
> "process reaper"#10864 daemon prio=10 os_prio=0 tid=0x00007fe270a31800
> nid=0x38d93 runnable [0x00007fcdc36fc000]
> java.lang.Thread.State: RUNNABLE
> at java.lang.UNIXProcess.waitForProcessExit(Native Method)
> at java.lang.UNIXProcess.lambda$initStreams$4(UNIXProcess.java:301)
> at java.lang.UNIXProcess$$Lambda$7/1447689627.run(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> at java.lang.Thread.run(Thread.java:834)
> {code}
> Our configuration is as follows:
> {code:java}
> net.topology.node.switch.mapping.impl = ScriptBasedMapping,
> net.topology.script.file.name = 'a python script'
> {code}
> 2. Optimization
> To solve these two problems, we have optimized CachedDNSToSwitchMapping:
> (1) Add the DataNode IP list to the file configured by dfs.hosts. When the
> NameNode starts, it preloads the DataNodes' rack information into the cache,
> resolving a batch of hosts per script invocation (the batch size is set by
> net.topology.script.number.args, default 100).
> (2) Step (1) ensures that the cache holds the rack of every DataNode, so a
> cache miss means the host must be a client machine, and /default-rack is
> returned directly.
> (3) Whenever new DataNodes are added, add their IP addresses to the file
> specified by dfs.hosts and then run bin/hdfs dfsadmin -refreshNodes, which
> puts the racks of the newly added DataNodes into the cache.
> (4) Add a new configuration item,
> dfs.namenode.topology.resolve-non-cache-host. Setting it to false enables
> the behavior above and setting it to true disables it; the default is true
> to keep compatibility.
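
The four steps in the description can be illustrated with a minimal sketch. This is not the code from the attached patches: the class name PreloadedRackCache, its constructor parameters, and the use of a plain function in place of the topology script are all assumptions made for illustration. It only shows the idea of preloading in batches and answering cache misses with /default-rack when the new flag is false.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch of the proposed behavior; not the HDFS-12200 patch. */
final class PreloadedRackCache {
  static final String DEFAULT_RACK = "/default-rack";

  private final Map<String, String> cache = new ConcurrentHashMap<>();
  // Stand-in for the topology script: takes a batch of hosts and returns
  // one rack per host, in the same order (an assumption of this sketch).
  private final Function<List<String>, List<String>> rawResolver;
  // Plays the role of the script batch size (default 100 in the description).
  private final int batchSize;
  // Plays the role of dfs.namenode.topology.resolve-non-cache-host.
  private final boolean resolveNonCacheHost;

  PreloadedRackCache(Function<List<String>, List<String>> rawResolver,
                     int batchSize, boolean resolveNonCacheHost) {
    this.rawResolver = rawResolver;
    this.batchSize = batchSize;
    this.resolveNonCacheHost = resolveNonCacheHost;
  }

  /** Steps (1) and (3): resolve known DataNodes in batches and cache them. */
  void preload(List<String> hosts) {
    for (int i = 0; i < hosts.size(); i += batchSize) {
      List<String> batch =
          hosts.subList(i, Math.min(i + batchSize, hosts.size()));
      List<String> racks = rawResolver.apply(batch); // one call per batch
      for (int j = 0; j < batch.size(); j++) {
        cache.put(batch.get(j), racks.get(j));
      }
    }
  }

  /** Step (2): a miss is treated as a client machine when the flag is false. */
  String resolve(String host) {
    String rack = cache.get(host);
    if (rack != null) {
      return rack;
    }
    if (!resolveNonCacheHost) {
      return DEFAULT_RACK; // no sub-process is started for unknown hosts
    }
    // Flag is true: fall back to resolving the host and caching the result.
    rack = rawResolver.apply(Collections.singletonList(host)).get(0);
    cache.put(host, rack);
    return rack;
  }
}
```

In this sketch, preload would be called at startup with the hosts listed in the dfs.hosts file and again after a refresh, so client lookups never reach the resolver while the flag is false.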
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)