Ryan Wu created HDFS-15487:
------------------------------

             Summary: ScriptBasedMapping lead 100% cpu utilization
                 Key: HDFS-15487
                 URL: https://issues.apache.org/jira/browse/HDFS-15487
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Ryan Wu

We found that NameNode CPU utilization sometimes reaches 90%, which causes the NameNode to hang. The cause is that Flink applications running on Kubernetes access HDFS at the same time, and their IP addresses and host names are not fixed, so the topology script is run for many of them at once. The jstack output also shows that several hundred python processes had been started.

{code:java}
"process reaper" #36159 daemon prio=10 os_prio=0 tid=0x00007fa7a33fa7a0 nid=0xa3cb waiting on condition [0x00007fa7a61dc000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00007fb4094a0398> (a java.util.concurrent.SynchronousQueue$TransferStack)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
	at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
	at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{code}
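
The effect described in this report can be illustrated with a small model. This is only a sketch, assuming that rack lookups are cached per host name or IP address and that the topology script runs only on a cache miss. The class `TopologyCacheSketch`, its methods, and the stand-in script function are hypothetical and are not Hadoop's actual code. With a fixed address the script runs once; with addresses that keep changing, every lookup is a miss and triggers another script run.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical model of a cached host-to-rack lookup; not Hadoop's actual code. */
public class TopologyCacheSketch {
    // Cache of already resolved hosts (host or IP -> rack path).
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Counts how often the (expensive) script stand-in was invoked.
    private final AtomicInteger scriptRuns = new AtomicInteger();
    // Stand-in for launching the external topology script.
    private final Function<String, String> script;

    public TopologyCacheSketch(Function<String, String> script) {
        this.script = script;
    }

    /** Returns the rack for a host, invoking the script only on a cache miss. */
    public String resolve(String host) {
        return cache.computeIfAbsent(host, h -> {
            scriptRuns.incrementAndGet();
            return script.apply(h);
        });
    }

    public int scriptRuns() {
        return scriptRuns.get();
    }

    public static void main(String[] args) {
        // Client with a fixed address: one miss, then cache hits.
        TopologyCacheSketch fixed = new TopologyCacheSketch(h -> "/default-rack");
        for (int i = 0; i < 1000; i++) {
            fixed.resolve("10.0.0.1");
        }
        System.out.println("fixed address script runs: " + fixed.scriptRuns());

        // Clients whose addresses keep changing: every lookup is a miss.
        TopologyCacheSketch changing = new TopologyCacheSketch(h -> "/default-rack");
        for (int i = 0; i < 1000; i++) {
            changing.resolve("10.1." + (i / 250) + "." + (i % 250));
        }
        System.out.println("changing address script runs: " + changing.scriptRuns());
    }
}
```

In this model the second loop performs 1000 script invocations for 1000 lookups, which corresponds to the large number of script processes seen in the jstack output when many clients with unstable addresses connect at once.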