[ https://issues.apache.org/jira/browse/HDFS-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578110#comment-16578110 ]
Fei Hui commented on HDFS-13821:
--------------------------------

[~elgoiri] Profiling LocalCache is interesting; should we open a new JIRA to address it?
Disabling the cache is efficient when hundreds of millions of files are accessed. In such scenarios ProxyAvgTime is only 0.0x ms, because the remote path is computed directly.

> RBF: Add dfs.federation.router.mount-table.cache.enable so that users can disable cache
> ---------------------------------------------------------------------------------------
>
>                 Key: HDFS-13821
>                 URL: https://issues.apache.org/jira/browse/HDFS-13821
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 3.1.0, 2.9.1, 3.0.3
>            Reporter: Fei Hui
>            Priority: Major
>         Attachments: HDFS-13821.001.patch, image-2018-08-13-11-27-49-023.png
>
>
> When I tested RBF, I found a performance problem.
> ProxyAvgTime as reported by Ganglia was very high, so I ran jstack on the Router and got the following stack frames:
> {quote}
> java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00000005c264acd8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>     at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>     at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>     at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2249)
>     at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
>     at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
>     at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
>     at org.apache.hadoop.hdfs.server.federation.resolver.MountTableResolver.getDestinationForPath(MountTableResolver.java:380)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:2104)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:2087)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getListing(RouterRpcServer.java:1050)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:640)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2115)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2111)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
> {quote}
> Many threads are blocked on *LocalCache*.
> After disabling the cache, ProxyAvgTime drops, as shown in the attached image:
> !image-2018-08-13-11-27-49-023.png!
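
To illustrate the trade-off discussed above (looking a path up through a cache versus computing the remote path directly), here is a minimal sketch. It is not the real MountTableResolver and does not use Guava. The class name `MountResolverSketch`, its methods, and the `ns1:/warehouse` target format are all hypothetical, and the boolean constructor argument only stands in for the proposed dfs.federation.router.mount-table.cache.enable setting. With the flag off, every lookup walks up the path to the longest matching mount point and touches no cache at all.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch only; not the actual Router resolver code. */
final class MountResolverSketch {
    // Mount point -> remote target (assumed to have no trailing slash).
    private final Map<String, String> mounts = new ConcurrentHashMap<>();
    // Null when caching is disabled, mimicking the proposed enable flag.
    private final Map<String, String> cache;

    MountResolverSketch(boolean cacheEnabled) {
        this.cache = cacheEnabled ? new ConcurrentHashMap<>() : null;
    }

    void addMount(String mountPoint, String target) {
        mounts.put(mountPoint, target);
        if (cache != null) {
            cache.clear(); // drop entries that may now be stale
        }
    }

    /** Returns the remote location for a path, or null if nothing matches. */
    String resolve(String path) {
        if (cache == null) {
            return compute(path); // cache disabled: compute directly
        }
        return cache.computeIfAbsent(path, this::compute);
    }

    // Longest-prefix match: try the path itself, then each parent directory.
    private String compute(String path) {
        String prefix = path;
        while (true) {
            String target = mounts.get(prefix);
            if (target != null) {
                String rest = prefix.equals("/") ? path : path.substring(prefix.length());
                return target + rest;
            }
            if (prefix.equals("/")) {
                return null; // reached the root without finding a mount
            }
            int slash = prefix.lastIndexOf('/');
            prefix = (slash <= 0) ? "/" : prefix.substring(0, slash);
        }
    }
}
```

For example, after `addMount("/data", "ns1:/warehouse")`, calling `resolve("/data/a/b")` yields `ns1:/warehouse/a/b` whether or not the cache is enabled. Only the lookup route differs. When most requested paths are distinct, a cache gets few hits, so skipping it may cost little, which would be consistent with the numbers reported above.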