[ 
https://issues.apache.org/jira/browse/HDFS-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16577855#comment-16577855
 ] 

Fei Hui commented on HDFS-13821:
--------------------------------

[~linyiqun] Test is a MR job which just does *ls* operations.
This cache is as follow
{code:java}
  /** Path -> Remote location. */
  private Cache<String, PathLocation> locationCache;
{code}
{code}
  public PathLocation getDestinationForPath(final String path)
      throws IOException {
    verifyMountTable();
    readLock.lock();
    try {
      Callable<? extends PathLocation> meh = new Callable<PathLocation>() {
        @Override
        public PathLocation call() throws Exception {
          return lookupLocation(path);
        }
      };
      return this.locationCache.get(path, meh);
    } catch (ExecutionException e) {
      throw new IOException(e);
    } finally {
      readLock.unlock();
    }
  }
{code}
Maybe it is unrelated to state store. It caches the request paths, and return 
the remote location. Maybe request paths are hundreds of millions and the cache 
is inefficient


> RBF:add dfs.federation.router.mount-table.cache.enable so that users can 
> disable cache
> --------------------------------------------------------------------------------------
>
>                 Key: HDFS-13821
>                 URL: https://issues.apache.org/jira/browse/HDFS-13821
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 3.1.0, 2.9.1, 3.0.3
>            Reporter: Fei Hui
>            Priority: Major
>         Attachments: HDFS-13821.001.patch, image-2018-08-13-11-27-49-023.png
>
>
> When i test rbf, if found performance problem.
> I found that ProxyAvgTime From Ganglia is so high, i run jstack on Router and 
> get the following stack frames
> {quote}
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000005c264acd8> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>         at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>         at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>         at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2249)
>         at 
> com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
>         at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
>         at 
> com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
>         at 
> org.apache.hadoop.hdfs.server.federation.resolver.MountTableResolver.getDestinationForPath(MountTableResolver.java:380)
>         at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:2104)
>         at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:2087)
>         at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getListing(RouterRpcServer.java:1050)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:640)
>         at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2115)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2111)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
> {quote}
> Many threads blocked on *LocalCache*
> After disable the cache, ProxyAvgTime is down as follow showed
>  !image-2018-08-13-11-27-49-023.png! 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to