[ 
https://issues.apache.org/jira/browse/HDFS-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579955#comment-16579955
 ] 

Fei Hui commented on HDFS-13821:
--------------------------------

[~elgoiri] Thanks for your reply.
{quote}
The hit ratio of the cache over time
{quote}
The hit ratio is nearly zero in my test case.
{quote}
The stats on the read/write lock
{quote}
It may be unrelated to the read/write lock, because the code path runs inside 
the read-lock block, as follows:
{code:java}
  @Override
  public PathLocation getDestinationForPath(final String path)
      throws IOException {
    verifyMountTable();
    readLock.lock();
    try {
      if (!mountTableCacheEnable) {
        return lookupLocation(path);
      }
      Callable<? extends PathLocation> meh = new Callable<PathLocation>() {
        @Override
        public PathLocation call() throws Exception {
          return lookupLocation(path);
        }
      };
      return this.locationCache.get(path, meh);
    } catch (ExecutionException e) {
      throw new IOException(e);
    } finally {
      readLock.unlock();
    }
  }
{code}
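The outer read lock by itself admits any number of concurrent readers, so it should not serialize lookups on its own. A minimal JDK-only sketch (hypothetical, not Router code) showing many threads holding a read lock at the same time:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.*;
import java.util.concurrent.locks.*;

// Sketch: a ReentrantReadWriteLock's read lock allows many holders at
// once, so the read lock around getDestinationForPath() is unlikely to
// be the choke point by itself.
public class ReadLockConcurrency {
  public static int maxConcurrentReaders(int threads) throws Exception {
    ReadWriteLock rw = new ReentrantReadWriteLock();
    AtomicInteger inside = new AtomicInteger();   // readers currently inside
    AtomicInteger maxSeen = new AtomicInteger();  // peak concurrent readers
    CountDownLatch go = new CountDownLatch(1);
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int i = 0; i < threads; i++) {
      pool.execute(() -> {
        try { go.await(); } catch (InterruptedException ignored) { }
        rw.readLock().lock();
        try {
          int now = inside.incrementAndGet();
          maxSeen.accumulateAndGet(now, Math::max);
          // Hold the read lock briefly so the threads overlap.
          try { Thread.sleep(50); } catch (InterruptedException ignored) { }
        } finally {
          inside.decrementAndGet();
          rw.readLock().unlock();
        }
      });
    }
    go.countDown();
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
    return maxSeen.get();
  }

  public static void main(String[] args) throws Exception {
    System.out.println("max concurrent readers: " + maxConcurrentReaders(16));
  }
}
```

If the read lock were the bottleneck, the peak would stay at 1; in practice it climbs toward the thread count, which points the contention at the code inside the lock, i.e. the cache itself.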
{quote}
The time for a hit or a miss
{quote}
I did not measure it directly, but I suspect the Guava LocalCache is the 
bottleneck in my test case, so I ran a test.
The test code is uploaded as *LocalCacheTest.java*. I ran 'hadoop jar 
test-1.0-SNAPSHOT.jar LocalCacheTest 10000 1024 1000', which means the cache 
size is 10000, the number of threads is 1024, and each thread performs 1000 
get operations. The result: 24.555 ms per op per thread.
There is a lock inside LocalCache. I think that in my test case, letting 1024 
threads compute concurrently is better than one thread holding the write lock 
while the other threads are blocked.
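The intuition above can be illustrated with a JDK-only sketch (hypothetical, not the attached LocalCacheTest.java): when the hit ratio is near zero, funneling every miss through one shared lock, which is roughly what a Guava LocalCache segment does in lockedGetOrLoad, serializes work that the threads could have done independently:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.*;
import java.util.concurrent.locks.*;

// Hypothetical sketch, not the attached LocalCacheTest.java: compares
// every thread computing a missed value itself against funneling all
// misses through one shared lock.
public class MissContention {
  static final ReentrantLock loadLock = new ReentrantLock();
  static final AtomicLong sink = new AtomicLong(); // defeats dead-code elimination

  // Stand-in for lookupLocation(): a small CPU-bound computation.
  static long lookup(long seed) {
    long h = seed;
    for (int i = 0; i < 10_000; i++) {
      h = h * 6364136223846793005L + 1442695040888963407L;
    }
    return h;
  }

  /** Runs opsPerThread "misses" on each thread; returns elapsed millis. */
  public static long run(int threads, int opsPerThread, boolean serialized)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    CountDownLatch done = new CountDownLatch(threads);
    long start = System.nanoTime();
    for (int t = 0; t < threads; t++) {
      final long seed = t;
      pool.execute(() -> {
        for (int op = 0; op < opsPerThread; op++) {
          if (serialized) {            // every miss takes the shared lock
            loadLock.lock();
            try { sink.addAndGet(lookup(seed + op)); }
            finally { loadLock.unlock(); }
          } else {                     // each thread computes independently
            sink.addAndGet(lookup(seed + op));
          }
        }
        done.countDown();
      });
    }
    done.await();
    pool.shutdown();
    return (System.nanoTime() - start) / 1_000_000;
  }

  public static void main(String[] args) throws Exception {
    System.out.println("independent: " + run(64, 100, false) + " ms");
    System.out.println("serialized : " + run(64, 100, true) + " ms");
  }
}
```

On a multi-core machine the serialized variant takes roughly one core's worth of time for all threads combined, while the independent variant scales with the core count, which matches the observation that disabling the cache helps when nearly every lookup is a miss.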


> RBF: Add dfs.federation.router.mount-table.cache.enable so that users can 
> disable cache
> ---------------------------------------------------------------------------------------
>
>                 Key: HDFS-13821
>                 URL: https://issues.apache.org/jira/browse/HDFS-13821
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 3.1.0, 2.9.1, 3.0.3
>            Reporter: Fei Hui
>            Priority: Major
>         Attachments: HDFS-13821.001.patch, image-2018-08-13-11-27-49-023.png
>
>
> When I tested RBF, I found a performance problem.
> I found that ProxyAvgTime from Ganglia was very high. I ran jstack on the 
> Router and got the following stack frames:
> {quote}
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000005c264acd8> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>         at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>         at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>         at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2249)
>         at 
> com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
>         at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
>         at 
> com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
>         at 
> org.apache.hadoop.hdfs.server.federation.resolver.MountTableResolver.getDestinationForPath(MountTableResolver.java:380)
>         at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:2104)
>         at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:2087)
>         at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getListing(RouterRpcServer.java:1050)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:640)
>         at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2115)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2111)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
> {quote}
> Many threads are blocked on *LocalCache*.
> After disabling the cache, ProxyAvgTime drops, as shown below:
>  !image-2018-08-13-11-27-49-023.png! 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
