[ 
https://issues.apache.org/jira/browse/HDFS-13946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656332#comment-16656332
 ] 

Yiqun Lin commented on HDFS-13946:
----------------------------------

{quote}
can we add a test to FSNamesystemLock ensuring that this works as expected?
{quote}
I just made an adjustment for the failing tests in TestFSNamesystemLock. I don't 
think we need to add a new one. The logs printed by the tests look good:
{noformat}
--------------------read lock report---------------
2018-10-19 14:09:00,046 [Thread-1] INFO  namenode.FSNamesystem (FSNamesystemLock.java:readUnlock(212)) -        Number of suppressed read-lock reports: 2
        Longest read-lock held interval: 110ms via 
java.lang.Thread.getStackTrace(Thread.java:1556)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.readUnlock(FSNamesystemLock.java:194)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.readUnlock(FSNamesystemLock.java:172)
org.apache.hadoop.hdfs.server.namenode.TestFSNamesystemLock.testFSReadLockLongHoldingReport(TestFSNamesystemLock.java:252)
--------------------write lock report---------------
2018-10-19 14:17:59,798 [Thread-1] INFO  namenode.FSNamesystem (FSNamesystemLock.java:writeUnlock(291)) -       Number of suppressed write-lock reports: 0
        Longest write-lock held interval: 101.0ms via 
java.lang.Thread.getStackTrace(Thread.java:1556)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:277)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:239)
org.apache.hadoop.hdfs.server.namenode.TestFSNamesystemLock.testFSWriteLockReportSuppressed(TestFSNamesystemLock.java:390)
{noformat}
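
To make the reported fields concrete, here is a minimal sketch of the bookkeeping behind such a report. It is not the FSNamesystemLock code from the patch: the class name LongestHoldTracker, the method onUnlock, and both constructor parameters are hypothetical, and the report here also counts the hold that triggers it when picking the longest one, which is an assumption. Time is passed in by the caller so the behavior is deterministic.

```java
import java.util.Optional;

/**
 * Hypothetical sketch, not the FSNamesystemLock implementation.
 * Remembers the longest lock hold seen while reports are suppressed,
 * together with the stack trace captured at that unlock.
 */
final class LongestHoldTracker {
    private final long thresholdMs;         // holds shorter than this are ignored
    private final long suppressIntervalMs;  // minimum gap between two reports

    private boolean reportedBefore = false;
    private long lastReportMs = 0;
    private int suppressedReports = 0;
    private long longestHeldMs = 0;
    private String longestHeldStack = "";

    LongestHoldTracker(long thresholdMs, long suppressIntervalMs) {
        this.thresholdMs = thresholdMs;
        this.suppressIntervalMs = suppressIntervalMs;
    }

    /** Call on unlock. Returns a report when one is due, otherwise empty. */
    synchronized Optional<String> onUnlock(long nowMs, long heldMs) {
        if (heldMs < thresholdMs) {
            return Optional.empty();
        }
        // Keep the stack trace of the longest hold, not just its duration.
        if (heldMs > longestHeldMs) {
            longestHeldMs = heldMs;
            longestHeldStack = currentStackTrace();
        }
        // Inside the suppression window: count the report and stay quiet.
        if (reportedBefore && nowMs - lastReportMs < suppressIntervalMs) {
            suppressedReports++;
            return Optional.empty();
        }
        String report = "\tNumber of suppressed lock reports: " + suppressedReports
            + "\n\tLongest lock held interval: " + longestHeldMs + "ms via\n"
            + longestHeldStack;
        // Start a fresh window after reporting.
        reportedBefore = true;
        lastReportMs = nowMs;
        suppressedReports = 0;
        longestHeldMs = 0;
        longestHeldStack = "";
        return Optional.of(report);
    }

    private static String currentStackTrace() {
        StringBuilder sb = new StringBuilder();
        for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
            sb.append(frame).append('\n');
        }
        return sb.toString();
    }
}
```

With a 100 ms threshold and a 1000 ms suppression interval, two long holds inside the window are counted as suppressed, and the next report after the window names the longest of them along with where it was released.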

> Log longest FSN write/read lock held stack trace
> ------------------------------------------------
>
>                 Key: HDFS-13946
>                 URL: https://issues.apache.org/jira/browse/HDFS-13946
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.1.1
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>            Priority: Minor
>         Attachments: HDFS-13946.001.patch, HDFS-13946.002.patch, 
> HDFS-13946.003.patch, HDFS-13946.004.patch
>
>
> The FSN write/read lock log statement prints only the longest lock-held 
> interval during the suppress-warning interval, not its stack trace. Only the 
> current thread's stack trace is printed, which is not very useful. When the NN 
> slows down, what matters most is which operation held the lock the longest.
> The following log is printed by the current logic:
> {noformat}
> 2018-09-30 13:56:06,700 INFO [IPC Server handler 119 on 8020] org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock held for 11 ms via
> java.lang.Thread.getStackTrace(Thread.java:1589)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:945)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:198)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1688)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4281)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4247)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4183)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4167)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:848)
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2226)
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2222)
> java.security.AccessController.doPrivileged(Native Method)
> javax.security.auth.Subject.doAs(Subject.java:415)
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:2220)
>         Number of suppressed write-lock reports: 14
>         Longest write-lock held interval: 70
> {noformat}
> Logging the stack trace of the longest lock hold would also help with 
> troubleshooting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
