[
https://issues.apache.org/jira/browse/HDFS-13946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16664002#comment-16664002
]
Erik Krogen commented on HDFS-13946:
------------------------------------
I looked more closely at the test and I see where my confusion was arising
from. The longest lock hold is now coming from all the way up on line 256, so
it takes precedence over any other stack traces and prevents the t1/t2
verification from being meaningful. We need to reset the longest stack trace
earlier in the test so that the t1/t2 validation remains meaningful. I
attached a v007 patch demonstrating this; let me know what you think.
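To illustrate the reset issue in isolation, here is a minimal hypothetical sketch (not the actual FSNamesystemLock or test code; the class and method names are invented) of a tracker that remembers the longest lock hold and its stack trace, and why it must be reset between test phases:

```java
// Hypothetical sketch: a tracker that keeps the longest observed lock-hold
// interval together with the stack trace that produced it. An earlier long
// hold (e.g. from test setup) shadows later, shorter holds until reset() is
// called, which is why the test needs a reset before the t1/t2 validation.
public class LockHoldTracker {
    private long longestHoldMs = 0;
    private String longestHoldStack = "";

    // Called on each unlock with the measured hold time and its stack trace.
    public synchronized void recordHold(long heldMs, String stackTrace) {
        if (heldMs > longestHoldMs) {
            longestHoldMs = heldMs;
            longestHoldStack = stackTrace;
        }
    }

    // Clear the record so subsequent measurements are not shadowed by an
    // earlier, unrelated long hold.
    public synchronized void reset() {
        longestHoldMs = 0;
        longestHoldStack = "";
    }

    public synchronized long getLongestHoldMs() {
        return longestHoldMs;
    }

    public synchronized String getLongestHoldStack() {
        return longestHoldStack;
    }
}
```

Without the reset, the earlier long hold keeps winning the comparison, so the stack trace under verification never becomes the reported one.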
> Log longest FSN write/read lock held stack trace
> ------------------------------------------------
>
> Key: HDFS-13946
> URL: https://issues.apache.org/jira/browse/HDFS-13946
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 3.1.1
> Reporter: Yiqun Lin
> Assignee: Yiqun Lin
> Priority: Minor
> Attachments: HDFS-13946.001.patch, HDFS-13946.002.patch,
> HDFS-13946.003.patch, HDFS-13946.004.patch, HDFS-13946.005.patch,
> HDFS-13946.006.patch, HDFS-13946.007.patch
>
>
> During the suppress-warning interval, the FSN write/read lock log statement
> only prints the longest lock-held interval, not its stack trace. Only the
> current thread's stack trace is printed, which is not very useful. Once the
> NN slows down, the most important thing we care about is which operation
> held the lock the longest.
> The following log is printed under the current logic:
> {noformat}
> 2018-09-30 13:56:06,700 INFO [IPC Server handler 119 on 8020]
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock
> held for 11 ms via
> java.lang.Thread.getStackTrace(Thread.java:1589)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:945)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:198)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1688)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4281)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4247)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4183)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4167)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:848)
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2226)
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2222)
> java.security.AccessController.doPrivileged(Native Method)
> javax.security.auth.Subject.doAs(Subject.java:415)
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:2220)
> Number of suppressed write-lock reports: 14
> Longest write-lock held interval: 70
> {noformat}
> This will also be helpful for troubleshooting.
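The improvement described above can be sketched roughly as follows (a simplified illustration only, not the actual FSNamesystemLock implementation from the patch; the class, fields, and threshold value here are hypothetical):

```java
// Hypothetical sketch of an unlock path that, in addition to counting
// suppressed reports and the longest held interval, also remembers the stack
// trace of the longest hold so the eventual report shows which operation
// held the lock the longest.
public class WriteLockSketch {
    private static final long THRESHOLD_MS = 10; // illustrative threshold
    private long lockAcquiredMs;
    private long longestHeldMs;
    private String longestHeldStackTrace = "";
    private int suppressedCount;

    public void lock(long nowMs) {
        lockAcquiredMs = nowMs;
    }

    // On unlock, update the longest-held record (interval AND stack trace)
    // rather than the interval alone, then render the suppression report.
    public String unlock(long nowMs) {
        long heldMs = nowMs - lockAcquiredMs;
        if (heldMs >= THRESHOLD_MS) {
            if (heldMs > longestHeldMs) {
                longestHeldMs = heldMs;
                // In a real implementation this would capture the current
                // thread's stack at the time of the long hold.
                longestHeldStackTrace = stackTraceString();
            }
            suppressedCount++;
        }
        return "Number of suppressed write-lock reports: " + suppressedCount
            + "\nLongest write-lock held interval: " + longestHeldMs
            + "\nLongest write-lock held stack trace:\n" + longestHeldStackTrace;
    }

    private static String stackTraceString() {
        StringBuilder sb = new StringBuilder();
        for (StackTraceElement e : Thread.currentThread().getStackTrace()) {
            sb.append("    ").append(e).append('\n');
        }
        return sb.toString();
    }
}
```

The key design point is that the stack trace is captured at the moment a new longest hold is observed, so the report at the end of the suppression window points at the slow operation rather than at whichever thread happened to print the report.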
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]