[
https://issues.apache.org/jira/browse/HDFS-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245565#comment-17245565
]
Ahmed Hussein commented on HDFS-15217:
--------------------------------------
We have seen a slowdown in token ops compared to older releases, so we were
looking into ways to optimize 3.x. I came across this jira by chance.
[~brfrn169], I have a couple of questions about the justification for adding
that code:
* The ops information can be retrieved from the AuditLog. Isn't the AuditLog
enough to see the ops?
* Was there a concern regarding a possible deadlock? If so, why not use
{{debug}} instead of adding that overhead to hot production code? (A sketch of
what I mean follows below.)
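For reference, a minimal sketch of what I mean by gating this behind {{debug}}; the logger and helper names here are illustrative, not the actual FSNamesystemLock code:
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LockReportDebugSketch {
  private static final Logger LOG = LoggerFactory.getLogger(LockReportDebugSketch.class);

  // Hypothetical helper standing in for the extra op/path details.
  private static String buildLockReportInfo(String opName, String path) {
    return "op=" + opName + ", path=" + path;
  }

  void writeUnlock(String opName, String path) {
    // Build the report string only when debug logging is on; the common
    // production path allocates nothing extra.
    if (LOG.isDebugEnabled()) {
      LOG.debug("Longest write-lock holder info: {}", buildLockReportInfo(opName, path));
    }
  }
}
{code}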
I can see that all unlock operations are calling
{{getLockReportInfoSupplier()}}. Even {{getLockReportInfoSupplier(null)}} is
not a no-op.
I saw quick evaluations of the overhead, but I am still concerned, for the
following reasons:
* There is an overhead even though the string evaluation is lazy: a new
{{Supplier<String>}} object is allocated even when the supplier's {{get()}}
method is never called. Because this is a capturing lambda, it has to be
allocated on every evaluation (see the sketch after this list).
* In a production system, allocating an object while releasing a lock is
dangerous because that last allocation could trigger a GC. This also makes
evaluating the patch tricky, because there is a considerably large delta error
depending on whether or not a GC has been triggered. The worst case is
triggering a GC while allocating an object that is going to be suppressed in
the end.
* On a production system this code is hot, especially since
{{getLockReportInfoSupplier()}} is added to all the token ops such as
{{getDelegationToken()}}.
* The overhead of allocating the supplier can be entirely wasted, because
{{writeUnlock}} will suppress the info anyway.
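To make the allocation concern concrete, here is a minimal sketch (not the actual FSNamesystemLock code; the method and variable names are illustrative) of why a capturing lambda allocates on every evaluation:
{code:java}
import java.util.function.Supplier;

public class SupplierAllocationSketch {

  // Stand-in for the real call site that builds the lock report supplier.
  static Supplier<String> lockReportInfoSupplier(String opName, String path) {
    // The lambda captures opName and path, so each call to this method
    // allocates a new Supplier instance, whether or not get() is ever
    // invoked (e.g. when the lock report ends up suppressed).
    return () -> "op=" + opName + ", path=" + path;
  }

  public static void main(String[] args) {
    Supplier<String> a = lockReportInfoSupplier("getDelegationToken", "/user/foo");
    Supplier<String> b = lockReportInfoSupplier("getDelegationToken", "/user/foo");
    // Two distinct objects were allocated even though neither get() was called.
    System.out.println(a == b); // false
  }
}
{code}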
CC: [~ayushtkn], [~inigoiri]
> Add more information to longest write/read lock held log
> --------------------------------------------------------
>
> Key: HDFS-15217
> URL: https://issues.apache.org/jira/browse/HDFS-15217
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Toshihiro Suzuki
> Assignee: Toshihiro Suzuki
> Priority: Major
> Fix For: 3.4.0
>
>
> Currently, we can see the stack trace in the longest write/read lock held
> log, but sometimes we need more information, for example, the target path of
> a deletion:
> {code:java}
> 2020-03-10 21:51:21,116 [main] INFO namenode.FSNamesystem
> (FSNamesystemLock.java:writeUnlock(276)) - Number of suppressed
> write-lock reports: 0
> Longest write-lock held at 2020-03-10 21:51:21,107+0900 for 6ms via
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1058)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:257)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:233)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1706)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3188)
> ...
> {code}
> Adding more information (opName, path, etc.) to the log is useful for
> troubleshooting.