[ https://issues.apache.org/jira/browse/HDFS-9952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15198770#comment-15198770 ]
Walter Su commented on HDFS-9952:
---------------------------------
bq. MutableRate#add is synchronized in an extremely critical code path which
will destroy concurrent read ops.
We have many nested locks, for example:
{noformat}
getBlockLocations(..)
--> readLock()
--> isInSafeMode()
--> synchronized isInManualOrResourceLowSafeMode()
listCorruptFileBlocks(..)
--> readLock()
--> blockManager.getCorruptReplicaBlockIterator()
--> synchronized Iterator<BlockInfo> iterator(int level)
{noformat}
It's pretty difficult to avoid nested locks entirely. I think if the time spent
holding the inner (write) lock is short compared to the time spent holding the
outer lock, then N threads will probably pass through the inner lock at
different times. If there's little contention for the inner lock, it hardly
increases contention for the outer lock. Each thread just holds the outer lock
a little longer because of the additional logic.
In this case, the MutableRate lock is held only briefly. All it does inside the
lock is simple arithmetic. But suppose fsWriteLock has just been released and
many threads are waiting at the entrance of fsReadLock. If the MutableRate lock
is the first thing inside the door of fsReadLock, then there will be a lot of
contention for the MutableRate lock once those threads get through the door at
the same time.
What if we save the value in a ThreadLocal and add it to the metrics only after
we release the fsReadLock? A ThreadLocal is lock-free.
I'm not an expert on locking; this is just what I thought.
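To make the ThreadLocal idea concrete, here is a minimal sketch. It is not taken from either attached patch: LockWaitSketch, SimpleRate, and pendingWaitNanos are hypothetical names, and SimpleRate is only a stand-in for a metric whose add() is synchronized, like the MutableRate#add mentioned above. The wait time is parked in a per-thread slot while the read lock is held and is published to the shared metric only after the lock is released.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch only; not the code in the HDFS-9952 patches. */
class LockWaitSketch {
    /** Stand-in for a metric whose add() is synchronized. */
    static final class SimpleRate {
        private long count;
        private long totalNanos;

        synchronized void add(long nanos) {
            count++;
            totalNanos += nanos;
        }

        synchronized long count() {
            return count;
        }
    }

    // Stand-in for the namesystem read/write lock.
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();

    // Shared metric; only touched after the read lock has been released.
    final SimpleRate readLockWait = new SimpleRate();

    // Per-thread slot for the measured wait time; no other thread can see it.
    private final ThreadLocal<long[]> pendingWaitNanos =
        ThreadLocal.withInitial(() -> new long[1]);

    void readLock() {
        long start = System.nanoTime();
        fsLock.readLock().lock();
        // Inside the read lock: only thread-local state is updated.
        pendingWaitNanos.get()[0] += System.nanoTime() - start;
    }

    void readUnlock() {
        // The lock is reentrant, so publish only on the outermost unlock.
        boolean outermost = fsLock.getReadHoldCount() == 1;
        fsLock.readLock().unlock();
        if (outermost) {
            long[] slot = pendingWaitNanos.get();
            long waited = slot[0];
            slot[0] = 0;
            // Synchronized call, but made outside the read lock.
            readLockWait.add(waited);
        }
    }
}
```

The synchronized add() still runs once per outermost acquisition, so threads can still contend on it. The difference is that the contention happens after fsReadLock is released, so it should not lengthen the time the read lock is held.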
> Expose FSNamesystem lock wait time as metrics
> ---------------------------------------------
>
> Key: HDFS-9952
> URL: https://issues.apache.org/jira/browse/HDFS-9952
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Reporter: Vinayakumar B
> Assignee: Vinayakumar B
> Attachments: HDFS-9952-01.patch, HDFS-9952-02.patch
>
>
> Expose FSNameSystem's readlock() and writeLock() wait time as metrics.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)