[ https://issues.apache.org/jira/browse/HDFS-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074718#comment-15074718 ]
Walter Su commented on HDFS-8966:
---------------------------------
I'm worried about deadlock. Should we define the lock ordering up front,
mainly among {{fsnLock}}, {{fsdLock}}, and {{BlockManagerLock}}?
1. BlockManager -> fsn.getBlockCollection(id) -> fsd.getInode(id) acquires
{{fsdLock}}.
2. Many fsdir operations call bm methods while holding {{fsdLock}}.
Too many locks cause confusion.
3.
{code}
// FSDirectory.java
// lock to protect the directory and BlockMap
private final ReentrantReadWriteLock dirLock;
{code}
That's not true. In many places, fsn calls bm methods without holding
{{fsdLock}}. It is actually {{fsnLock}} that protects both the directory and
the BlockMap.
Can we retire {{fsnLock}}, use {{fsdLock}} to protect the namespace, and use
{{BlockManagerLock}} to protect the BlockMap?
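To illustrate what a pre-determined ordering could look like, here is a
minimal sketch. It is not HDFS code: the class {{LockOrderSketch}} and its
methods are hypothetical, the two locks are only stand-ins for {{fsdLock}}
and {{BlockManagerLock}}, and the order "fsdLock before BlockManagerLock" is
an assumption chosen for the example. Only write locks are used to keep it
short.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, not HDFS code: enforces "fsdLock before bmLock".
public class LockOrderSketch {
    // Stand-in for fsdLock (protects the namespace).
    private final ReentrantReadWriteLock fsdLock = new ReentrantReadWriteLock();
    // Stand-in for BlockManagerLock (protects the BlockMap).
    private final ReentrantReadWriteLock bmLock = new ReentrantReadWriteLock();

    // Runs op while holding the namespace write lock.
    public void withNamespaceLock(Runnable op) {
        // Taking fsdLock for the first time while already holding bmLock
        // inverts the assumed order, so fail fast instead of risking deadlock.
        if (bmLock.isWriteLockedByCurrentThread()
                && !fsdLock.isWriteLockedByCurrentThread()) {
            throw new IllegalStateException(
                "lock order violation: bmLock held before fsdLock");
        }
        fsdLock.writeLock().lock();
        try {
            op.run();
        } finally {
            fsdLock.writeLock().unlock();
        }
    }

    // Runs op while holding the block manager write lock.
    // Allowed with or without fsdLock already held.
    public void withBlockLock(Runnable op) {
        bmLock.writeLock().lock();
        try {
            op.run();
        } finally {
            bmLock.writeLock().unlock();
        }
    }
}
```

With such a check, a path like item 1 (block management code calling back
into the directory) would throw immediately if it tried to take the
namespace lock while holding only the block lock, while the path in item 2
(namespace lock first, then block lock) would pass.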
> Separate the lock used in namespace and block management layer
> --------------------------------------------------------------
>
> Key: HDFS-8966
> URL: https://issues.apache.org/jira/browse/HDFS-8966
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Haohui Mai
> Assignee: Haohui Mai
>
> Currently the namespace and the block management layer share one giant lock.
> One consequence that we have seen more and more often is that the namespace
> hangs due to excessive activity in the block management layer. For
> example, the NN might take a couple hundred milliseconds to handle a large
> block report. Because the NN holds the write lock while processing the block
> report, all namespace requests are paused. In production we have seen this
> lock contention cause long latencies and instability in the cluster.
> This umbrella jira proposes to separate the lock used by namespace and the
> block management layer.