[
https://issues.apache.org/jira/browse/HDFS-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046417#comment-15046417
]
He Tianyi commented on HDFS-9412:
---------------------------------
[~andrew.wang] Perhaps switching to an unfair RWLock may cause other issues,
since the machine running the NameNode does not necessarily have an SMP
architecture.
I think this is caused by having many small blocks in the cluster: {{getBlocks}}
is called by the Balancer and does not return until the blocks are exhausted or
the requested total size is reached, and many threads are doing the same thing
at once ({{dfs.balancer.dispatcherThreads}}).
Besides decreasing the number of threads, maybe we can also make {{getBlocks}}
itself faster.
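For reference, the fair/unfair distinction mentioned above maps onto the constructor argument of {{java.util.concurrent.locks.ReentrantReadWriteLock}}. The snippet below is only a minimal illustration of that JDK API (the class name {{LockFairnessDemo}} is made up for this example); it does not show how the NameNode actually configures its lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class, not part of Hadoop.
class LockFairnessDemo {
    public static void main(String[] args) {
        // Fair mode: threads acquire the lock in approximate arrival order,
        // so a waiting writer is not overtaken indefinitely by new readers.
        ReentrantReadWriteLock fair = new ReentrantReadWriteLock(true);

        // Non-fair mode (the no-arg default): acquisition order is unspecified,
        // which usually gives higher throughput but may delay waiting threads.
        ReentrantReadWriteLock unfair = new ReentrantReadWriteLock();

        System.out.println("fair lock isFair:   " + fair.isFair());
        System.out.println("unfair lock isFair: " + unfair.isFair());
    }
}
```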
> getBlocks occupies FSLock and takes too long to complete
> --------------------------------------------------------
>
> Key: HDFS-9412
> URL: https://issues.apache.org/jira/browse/HDFS-9412
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: He Tianyi
> Assignee: He Tianyi
>
> {{getBlocks}} in {{NameNodeRpcServer}} acquires a read lock and may then take
> a long time to complete (probably several seconds if there are too many
> blocks).
> During this period, other threads attempting to acquire the write lock must
> wait. In an extreme case, the RPC handlers are occupied by one reader thread
> calling {{getBlocks}} while all other threads wait for the write lock, so the
> RPC server appears hung. Unfortunately, this tends to happen in heavily loaded
> clusters, since read operations come and go quickly (they do not need to
> wait), leaving write operations waiting.
> It looks like we could optimize this the way the DN block report was optimized
> in the past: split the operation into smaller sub-operations and let other
> threads do their work between sub-operations. The whole result would still be
> returned at once, though (one difference from the DN block report).
> I am not sure whether this will work. Any better ideas?
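A rough sketch of the sub-operation idea from the description, under stated assumptions: this is not the actual {{getBlocks}} code. The class {{ChunkedGetBlocks}}, the {{blockSizes}} list, and the {{chunkSize}} parameter are all hypothetical stand-ins, and the block list is assumed not to change between chunks (a real implementation would have to revalidate its position after re-acquiring the lock).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the namesystem; not Hadoop code.
class ChunkedGetBlocks {
    // Fair mode, so a writer queued between chunks gets its turn.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
    // Assumption: one entry per block, holding that block's size in bytes.
    private final List<Long> blockSizes;

    ChunkedGetBlocks(List<Long> blockSizes) {
        this.blockSizes = blockSizes;
    }

    // Collects block indices until targetSize bytes are covered or the blocks
    // are exhausted, holding the read lock for at most chunkSize blocks at a time.
    List<Integer> getBlocks(long targetSize, int chunkSize) {
        List<Integer> result = new ArrayList<>();
        long total = 0;
        int next = 0;
        boolean done = false;
        while (!done) {
            lock.readLock().lock();
            try {
                int end = Math.min(next + chunkSize, blockSizes.size());
                for (; next < end && total < targetSize; next++) {
                    result.add(next);
                    total += blockSizes.get(next);
                }
                done = total >= targetSize || next >= blockSizes.size();
            } finally {
                // Releasing here is what lets waiting writers run between chunks.
                lock.readLock().unlock();
            }
        }
        // As in the proposal, the whole result is returned at once.
        return result;
    }

    public static void main(String[] args) {
        ChunkedGetBlocks ns = new ChunkedGetBlocks(Arrays.asList(10L, 20L, 30L, 40L));
        System.out.println(ns.getBlocks(55L, 2)); // prints [0, 1, 2]
    }
}
```

The trade-off is that the result is no longer a snapshot taken under a single lock hold, so it may reflect changes made by writers that ran between chunks.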
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)