He Tianyi created HDFS-9412:
-------------------------------
Summary: getBlocks occupies FSLock and takes too long to complete
Key: HDFS-9412
URL: https://issues.apache.org/jira/browse/HDFS-9412
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: He Tianyi
Assignee: He Tianyi
{{getBlocks}} in {{NameNodeRpcServer}} acquires a read lock and may then take
a long time to complete (probably several seconds if the number of blocks is
large).
During this period, other threads attempting to acquire the write lock must
wait. In an extreme case, the RPC handlers are occupied by one reader thread
calling {{getBlocks}} and by other threads all waiting for the write lock, so
the RPC server appears to hang. Unfortunately, this tends to happen in heavily
loaded clusters, since read operations come and go fast (they do not need to
wait), leaving write operations waiting.
It looks like we can optimize this the way the DN block report was optimized
in the past: split the operation into smaller sub-operations and release the
lock between them so that other threads can do their work. The whole result
is still returned at once, though (one difference from the DN block report).
But there would be no more starvation.
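To make the idea concrete, here is a minimal sketch, not a patch against the
real code. {{ChunkedGetBlocks}}, {{fsLock}}, {{blockIds}} and {{chunkSize}} are
hypothetical names for illustration only, and the lock is a plain
{{ReentrantReadWriteLock}} in fair mode, which is an assumption of the sketch.
The read lock is released after every chunk so that a waiting writer can get
in, and the full result is still assembled and returned in one piece.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the namesystem; not the real HDFS classes.
class ChunkedGetBlocks {
    // Assumption: fair mode, so a queued writer is served before new readers.
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);
    private final List<Long> blockIds; // hypothetical block store

    ChunkedGetBlocks(List<Long> blockIds) {
        this.blockIds = blockIds;
    }

    // Writers take the write lock; they can run between two read chunks.
    void addBlock(long id) {
        fsLock.writeLock().lock();
        try {
            blockIds.add(id);
        } finally {
            fsLock.writeLock().unlock();
        }
    }

    // Collects all blocks, holding the read lock for at most chunkSize
    // blocks at a time. The result is returned at once at the end.
    List<Long> getBlocks(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Long> result = new ArrayList<>();
        int next = 0;
        boolean more = true;
        while (more) {
            fsLock.readLock().lock();
            try {
                // The store may have changed while the lock was released,
                // so the bounds are recomputed on every round.
                int size = blockIds.size();
                int start = Math.min(next, size);
                int end = Math.min(start + chunkSize, size);
                result.addAll(blockIds.subList(start, end));
                next = end;
                more = next < size;
            } finally {
                // Releasing here is what gives waiting writers a turn.
                fsLock.readLock().unlock();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long i = 0; i < 10; i++) {
            ids.add(i);
        }
        ChunkedGetBlocks ns = new ChunkedGetBlocks(ids);
        System.out.println(ns.getBlocks(3).size());
    }
}
```

One consequence of this design is that the result is no longer a consistent
snapshot: blocks added or removed between two chunks may or may not show up,
so a real change would have to decide whether that is acceptable for callers
of {{getBlocks}}.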
I am not sure whether this will work. Are there any better ideas?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)