Ming Ma created HDFS-6306:
-----------------------------
Summary: Standby NN can hold FSDirectory's writeLock for a long
time under heavy load
Key: HDFS-6306
URL: https://issues.apache.org/jira/browse/HDFS-6306
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Ming Ma
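The standby NN uses FSEditLogLoader to apply edits to its namespace. It can hold
FSDirectory's writeLock for a long time when the active NN generates many edits,
because the lock is taken once before the read/apply loop and released only after
the loop ends: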
{noformat}
loadEditRecords(...) {
  fsNamesys.writeLock();
  fsDir.writeLock();
  ...
  try {
    while (true) {
      try {
        FSEditLogOp op;
        try {
          op = in.readOp();
          ...
        }
        ...
      }
    }
  } finally {
    ...
    fsDir.writeUnlock();
    fsNamesys.writeUnlock();
  }
}
{noformat}
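To make the lock scope in the excerpt concrete, here is a minimal Java sketch. It is not Hadoop code: `CoarseEditLoader`, `dirLock`, `inodeCount` and `writeLockAcquisitions` are hypothetical names, a `ReentrantReadWriteLock` stands in for FSDirectory's lock (FSNamesystem's lock is not modeled), and each edit record is modeled as a `Long` inode-count delta. The only point is that the write lock is acquired once and held across every edit in the stream.

```java
import java.util.Iterator;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified model of the excerpt's lock scope (not Hadoop code).
class CoarseEditLoader {
  // Stands in for FSDirectory's read/write lock.
  final ReentrantReadWriteLock dirLock = new ReentrantReadWriteLock();
  // Namespace state that edits mutate (here just an inode count).
  long inodeCount = 0;
  // How many times the write lock was taken while loading.
  int writeLockAcquisitions = 0;

  // Each Long is a hypothetical edit record: the change in inode count it causes.
  void loadEditRecords(Iterator<Long> in) {
    dirLock.writeLock().lock();        // taken once, before the loop
    writeLockAcquisitions++;
    try {
      while (in.hasNext()) {           // plays the role of in.readOp()
        inodeCount += in.next();       // apply the edit while holding the lock
      }
    } finally {
      dirLock.writeLock().unlock();    // released only after the last edit
    }
  }
}
```

For a stream of four edits, `writeLockAcquisitions` ends at 1, so every reader is excluded for the time it takes to apply all four.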
With the fix in https://issues.apache.org/jira/browse/HDFS-5693, JMX response
time is good for the active NN because it no longer requires FSNamesystem's lock,
even though it still needs to acquire FSDirectory's readLock in FSDirectory's
totalInodes. That isn't an issue for the active NN, as each client RPC request
holds the FSDirectory lock only for a short period of time. The standby NN,
however, can hold the writeLock for the entire edit-loading loop, so a JMX call
that needs the readLock must wait until that loop finishes.
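The effect on JMX can be reproduced with a small model, again using hypothetical names (`LockedInodeCounter`, `readerGotLockDuringBatch`). Here `totalInodes()` follows the pattern described for FSDirectory's totalInodes (read a count under the readLock). A background thread holds the write lock to stand in for the standby replaying a long batch of edits, and the reader uses `tryLock` with a timeout so the demo reports the wait instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model (not Hadoop code) of a JMX getter that reads under the read lock.
class LockedInodeCounter {
  final ReentrantReadWriteLock dirLock = new ReentrantReadWriteLock();
  // Would be updated by applied edits in a real loader; left at 0 here.
  private long inodeCount = 0;

  long totalInodes() {
    dirLock.readLock().lock();   // waits while any other thread holds the write lock
    try {
      return inodeCount;
    } finally {
      dirLock.readLock().unlock();
    }
  }

  // Returns whether a reader got the read lock within timeoutMs while a
  // simulated edit-loading batch held the write lock on another thread.
  static boolean readerGotLockDuringBatch(long timeoutMs) {
    LockedInodeCounter dir = new LockedInodeCounter();
    CountDownLatch batchStarted = new CountDownLatch(1);
    CountDownLatch finishBatch = new CountDownLatch(1);
    Thread loader = new Thread(() -> {
      dir.dirLock.writeLock().lock();     // "standby" starts replaying edits
      try {
        batchStarted.countDown();
        finishBatch.await();              // ...and is still busy
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        dir.dirLock.writeLock().unlock();
      }
    });
    loader.start();
    try {
      batchStarted.await();
      // A JMX-style reader waits up to timeoutMs for the read lock.
      boolean got = dir.dirLock.readLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
      if (got) {
        dir.dirLock.readLock().unlock();
      }
      return got;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      finishBatch.countDown();            // let the simulated batch complete
      try {
        loader.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }
}
```

`readerGotLockDuringBatch(100)` returns false: the reader cannot obtain the read lock while the loader thread holds the write lock, however cheap reading the count itself would be.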
There are two ways to fix this:
1. Change the standby NN to acquire FSDirectory's writeLock per edit record
instead of once for the whole loop.
2. Change FSDirectory's totalInodes to not take the readLock, so JMX calls can
still go through.
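Both proposed fixes can be sketched in the same kind of hypothetical model (`FixedEditLoader` is an illustrative name, not a Hadoop class, and only FSDirectory's lock is modeled). Fix 1 takes the write lock around each edit record; this assumes reading the next op does not itself need the directory lock. Fix 2 keeps the inode count in an `AtomicLong`, so `totalInodes()` never touches the lock. This is not a claim about how HDFS actually resolved the issue.

```java
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the two proposed fixes (not Hadoop code).
class FixedEditLoader {
  final ReentrantReadWriteLock dirLock = new ReentrantReadWriteLock();
  // Fix 2: publish the count through an AtomicLong so readers need no lock.
  private final AtomicLong inodeCount = new AtomicLong();
  // How many times the write lock was taken while loading.
  int writeLockAcquisitions = 0;

  // Fix 1: take and release the write lock around each edit record, so
  // waiting readers can get in between records.
  void loadEditRecords(Iterator<Long> in) {
    while (in.hasNext()) {
      long delta = in.next();            // assumption: reading an op needs no dir lock
      dirLock.writeLock().lock();
      try {
        writeLockAcquisitions++;
        inodeCount.addAndGet(delta);     // apply exactly one edit
      } finally {
        dirLock.writeLock().unlock();
      }
    }
  }

  // Fix 2: lock-free read for JMX; never blocks on the write lock.
  long totalInodes() {
    return inodeCount.get();
  }
}
```

The trade-offs differ. With fix 1, readers may observe the namespace between records of a batch, and every record pays for a lock acquisition. With fix 2, a JMX reader may see a count that lags an edit still being applied, which is usually acceptable for a metric.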
--
This message was sent by Atlassian JIRA
(v6.2#6252)