[
https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112857#comment-17112857
]
Xiaoqiao He commented on HDFS-13183:
------------------------------------
After digging into BalancerWithObserver, I found that the root cause of the
failed unit test TestBalancerWithHANameNodes#testBalancerWithObserver is the
check on how many times #getBlocks is invoked, shown in the following code
segment. With the Observer Read feature enabled, the Balancer does not seem to
send its requests to the first Observer NameNode every time. When two Observer
NameNodes are alive, it may send them to either one at random, so the test has
roughly a 50% chance of failing. IMO this is not related to this change.
I would like to file another JIRA to track it.
{code:java}
doTest(conf);
for (int i = 0; i < cluster.getNumNameNodes(); i++) {
  // First observer node is at idx 2, or 3 if 2 has been shut down
  // It should get both getBlocks calls, all other NNs should see 0 calls
  int expectedObserverIdx = withObserverFailure ? 3 : 2;
  int expectedCount = (i == expectedObserverIdx) ? 2 : 0;
  verify(namesystemSpies.get(i), times(expectedCount))
      .getBlocks(any(), anyLong(), anyLong());
}
{code}
I will try to trigger Yetus manually and check the result again.
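As a rough illustration of a check that does not depend on which Observer
serves the calls, the sketch below counts #getBlocks calls per NameNode and
only requires that the Observers together received all of them. It is plain
Java without Mockito or MiniDFSCluster; the class and method names are
hypothetical, and the index layout (Observers at idx 2 and 3) is assumed from
the comment in the snippet above. It is not the fix adopted in Hadoop.

```java
import java.util.Set;

// Hypothetical helper for illustration only; not part of the Hadoop test suite.
final class ObserverCallCheck {

    /**
     * Returns true when the Observers together received exactly expectedTotal
     * getBlocks calls and every other NameNode received none.
     */
    static boolean observersHandledAll(int[] callsPerNameNode,
                                       Set<Integer> observerIdxs,
                                       int expectedTotal) {
        int observerCalls = 0;
        for (int i = 0; i < callsPerNameNode.length; i++) {
            if (observerIdxs.contains(i)) {
                observerCalls += callsPerNameNode[i];
            } else if (callsPerNameNode[i] != 0) {
                // A non-Observer NameNode served getBlocks.
                return false;
            }
        }
        return observerCalls == expectedTotal;
    }

    public static void main(String[] args) {
        // Assumed layout: Observers sit at idx 2 and 3.
        Set<Integer> observers = Set.of(2, 3);
        // Both calls on the first Observer: what the current test expects.
        System.out.println(
            observersHandledAll(new int[] {0, 0, 2, 0}, observers, 2));
        // Both calls on the second Observer: fails the current test, passes here.
        System.out.println(
            observersHandledAll(new int[] {0, 0, 0, 2}, observers, 2));
        // One call reached the NameNode at idx 0: still rejected.
        System.out.println(
            observersHandledAll(new int[] {1, 0, 1, 0}, observers, 2));
    }
}
```

In the real test the per-NameNode counts would have to come from the Mockito
spies; how best to read them is left open here.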
> Standby NameNode process getBlocks request to reduce Active load
> ----------------------------------------------------------------
>
> Key: HDFS-13183
> URL: https://issues.apache.org/jira/browse/HDFS-13183
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: balancer & mover, namenode
> Reporter: Xiaoqiao He
> Assignee: Xiaoqiao He
> Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-13183-trunk.001.patch, HDFS-13183-trunk.002.patch,
> HDFS-13183-trunk.003.patch, HDFS-13183.004.patch, HDFS-13183.005.patch,
> HDFS-13183.006.patch, HDFS-13183.007.patch, HDFS-13183.addendum.patch,
> HDFS-13183.addendum.patch
>
>
> The performance of the Active NameNode can be impacted when {{Balancer}}
> requests #getBlocks, since querying the blocks of overly full DNs is
> currently extremely inefficient. The main reason is that
> {{NameNodeRpcServer#getBlocks}} holds the read lock for a long time. In the
> extreme case, all handlers of the Active NameNode RPC server are occupied by
> one reader {{NameNodeRpcServer#getBlocks}} and other write operation calls,
> so the Active NameNode enters a state of false death for seconds or even
> minutes.
> Similar performance concerns about the Balancer have been reported in
> HDFS-9412, HDFS-7967, etc.
> If the Standby NameNode can shoulder the heavy #getBlocks burden, it could
> speed up balancing and reduce the performance impact on the Active NameNode.