[
https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382189#comment-16382189
]
Erik Krogen commented on HDFS-13183:
------------------------------------
I'm not sure that a specific new exception just for this situation is the right
move. I think ideally, the client (in this case the Balancer) should be able to
make the decision rather than the NN. For example, if the SbNN goes down, the
ANN is not aware of this, but the balancer should start to read from the ANN
instead of SbNN. The current approach is not able to handle such a situation.
The current handling may work as an interim solution until HDFS-12976 is
developed, but in that case I would rather reuse {{StandbyException}} and just
update its comment than create a new class of exception. This has
better compatibility as well. Ping [~shv] for an opinion on this approach.
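
To make the client-side decision concrete, here is a minimal sketch of a balancer-like caller that prefers the standby and falls back to the active when the standby call fails. {{BlockSource}} and {{FallbackBlockReader}} are hypothetical names for illustration only, not Hadoop classes, and the real Balancer would go through its existing NameNode proxies and retry policies.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical stand-in for an RPC proxy to one NameNode; not a Hadoop interface. */
interface BlockSource {
  List<String> getBlocks(String datanode) throws IOException;
}

/** Client-side choice: prefer the standby, fall back to the active on failure. */
final class FallbackBlockReader {
  private final BlockSource standby;
  private final BlockSource active;

  FallbackBlockReader(BlockSource standby, BlockSource active) {
    this.standby = standby;
    this.active = active;
  }

  List<String> getBlocks(String datanode) throws IOException {
    try {
      return standby.getBlocks(datanode);
    } catch (IOException standbyFailure) {
      // The standby is down or refused the call. The client, not the
      // NameNode, decides to retry against the active.
      return active.getBlocks(datanode);
    }
  }
}
```

With this shape the ANN never needs to know whether the SbNN is healthy; the fallback is triggered purely by what the client observes.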
Additional comments on the patch:
* I realized that changing {{checkOperation}} to {{UNCHECKED}} in all cases is
wrong as that will allow {{getBlocks}} to be performed against the SbNN even if
the new config is disabled. For now the only thing that comes to mind is to do
something like {{checkOperation(balancerShouldRequestStandby ? UNCHECKED :
READ)}}, but I'm not too fond of it. Open to better ideas. It may be that we
want to create a new {{OperationCategory.STANDBY_READ}} and then use
{{checkOperation(balancerShouldRequestStandby ? STANDBY_READ : READ)}}; this
could do away with the explicit check of the service state.
* In the test, we should confirm that the balancer actually fails over to the
SbNN, and that it is able to appropriately get blocks and trigger data movement
as a result.
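
A rough sketch of the {{STANDBY_READ}} idea from the first bullet, under simplifying assumptions. {{OpCategory}}, {{HaState}} and {{StandbyCheckSketch}} are simplified stand-ins written for this example, not the real Hadoop types; in particular, the real standby check has more cases (for example, stale reads) that are left out here.

```java
/** Simplified stand-ins for illustration; not the real Hadoop classes. */
enum OpCategory { UNCHECKED, READ, WRITE, STANDBY_READ }

enum HaState { ACTIVE, STANDBY }

final class StandbyCheckSketch {
  /** Returns true if an operation of this category may run in the given state. */
  static boolean isAllowed(HaState state, OpCategory op) {
    if (state == HaState.ACTIVE) {
      return true; // the active serves every category
    }
    // On the standby, only unchecked operations and the proposed
    // STANDBY_READ category pass; plain READ and WRITE are rejected.
    return op == OpCategory.UNCHECKED || op == OpCategory.STANDBY_READ;
  }

  /** Category getBlocks would be checked under, driven by the new config flag. */
  static OpCategory categoryForGetBlocks(boolean balancerShouldRequestStandby) {
    return balancerShouldRequestStandby ? OpCategory.STANDBY_READ : OpCategory.READ;
  }
}
```

With the flag disabled, {{getBlocks}} is checked as an ordinary {{READ}} and the standby rejects it, which is the behavior the {{UNCHECKED}} change would have lost.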
> Standby NameNode process getBlocks request to reduce Active load
> ----------------------------------------------------------------
>
> Key: HDFS-13183
> URL: https://issues.apache.org/jira/browse/HDFS-13183
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: balancer & mover, namenode
> Affects Versions: 2.7.5, 3.1.0, 2.9.1, 2.8.4, 3.0.2
> Reporter: He Xiaoqiao
> Assignee: He Xiaoqiao
> Priority: Major
> Attachments: HDFS-13183-trunk.001.patch, HDFS-13183-trunk.002.patch,
> HDFS-13183-trunk.003.patch
>
>
> The performance of the Active NameNode can be impacted when the {{Balancer}}
> requests #getBlocks, since querying the blocks of overly full DNs is currently
> extremely inefficient. The main reason is that {{NameNodeRpcServer#getBlocks}}
> holds the read lock for a long time. In the extreme case, all handlers of the
> Active NameNode RPC server are occupied by one reader,
> {{NameNodeRpcServer#getBlocks}}, and other write operation calls, so the Active
> NameNode appears dead (unresponsive) for a number of seconds or even minutes.
> Similar performance concerns about the Balancer have been reported in HDFS-9412,
> HDFS-7967, etc.
> If the Standby NameNode can shoulder the heavy #getBlocks burden, it could speed
> up the progress of balancing and reduce the performance impact on the Active
> NameNode.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)