[
https://issues.apache.org/jira/browse/HDFS-13898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622255#comment-16622255
]
Erik Krogen commented on HDFS-13898:
------------------------------------
{quote}
Why it has to scan all of the observers - wouldn't it just go to the next
available observer?
{quote}
This comes from the fact that the list of NameNodes the ORPP scans contains all
NameNodes. So let's assume we have 5 NameNodes: 1 Active, 2 Standby, 2
Observer. The list of NameNodes looks like:
{code}
O1 S1 A S2 O2
{code}
Suppose the ORPP is currently using O1. When a retriable exception occurs, it
scans forward until it finds another observer, in this case O2, which means it
has to check S1, A, and S2 along the way. We made this design decision assuming
that such a scan would be fairly rare (i.e., only on state changes or node
failures).
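To make the cost concrete, here is a rough sketch of the forward scan described above. It is not the actual ORPP code: the class, the {{Role}} enum, and both method names are hypothetical, and it assumes the list is walked circularly starting just after the current node.

```java
import java.util.List;

/** Hypothetical illustration only; not the real ObserverReadProxyProvider. */
class ObserverScanSketch {
    enum Role { ACTIVE, STANDBY, OBSERVER }

    /**
     * Walks the list circularly, starting just after {@code current}, and
     * returns the index of the next observer, or -1 if there is no other one.
     */
    static int nextObserver(List<Role> nodes, int current) {
        for (int step = 1; step < nodes.size(); step++) {
            int i = (current + step) % nodes.size();
            if (nodes.get(i) == Role.OBSERVER) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Number of nodes checked until the next observer is reached, counting
     * that observer itself. If no other observer exists, every other node
     * was checked.
     */
    static int probesNeeded(List<Role> nodes, int current) {
        int next = nextObserver(nodes, current);
        if (next < 0) {
            return nodes.size() - 1;
        }
        return Math.floorMod(next - current, nodes.size());
    }
}
```

With the list O1 S1 A S2 O2 and the pointer on O1, this scan touches S1, A, and S2 before landing on O2.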
So if such {{getBlockLocations}} failures are rare, this is fine. If they
happen commonly, it would be better for the ORPP to immediately try the active
without moving its current pointer away from O1 (since O1 is still a perfectly
valid observer for other requests).
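A minimal sketch of that alternative, purely as an illustration: the class and method names below are hypothetical, and the {{IntPredicate}} stands in for "did the observer at this index serve the request without a retriable exception".

```java
import java.util.function.IntPredicate;

/** Hypothetical illustration only; not the real ObserverReadProxyProvider. */
class StickyObserverSketch {
    private final int activeIndex;
    // The pointer to the observer in use; deliberately never advanced here.
    private int currentObserver;

    StickyObserverSketch(int currentObserver, int activeIndex) {
        this.currentObserver = currentObserver;
        this.activeIndex = activeIndex;
    }

    /**
     * Tries the current observer first. If it cannot serve this request,
     * goes straight to the active and leaves the observer pointer alone,
     * so later requests still go to the same observer.
     *
     * @return the index of the node that ends up serving the request
     */
    int read(IntPredicate observerCanServe) {
        if (observerCanServe.test(currentObserver)) {
            return currentObserver;
        }
        return activeIndex;
    }

    int currentObserver() {
        return currentObserver;
    }
}
```

The trade-off is that the active absorbs those reads, which is only attractive if the failure is specific to the request rather than to the observer.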
In any case, I think that this optimization is not necessary for this JIRA, but
I wanted to point it out.
Regarding the v002 patch, the new test LGTM except for checkstyle issues, but
maybe the name should be a little more informative -- something like
{{testObserverNodeSafeModeWithoutBlockLocations}} -- with a comment indicating
the specific situation we are testing for. In {{FSNamesystem}}, why did you
change the check to compare {{HAState}} instead of {{HAServiceState}}? I think
the latter is more standard, but I am curious to know if there was any
reasoning behind the change.
> Throw retriable exception for getBlockLocations when ObserverNameNode is in
> safemode
> ------------------------------------------------------------------------------------
>
> Key: HDFS-13898
> URL: https://issues.apache.org/jira/browse/HDFS-13898
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Chao Sun
> Assignee: Chao Sun
> Priority: Major
> Attachments: HDFS-13898-HDFS-12943.000.patch,
> HDFS-13898-HDFS-12943.001.patch, HDFS-13898-HDFS-12943.002.patch
>
>
> When ObserverNameNode is in safe mode, {{getBlockLocations}} may throw safe
> mode exception if the given file doesn't have any block yet.
> {code}
> try {
>   checkOperation(OperationCategory.READ);
>   res = FSDirStatAndListingOp.getBlockLocations(
>       dir, pc, srcArg, offset, length, true);
>   if (isInSafeMode()) {
>     for (LocatedBlock b : res.blocks.getLocatedBlocks()) {
>       // if safemode & no block locations yet then throw safemodeException
>       if ((b.getLocations() == null) || (b.getLocations().length == 0)) {
>         SafeModeException se = newSafemodeException(
>             "Zero blocklocations for " + srcArg);
>         if (haEnabled && haContext != null &&
>             haContext.getState().getServiceState() ==
>             HAServiceState.ACTIVE) {
>           throw new RetriableException(se);
>         } else {
>           throw se;
>         }
>       }
>     }
>   }
> {code}
> It only throws {{RetriableException}} for active NN so requests on observer
> may just fail.