[ 
https://issues.apache.org/jira/browse/HDFS-13898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622255#comment-16622255
 ] 

Erik Krogen commented on HDFS-13898:
------------------------------------

{quote}
Why it has to scan all of the observers - wouldn't it just go to the next 
available observer?
{quote}

This comes from the fact that the list of NameNodes the ORPP 
({{ObserverReadProxyProvider}}) scans contains all NameNodes, not just the 
observers. So let's assume we have 5 NameNodes: 1 Active, 2 Standby, 2 
Observer. The list of NameNodes looks like:
{code}
O1 S1 A S2 O2
{code}
Suppose the ORPP is currently using O1. When a retriable exception occurs, it 
will scan forward until it finds another observer, in this case O2, but that 
means it has to check three other nodes (S1, A, S2) along the way. We made this 
design decision assuming that such a scan should be fairly rare (i.e., only on 
state changes or node failures).
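
To make the cost concrete, here is a minimal sketch of that forward scan. It is 
a toy model and not the actual ORPP code: {{ObserverScanSketch}}, {{Role}}, 
{{ScanResult}} and {{nextObserver}} are hypothetical names introduced only for 
illustration, and a real probe would be an RPC asking each node for its state.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of the forward scan; NOT the real ObserverReadProxyProvider. */
class ObserverScanSketch {
    /** Hypothetical stand-in for a NameNode's HA state. */
    enum Role { ACTIVE, STANDBY, OBSERVER }

    /** Index of the observer found (-1 if none) and the number of nodes probed. */
    static final class ScanResult {
        final int index;
        final int probes;

        ScanResult(int index, int probes) {
            this.index = index;
            this.probes = probes;
        }
    }

    /** Scans forward from {@code current}, wrapping around, until another observer is found. */
    static ScanResult nextObserver(List<Role> nodes, int current) {
        int probes = 0;
        for (int step = 1; step < nodes.size(); step++) {
            int candidate = (current + step) % nodes.size();
            probes++; // in a real proxy each probe would be a state check on that node
            if (nodes.get(candidate) == Role.OBSERVER) {
                return new ScanResult(candidate, probes);
            }
        }
        return new ScanResult(-1, probes);
    }

    public static void main(String[] args) {
        // The layout from the example above: O1 S1 A S2 O2, currently on O1 (index 0).
        List<Role> nodes = Arrays.asList(
            Role.OBSERVER, Role.STANDBY, Role.ACTIVE, Role.STANDBY, Role.OBSERVER);
        ScanResult r = nextObserver(nodes, 0);
        System.out.println("next observer index=" + r.index + ", nodes probed=" + r.probes);
    }
}
```

With the layout above, the scan lands on O2 (index 4) only after probing S1, A, 
S2 and O2 itself, which is the overhead I am referring to.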

So if such {{getBlockLocations}} failures are rare, this is fine. If they 
happen commonly, it would be better for the ORPP to try the active immediately 
and not move its current pointer away from O1 (since O1 is actually still a 
perfectly valid observer for other requests).
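
A rough sketch of what that alternative could look like, purely as an 
illustration: {{ActiveFallbackSketch}}, {{Role}} and {{serve}} are hypothetical 
names, not ORPP APIs, and the predicate stands in for "the observer did not 
throw a retriable exception for this call".

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

/** Toy model of "fall back to the active, keep the observer pointer"; NOT real ORPP code. */
class ActiveFallbackSketch {
    /** Hypothetical stand-in for a NameNode's HA state. */
    enum Role { ACTIVE, STANDBY, OBSERVER }

    private final List<Role> nodes;
    private final int currentObserver; // never moved by a single failed call

    ActiveFallbackSketch(List<Role> nodes, int currentObserver) {
        this.nodes = nodes;
        this.currentObserver = currentObserver;
    }

    int currentObserver() {
        return currentObserver;
    }

    /**
     * Returns the index of the node that ends up serving one call.
     * If the current observer cannot serve it, the call goes straight to the
     * active (or -1 if there is none) and the observer pointer is left untouched.
     */
    int serve(IntPredicate observerCanServe) {
        if (observerCanServe.test(currentObserver)) {
            return currentObserver;
        }
        return nodes.indexOf(Role.ACTIVE);
    }

    public static void main(String[] args) {
        List<Role> nodes = Arrays.asList(
            Role.OBSERVER, Role.STANDBY, Role.ACTIVE, Role.STANDBY, Role.OBSERVER);
        ActiveFallbackSketch proxy = new ActiveFallbackSketch(nodes, 0);
        // A call the observer rejects is served by the active...
        System.out.println("rejected call served by index " + proxy.serve(i -> false));
        // ...while later calls still go to the same observer.
        System.out.println("next call served by index " + proxy.serve(i -> true));
    }
}
```

The point of the sketch is only that a per-call fallback costs one extra hop 
instead of a scan over the whole list, and leaves O1 in place for later reads.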

In any case, I think that this optimization is not necessary for this JIRA, but 
I wanted to point it out.

Regarding the v002 patch, the new test LGTM except for checkstyle issues, but 
maybe its name should be a little more informative, something like 
{{testObserverNodeSafeModeWithoutBlockLocations}}, with a comment indicating 
the specific situation we are testing for. In {{FSNamesystem}}, why did you 
change the check to compare {{HAState}} instead of {{HAServiceState}}? I think 
the second is more standard but am curious to know if there was any reasoning 
behind the change.

> Throw retriable exception for getBlockLocations when ObserverNameNode is in 
> safemode
> ------------------------------------------------------------------------------------
>
>                 Key: HDFS-13898
>                 URL: https://issues.apache.org/jira/browse/HDFS-13898
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>            Priority: Major
>         Attachments: HDFS-13898-HDFS-12943.000.patch, 
> HDFS-13898-HDFS-12943.001.patch, HDFS-13898-HDFS-12943.002.patch
>
>
> When ObserverNameNode is in safe mode, {{getBlockLocations}} may throw a safe 
> mode exception if a block of the given file doesn't have any locations yet. 
> {code}
>     try {
>       checkOperation(OperationCategory.READ);
>       res = FSDirStatAndListingOp.getBlockLocations(
>           dir, pc, srcArg, offset, length, true);
>       if (isInSafeMode()) {
>         for (LocatedBlock b : res.blocks.getLocatedBlocks()) {
>           // if safemode & no block locations yet then throw safemodeException
>           if ((b.getLocations() == null) || (b.getLocations().length == 0)) {
>             SafeModeException se = newSafemodeException(
>                 "Zero blocklocations for " + srcArg);
>             if (haEnabled && haContext != null &&
>                 haContext.getState().getServiceState() ==
>                     HAServiceState.ACTIVE) {
>               throw new RetriableException(se);
>             } else {
>               throw se;
>             }
>           }
>         }
>       }
> {code}
> It only throws {{RetriableException}} for the active NN, so requests on an 
> observer may just fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
