[ https://issues.apache.org/jira/browse/HDFS-13898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622334#comment-16622334 ]

Chao Sun commented on HDFS-13898:
---------------------------------

Thanks [~xkrogen] for the explanation. I like the idea of having a particular
exception that triggers ORPP to go directly to the active; this could be
useful for cases like HDFS-13924. For this particular scenario (observer in
safemode), though, I think it's fine, since I assume safemode normally only
happens when the observer is starting up, which should not be very common.
Also, safemode could last for quite a while, and in my experience the chance
of RPCs hitting this error is quite high, so it might be better to have all
clients redirect to a different observer anyway.
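
To make the two options concrete, here is a minimal, hypothetical Java sketch of
the client-side behavior being compared: on a retriable error from an observer
(for example, one still in safemode) the client moves on to the next observer,
while a special "go to active" exception, like the one suggested for cases such
as HDFS-13924, skips the remaining observers. All class, exception, and method
names here are made up for illustration; this is not ORPP's actual API.

```java
import java.io.IOException;
import java.util.List;

/**
 * Hypothetical sketch of two client-side reactions to observer errors.
 * NOT the real ObserverReadProxyProvider API; names are illustrative only.
 */
public class ObserverFallbackSketch {

  /** Stand-in for a retriable error such as Hadoop's RetriableException. */
  static class RetriableError extends IOException {
    RetriableError(String msg) { super(msg); }
  }

  /** Hypothetical exception telling the client to go straight to the active. */
  static class GoToActiveError extends IOException {
    GoToActiveError(String msg) { super(msg); }
  }

  /** A NameNode call: takes a node name, returns a result or throws. */
  interface Call<T> {
    T invoke(String node) throws IOException;
  }

  /** Tries each observer in order, then falls back to the active. */
  static <T> T read(List<String> observers, String active, Call<T> call)
      throws IOException {
    for (String observer : observers) {
      try {
        return call.invoke(observer);
      } catch (RetriableError e) {
        // e.g. this observer is in safemode: try the next observer.
        continue;
      } catch (GoToActiveError e) {
        // Skip all remaining observers and go directly to the active.
        break;
      }
    }
    return call.invoke(active);
  }

  public static void main(String[] args) throws IOException {
    String result = read(List.of("obs1", "obs2"), "active", node -> {
      if (node.equals("obs1")) {
        throw new RetriableError(node + " is in safemode");
      }
      return "served by " + node;
    });
    System.out.println(result); // prints "served by obs2"
  }
}
```

With the retriable path, one observer stuck in safemode only costs a single
extra hop per request; the go-to-active path trades that for extra load on the
active.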

Regarding the v002 patch, how about changing the test name to
{{testObserverNodeSafeModeWithBlockLocations}}?
{{testObserverNodeSafeModeWithoutBlockLocations}} seems a little confusing to
me since we are testing the safe mode case with {{getBlockLocations}} calls.
About the {{HAState}} change, there's no particular reason except that I wanted
to make the lines shorter :) I'm perfectly fine with changing it back.

Will fix the style issues too.

> Throw retriable exception for getBlockLocations when ObserverNameNode is in 
> safemode
> ------------------------------------------------------------------------------------
>
>                 Key: HDFS-13898
>                 URL: https://issues.apache.org/jira/browse/HDFS-13898
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>            Priority: Major
>         Attachments: HDFS-13898-HDFS-12943.000.patch, 
> HDFS-13898-HDFS-12943.001.patch, HDFS-13898-HDFS-12943.002.patch
>
>
> When ObserverNameNode is in safe mode, {{getBlockLocations}} may throw a
> {{SafeModeException}} if a block of the given file doesn't have any locations yet.
> {code}
>     try {
>       checkOperation(OperationCategory.READ);
>       res = FSDirStatAndListingOp.getBlockLocations(
>           dir, pc, srcArg, offset, length, true);
>       if (isInSafeMode()) {
>         for (LocatedBlock b : res.blocks.getLocatedBlocks()) {
>           // if safemode & no block locations yet then throw safemodeException
>           if ((b.getLocations() == null) || (b.getLocations().length == 0)) {
>             SafeModeException se = newSafemodeException(
>                 "Zero blocklocations for " + srcArg);
>             if (haEnabled && haContext != null &&
>                 haContext.getState().getServiceState() == HAServiceState.ACTIVE) {
>               throw new RetriableException(se);
>             } else {
>               throw se;
>             }
>           }
>         }
>       }
> {code}
> It only throws a {{RetriableException}} for the active NN, so requests on the
> observer may simply fail instead of being retried.
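
One way the change could look, sketched here under stated assumptions: treat an
observer the same as the active when deciding whether to wrap the
{{SafeModeException}} in a {{RetriableException}}. To keep the example
self-contained, the nested {{HAServiceState}}, {{SafeModeException}} and
{{RetriableException}} types are simplified stand-ins for the Hadoop classes of
the same names, and the hypothetical {{checkBlockLocations}} helper models each
{{LocatedBlock}} as just its count of known locations. It is not the patch itself.

```java
import java.io.IOException;
import java.util.List;

/**
 * Hypothetical sketch: wrap safe mode errors as retriable on OBSERVER too.
 * Nested types are simplified stand-ins, not the real Hadoop classes.
 */
public class SafeModeRetrySketch {

  // Stand-in for HAServiceProtocol.HAServiceState.
  enum HAServiceState { INITIALIZING, ACTIVE, STANDBY, OBSERVER, STOPPING }

  // Stand-in for org.apache.hadoop.hdfs.server.namenode.SafeModeException.
  static class SafeModeException extends IOException {
    SafeModeException(String msg) { super(msg); }
  }

  // Stand-in for org.apache.hadoop.ipc.RetriableException.
  static class RetriableException extends IOException {
    RetriableException(Exception cause) { super(cause); }
  }

  /** True if a safe mode error should be surfaced to clients as retriable. */
  static boolean shouldWrapAsRetriable(boolean haEnabled, HAServiceState state) {
    return haEnabled
        && (state == HAServiceState.ACTIVE || state == HAServiceState.OBSERVER);
  }

  /**
   * Mirrors the quoted loop: in safe mode, any block with zero locations
   * triggers an exception. Each element is one block's location count
   * (a hypothetical simplification of LocatedBlock).
   */
  static void checkBlockLocations(String src, List<Integer> locationCounts,
      boolean inSafeMode, boolean haEnabled, HAServiceState state)
      throws IOException {
    if (!inSafeMode) {
      return;
    }
    for (int count : locationCounts) {
      if (count == 0) {
        SafeModeException se =
            new SafeModeException("Zero blocklocations for " + src);
        if (shouldWrapAsRetriable(haEnabled, state)) {
          throw new RetriableException(se);
        }
        throw se;
      }
    }
  }

  public static void main(String[] args) {
    try {
      checkBlockLocations("/tmp/f", List.of(3, 0), true, true,
          HAServiceState.OBSERVER);
    } catch (IOException e) {
      // An observer in safe mode now reports a retriable error.
      System.out.println(e.getClass().getSimpleName()); // RetriableException
    }
  }
}
```

A standby still throws the plain {{SafeModeException}}, so only the states that
actually serve client reads change behavior.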



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)