[ 
https://issues.apache.org/jira/browse/HDFS-13687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515986#comment-16515986
 ] 

Erik Krogen commented on HDFS-13687:
------------------------------------

Hey [~csun], thanks for working on this valuable change. I have a few comments:
* Minor nit: I feel "get-service-state.timeout.millis" would be a better name for 
the config key, but if you prefer the current version I am fine with it. I also 
wonder whether "ms" would be better than "millis"; is there a convention in 
other configs?
* The comment for {{incrementProxyIndex}} has some grammatical issues; it 
should read "Keep trying until all other NameNodes have been exhausted. When 
that happens, let the retry policy decide how to sleep and retry."
* This still leaves us vulnerable to the case where we are already locked onto 
an active NameNode that then becomes standby. I don't think there's any good way 
for us to solve that here.
* Your current strategy of cycling through the NameNodes once means that, if 
all NameNodes are currently in STANDBY, you will leave {{currentProxyIndex}} 
pointing at the same node that initially triggered the failover. So despite this 
new logic, we may still fail over to a STANDBY (the same one we started with) and 
start reading from it. Given that the previous bullet means we can't guarantee 
we're reading from an active anyway, I think this is probably okay, but I want to 
mention it. We just need to be aware that even with this patch, avoiding 
reading from a standby is best effort, not guaranteed.
* I don't think the {{LOG}} call in {{isState}} should be at ERROR level. For 
example, if one NameNode is currently unavailable (e.g. restarting), that is not 
really an error. It should probably be a WARN.
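The first and last points above (a configurable timeout on the state query, and logging an unreachable NameNode at WARN rather than ERROR) could look roughly like the following. This is a minimal sketch, not the code in HDFS-13687.000.patch: {{StateProbe}}, its {{HAState}} enum (a stand-in for Hadoop's {{HAServiceState}}) and the {{timeoutMs}} parameter are hypothetical, and {{java.util.logging}} stands in for the logger Hadoop actually uses.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: query a NameNode's HA state, bounded by a timeout. */
public class StateProbe {
    /** Stand-in for Hadoop's HAServiceState; only the values used here. */
    public enum HAState { ACTIVE, STANDBY }

    private static final Logger LOG = Logger.getLogger(StateProbe.class.getName());

    // Daemon threads so a hung state query never blocks JVM shutdown.
    private final ExecutorService executor = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "get-service-state");
        t.setDaemon(true);
        return t;
    });

    // Hypothetical: would be read from the timeout config key discussed above.
    private final long timeoutMs;

    public StateProbe(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** True only if the query finishes in time and reports the expected state. */
    public boolean isState(String nnName, Callable<HAState> getState, HAState expected) {
        Future<HAState> future = executor.submit(getState);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS) == expected;
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException | TimeoutException e) {
            future.cancel(true);
            // An unreachable or restarting NameNode is expected: WARN, not ERROR.
            LOG.log(Level.WARNING, "Could not get HA state of " + nnName + ": " + e);
            return false;
        }
    }
}
```

A failed or slow query is simply treated as "not in the expected state", so the caller moves on to the next NameNode instead of failing the whole failover.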

> ConfiguredFailoverProxyProvider could direct requests to SBN
> ------------------------------------------------------------
>
>                 Key: HDFS-13687
>                 URL: https://issues.apache.org/jira/browse/HDFS-13687
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>            Priority: Minor
>         Attachments: HDFS-13687.000.patch
>
>
> When there are multiple SBNs and {{dfs.ha.allow.stale.reads}} is set to 
> true, failover could go to an SBN, which may then serve read requests from the 
> client. This may not be the expected behavior. This issue arose while we were 
> working on HDFS-12943 and HDFS-12976.
> A better approach could be to check {{HAServiceState}} and find the 
> active NN when performing failover. This can also reduce the number of 
> failovers the client has to do when there are multiple SBNs.
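The state-aware selection proposed in the description, combined with the "cycle through once" behaviour discussed in the review comment, might be sketched as below. This is an illustrative sketch only: {{ActiveFinder}}, {{nextActiveIndex}} and the {{IntPredicate}} probe are hypothetical names, not the patch's {{incrementProxyIndex}} logic. With NameNodes [SBN, SBN, ANN] and index 0 triggering failover, a plain round-robin step would move to index 1 (a standby), while the state check jumps straight to index 2.

```java
import java.util.function.IntPredicate;

/** Hypothetical sketch of state-aware failover index selection. */
public final class ActiveFinder {
    private ActiveFinder() {}

    /**
     * Starting after currentIndex, probes every other NameNode once and returns
     * the index of the first one reported ACTIVE. If none is, currentIndex is
     * returned unchanged, so the client may still land on a standby: avoiding
     * standbys is best effort only.
     */
    public static int nextActiveIndex(int numNameNodes, int currentIndex, IntPredicate isActive) {
        for (int step = 1; step < numNameNodes; step++) {
            int candidate = (currentIndex + step) % numNameNodes;
            if (isActive.test(candidate)) {
                return candidate;
            }
        }
        return currentIndex;
    }
}
```

In practice the {{isActive}} predicate would wrap a timeout-bounded state query, so an unreachable NameNode just counts as "not active".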



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
