[ 
https://issues.apache.org/jira/browse/HDFS-13687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515986#comment-16515986
 ] 

Erik Krogen edited comment on HDFS-13687 at 6/18/18 4:30 PM:
-------------------------------------------------------------

Hey [~csun], thanks for working on this valuable change. I have a few comments:
* Minor nit, I feel "get-service-state.timeout.millis" would be better for the 
config key. But if you prefer the current version I am fine with it. I am also 
wondering if "ms" may be better than "millis" - is there any consistency in 
other configs?
* The comment for {{incrementProxyIndex}} has some grammatical issues, it 
should read "Keep trying until all other NameNodes have been exhausted. When 
that happens, let the retry policy decide how to sleep and retry."
* This still leaves us vulnerable to the situation when we are already locked 
on an active, and then it becomes standby. I don't think there's any good way 
for us to solve that here.
* Your current strategy of cycling through the NameNodes once means that, if 
all NameNodes are currently in STANDBY, you will leave {{currentProxyIndex}} as 
the same node that initially triggered the failover. So despite this new logic, 
we may still fail over to a STANDBY (the same one we started with) and start 
reading from it. Given that the bullet above means we can't guarantee we're 
reading from an active anyway, I think this is probably okay, but want to 
mention it, and we should probably document this in the Javadoc. We just need 
to be aware that even with this patch, avoiding reading from a standby is best 
effort, not guaranteed.
* The comment on {{isState}} should probably be converted to a Javadoc
* I don't think the {{LOG}} in {{isState}} should be an error. For example, if 
one NameNode is currently unavailable (e.g. restarting), that is not really an 
error. It should probably be a warn.


was (Author: xkrogen):
Hey [~csun], thanks for working on this valuable change. I have a few comments:
* Minor nit, I feel "get-service-state.timeout.millis" would be better for the 
config key. But if you prefer the current version I am fine with it. I am also 
wondering if "ms" may be better than "millis" - is there any consistency in 
other configs?
* The comment for {{incrementProxyIndex}} has some grammatical issues, it 
should read "Keep trying until all other NameNodes have been exhausted. When 
that happens, let the retry policy decide how to sleep and retry."
* This still leaves us vulnerable to the situation when we are already locked 
on an active, and then it becomes standby. I don't think there's any good way 
for us to solve that here.
* Your current strategy of cycling through the NameNodes once means that, if 
all NameNodes are currently in STANDBY, you will leave {{currentProxyIndex}} as 
the same node that initially triggered the failover. So despite this new logic, 
we may still fail over to a STANDBY (the same one we started with) and start 
reading from it. Given that the bullet above means we can't guarantee we're 
reading from an active anyway, I think this is probably okay, but want to 
mention it. We just need to be aware that even with this patch, avoiding 
reading from a standby is best effort, not guaranteed.
* I don't think the {{LOG}} in {{isState}} should be an error. For example, if 
one NameNode is currently unavailable (e.g. restarting), that is not really an 
error. It should probably be a warn.

> ConfiguredFailoverProxyProvider could direct requests to SBN
> ------------------------------------------------------------
>
>                 Key: HDFS-13687
>                 URL: https://issues.apache.org/jira/browse/HDFS-13687
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>            Priority: Minor
>         Attachments: HDFS-13687.000.patch
>
>
> In case there are multiple SBNs, and {{dfs.ha.allow.stale.reads}} is set to 
> true, failover could go to a SBN which then may serve read requests from 
> client. This may not be the expected behavior. This issue arises when we are 
> working on HDFS-12943 and HDFS-12976.
> A better approach for this could be to check {{HAServiceState}} and find out 
> the active NN when performing failover. This also can reduce the # of 
> failovers the client has to do in case of multiple SBNs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to