[ https://issues.apache.org/jira/browse/HDFS-13687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515986#comment-16515986 ]
Erik Krogen edited comment on HDFS-13687 at 6/18/18 4:30 PM: ------------------------------------------------------------- Hey [~csun], thanks for working on this valuable change. I have a few comments: * Minor nit, I feel "get-service-state.timeout.millis" would be better for the config key. But if you prefer the current version I am fine with it. I am also wondering if "ms" may be better than "millis" - is there any consistency in other configs? * The comment for {{incrementProxyIndex}} has some grammatical issues, it should read "Keep trying until all other NameNodes have been exhausted. When that happens, let the retry policy decide how to sleep and retry." * This still leaves us vulnerable to the situation when we are already locked on an active, and then it becomes standby. I don't think there's any good way for us to solve that here. * Your current strategy of cycling through the NameNodes once means that, if all NameNodes are currently in STANDBY, you will leave {{currentProxyIndex}} as the same node that initially triggered the failover. So despite this new logic, we may still fail over to a STANDBY (the same one we started with) and start reading from it. Given that the bullet above means we can't guarantee we're reading from an active anyway, I think this is probably okay, but want to mention it, and we should probably document this in the Javadoc. We just need to be aware that even with this patch, avoiding reading from a standby is best effort, not guaranteed. * The comment on {{isState}} should probably be converted to a Javadoc * I don't think the {{LOG}} in {{isState}} should be an error. For example, if one NameNode is currently unavailable (e.g. restarting), that is not really an error. It should probably be a warn. was (Author: xkrogen): Hey [~csun], thanks for working on this valuable change. I have a few comments: * Minor nit, I feel "get-service-state.timeout.millis" would be better for the config key. But if you prefer the current version I am fine with it. I am also wondering if "ms" may be better than "millis" - is there any consistency in other configs? * The comment for {{incrementProxyIndex}} has some grammatical issues, it should read "Keep trying until all other NameNodes have been exhausted. When that happens, let the retry policy decide how to sleep and retry." * This still leaves us vulnerable to the situation when we are already locked on an active, and then it becomes standby. I don't think there's any good way for us to solve that here. * Your current strategy of cycling through the NameNodes once means that, if all NameNodes are currently in STANDBY, you will leave {{currentProxyIndex}} as the same node that initially triggered the failover. So despite this new logic, we may still fail over to a STANDBY (the same one we started with) and start reading from it. Given that the bullet above means we can't guarantee we're reading from an active anyway, I think this is probably okay, but want to mention it. We just need to be aware that even with this patch, avoiding reading from a standby is best effort, not guaranteed. * I don't think the {{LOG}} in {{isState}} should be an error. For example, if one NameNode is currently unavailable (e.g. restarting), that is not really an error. It should probably be a warn. > ConfiguredFailoverProxyProvider could direct requests to SBN > ------------------------------------------------------------ > > Key: HDFS-13687 > URL: https://issues.apache.org/jira/browse/HDFS-13687 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: Chao Sun > Assignee: Chao Sun > Priority: Minor > Attachments: HDFS-13687.000.patch > > > In case there are multiple SBNs, and {{dfs.ha.allow.stale.reads}} is set to > true, failover could go to a SBN which then may serve read requests from > client. This may not be the expected behavior. This issue arises when we are > working on HDFS-12943 and HDFS-12976. > A better approach for this could be to check {{HAServiceState}} and find out > the active NN when performing failover. This also can reduce the # of > failovers the client has to do in case of multiple SBNs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org