[
https://issues.apache.org/jira/browse/HDFS-13687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515986#comment-16515986
]
Erik Krogen commented on HDFS-13687:
------------------------------------
Hey [~csun], thanks for working on this valuable change. I have a few comments:
* Minor nit, I feel "get-service-state.timeout.millis" would be better for the
config key. But if you prefer the current version I am fine with it. I am also
wondering if "ms" may be better than "millis" - is there any consistency in
other configs?
* The comment for {{incrementProxyIndex}} has some grammatical issues, it
should read "Keep trying until all other NameNodes have been exhausted. When
that happens, let the retry policy decide how to sleep and retry."
* This still leaves us vulnerable to the situation when we are already locked
on an active, and then it becomes standby. I don't think there's any good way
for us to solve that here.
* Your current strategy of cycling through the NameNodes once means that, if
all NameNodes are currently in STANDBY, you will leave {{currentProxyIndex}} as
the same node that initially triggered the failover. So despite this new logic,
we may still fail over to a STANDBY (the same one we started with) and start
reading from it. Given that the bullet above means we can't guarantee we're
reading from an active anyway, I think this is probably okay, but want to
mention it. We just need to be aware that even with this patch, avoiding
reading from a standby is best effort, not guaranteed.
* I don't think the {{LOG}} in {{isState}} should be an error. For example, if
one NameNode is currently unavailable (e.g. restarting), that is not really an
error. It should probably be a warn.
> ConfiguredFailoverProxyProvider could direct requests to SBN
> ------------------------------------------------------------
>
> Key: HDFS-13687
> URL: https://issues.apache.org/jira/browse/HDFS-13687
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Chao Sun
> Assignee: Chao Sun
> Priority: Minor
> Attachments: HDFS-13687.000.patch
>
>
> In case there are multiple SBNs, and {{dfs.ha.allow.stale.reads}} is set to
> true, failover could go to a SBN which then may serve read requests from
> client. This may not be the expected behavior. This issue arises when we are
> working on HDFS-12943 and HDFS-12976.
> A better approach for this could be to check {{HAServiceState}} and find out
> the active NN when performing failover. This also can reduce the # of
> failovers the client has to do in case of multiple SBNs.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]