[
https://issues.apache.org/jira/browse/HDFS-14272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781979#comment-16781979
]
Konstantin Shvachko commented on HDFS-14272:
--------------------------------------------
Discussed with Erik and took some time to think about it. Erik convinced me
that the current implementation does have different failover mechanisms for
reads and writes, and that it does use {{failoverProxy}} as a substitute for
active. So the current patch is in line with this implementation. Let's go
ahead and commit it if there are no other objections. My +1.
Going forward (in another jira) I suggest we enhance {{performFailover()}}
with an expected node state parameter of type {{HAServiceState}}, so that the
implementation can ensure the failover results in a node with the desired
state. That way we will be able to use the same {{RetryPolicy}} for both
read and write failovers.
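To make the suggestion concrete, here is a minimal, self-contained sketch of a failover step that takes the expected state. It is not the Hadoop API: the class {{StateAwareFailover}}, the nested {{Node}} type, and the local {{HAServiceState}} enum are hypothetical stand-ins defined only so that the example compiles on its own, and the round-robin search is just one assumed way to pick the next node.

```java
import java.util.List;

class StateAwareFailover {
    // Local stand-in for Hadoop's HAServiceState, redefined so the sketch compiles alone.
    enum HAServiceState { ACTIVE, STANDBY, OBSERVER }

    // Hypothetical view of a NameNode proxy: only its name and its reported state.
    static final class Node {
        final String name;
        final HAServiceState state;

        Node(String name, HAServiceState state) {
            this.name = name;
            this.state = state;
        }
    }

    private final List<Node> nodes;
    private int current = 0;

    StateAwareFailover(List<Node> nodes) {
        this.nodes = nodes;
    }

    // Hypothetical variant of performFailover(): walk the node list round-robin,
    // starting after the current node, and stop at the first node whose state
    // matches the expected one. Returns false if no node is in that state.
    boolean performFailover(HAServiceState expected) {
        for (int i = 1; i <= nodes.size(); i++) {
            int candidate = (current + i) % nodes.size();
            if (nodes.get(candidate).state == expected) {
                current = candidate;
                return true;
            }
        }
        return false;
    }

    String currentName() {
        return nodes.get(current).name;
    }
}
```

With a shape like this, a read path could call {{performFailover(HAServiceState.OBSERVER)}} and a write path {{performFailover(HAServiceState.ACTIVE)}}, which is what would let both share one {{RetryPolicy}}.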
> [SBN read] ObserverReadProxyProvider should sync with active txnID on startup
> -----------------------------------------------------------------------------
>
> Key: HDFS-14272
> URL: https://issues.apache.org/jira/browse/HDFS-14272
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: tools
> Environment: CDH6.1 (Hadoop 3.0.x) + Consistency Reads from Standby +
> SSL + Kerberos + RPC encryption
> Reporter: Wei-Chiu Chuang
> Assignee: Erik Krogen
> Priority: Major
> Attachments: HDFS-14272.000.patch, HDFS-14272.001.patch,
> HDFS-14272.002.patch
>
>
> It is typical for integration tests to create some files and then check their
> existence. For example, like the following simple bash script:
> {code:java}
> # hdfs dfs -touchz /tmp/abc
> # hdfs dfs -ls /tmp/abc
> {code}
> The test executes HDFS bash commands sequentially, but it may fail with
> Consistent Standby Read because the -ls does not find the file.
> Analysis: the second bash command, while launched sequentially after the
> first one, is not aware of the state id returned from the first bash command.
> So ObserverNode wouldn't wait for the edits to get propagated, and thus
> the -ls fails.
> I've got a cluster where the Observer has tens of seconds of RPC latency, and
> this becomes very annoying. (I am still trying to figure out why this
> Observer has such a long RPC latency. But that's another story.)
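The analysis in the quoted description can be illustrated with a toy model. This is only a sketch under stated assumptions: {{StateIdSketch}} and both of its methods are hypothetical names, a fresh client process is assumed to start with state ID 0, and the Observer is reduced to a single comparison between the transaction ID it has applied and the one the client last saw.

```java
class StateIdSketch {
    // Hypothetical Observer rule: a read can be answered right away only if the
    // Observer has applied edits at least up to the client's last seen txn ID.
    static boolean observerCanServe(long observerAppliedTxId, long clientLastSeenTxId) {
        return observerAppliedTxId >= clientLastSeenTxId;
    }

    // A freshly started client process knows no state ID (assumed to be 0 here)
    // unless it first syncs with the active node and adopts its txn ID.
    static long initialClientTxId(boolean syncWithActiveOnStartup, long activeTxId) {
        return syncWithActiveOnStartup ? activeTxId : 0L;
    }
}
```

For example, if the active is at txn ID 100 after the -touchz and the Observer has only applied up to 99, a new -ls process that starts at 0 is answered immediately from stale state, while one that first syncs to 100 has to wait until the Observer catches up.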