[ 
https://issues.apache.org/jira/browse/HDFS-14272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781979#comment-16781979
 ] 

Konstantin Shvachko commented on HDFS-14272:
--------------------------------------------

Discussed with Erik and took some time to think about it. Erik convinced me 
that the current implementation does have different failover mechanisms for 
reads and writes, and that it does use {{failoverProxy}} as a substitute for 
the active. So the current patch is in line with this implementation. Let's go 
ahead and commit it if there are no other objections. My +1.

Going forward (in another jira) I suggest we enhance {{performFailover()}} 
with an expected node state parameter of type {{HAServiceState}}, so that the 
implementation can ensure the failover results in a node with the desired 
state. That way we will be able to use the same {{RetryPolicy}} for both 
read and write failovers.
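
To make the suggestion concrete, here is a rough, self-contained sketch of what a state-aware failover could look like. Everything in it is hypothetical: the local {{HAServiceState}} enum only stands in for Hadoop's {{HAServiceProtocol.HAServiceState}}, and {{NodeProxy}} and {{StateAwareFailoverProvider}} are made-up names for illustration, not existing classes or a proposed patch.

```java
import java.util.List;

// Hypothetical stand-in for Hadoop's HAServiceProtocol.HAServiceState.
enum HAServiceState { ACTIVE, STANDBY, OBSERVER }

// Hypothetical model of one NameNode proxy and the HA state it reports.
final class NodeProxy {
  final String name;
  final HAServiceState state;

  NodeProxy(String name, HAServiceState state) {
    this.name = name;
    this.state = state;
  }
}

// Hypothetical provider whose performFailover takes the state the caller needs.
final class StateAwareFailoverProvider {
  private final List<NodeProxy> proxies;
  private int current = 0;

  StateAwareFailoverProvider(List<NodeProxy> proxies) {
    if (proxies.isEmpty()) {
      throw new IllegalArgumentException("no proxies configured");
    }
    this.proxies = proxies;
  }

  NodeProxy getProxy() {
    return proxies.get(current);
  }

  // Walk the proxies round-robin, starting after the current one, until a
  // node in the expected state is found. Returns false and leaves the
  // current proxy unchanged if no node reports that state.
  boolean performFailover(HAServiceState expected) {
    for (int step = 1; step <= proxies.size(); step++) {
      int candidate = (current + step) % proxies.size();
      if (proxies.get(candidate).state == expected) {
        current = candidate;
        return true;
      }
    }
    return false;
  }
}
```

With something along these lines, a write path could call {{performFailover(HAServiceState.ACTIVE)}} and a read path {{performFailover(HAServiceState.OBSERVER)}}, while the retry logic around both calls stays the same.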

> [SBN read] ObserverReadProxyProvider should sync with active txnID on startup
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-14272
>                 URL: https://issues.apache.org/jira/browse/HDFS-14272
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>         Environment: CDH6.1 (Hadoop 3.0.x) + Consistency Reads from Standby + 
> SSL + Kerberos + RPC encryption
>            Reporter: Wei-Chiu Chuang
>            Assignee: Erik Krogen
>            Priority: Major
>         Attachments: HDFS-14272.000.patch, HDFS-14272.001.patch, 
> HDFS-14272.002.patch
>
>
> It is typical for integration tests to create some files and then check their 
> existence. For example, like the following simple bash script:
> {code:java}
> # hdfs dfs -touchz /tmp/abc
> # hdfs dfs -ls /tmp/abc
> {code}
> The test executes the HDFS bash commands sequentially, but it may fail with 
> Consistent Standby Read because the -ls does not find the file.
> Analysis: the second bash command, while launched sequentially after the 
> first one, is not aware of the state id returned from the first bash command. 
> So the ObserverNode doesn't wait for the edits to be propagated, and thus 
> the -ls fails.
> I've got a cluster where the Observer has tens of seconds of RPC latency, and 
> this becomes very annoying. (I am still trying to figure out why this 
> Observer has such a long RPC latency. But that's another story.)
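
The analysis in the quoted description can be illustrated with a toy model. This is only a sketch under stated assumptions: {{ObserverModel}} and {{ClientModel}} are hypothetical names, the transaction IDs are invented, and starting a fresh client at {{Long.MIN_VALUE}} is an assumption of the model rather than a statement about the actual client code.

```java
// Hypothetical model of an Observer that replays edits from the Active.
final class ObserverModel {
  long appliedTxId;

  ObserverModel(long appliedTxId) {
    this.appliedTxId = appliedTxId;
  }

  // A read is served immediately only if the Observer has already applied
  // every transaction the client claims to have seen.
  boolean canServe(long clientSeenTxId) {
    return appliedTxId >= clientSeenTxId;
  }
}

// Hypothetical model of the per-process client state.
final class ClientModel {
  // Assumption: a freshly started process has seen nothing yet.
  long lastSeenTxId = Long.MIN_VALUE;

  // Sync with the Active on startup so that later reads carry its txn ID.
  // The state id only ever moves forward.
  void syncWithActive(long activeTxId) {
    lastSeenTxId = Math.max(lastSeenTxId, activeTxId);
  }
}
```

In this model, if the Active is at transaction 100 after the -touchz and the Observer has only applied 90, a fresh client that skips the sync is served right away and misses the file, whereas a client that syncs first makes the Observer wait until it has caught up to 100.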



