[jira] [Commented] (HDFS-14272) [SBN read] ObserverReadProxyProvider should sync with active txnID on startup

Konstantin Shvachko (JIRA) Tue, 26 Feb 2019 18:48:29 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-14272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778808#comment-16778808
 ]


Konstantin Shvachko commented on HDFS-14272:
--------------------------------------------

I see now that you use {{failoverProxy}} as a substitute for active NN, which 
brings us to my second concern in the comment above - the "side note", which 
seems to be the main issue.
 I argue that {{failoverProxy}} should be used only in {{performFailover()}}, 
and not as a substitute for ANN anywhere else. In your approach both writes and 
{{msync()}} are called on {{failoverProxy}}, which triggers failover in the 
underlying proxy CFPP or IPFPP. I think ORPP should enforce its autonomous 
knowledge of the states of NameNodes rather than mixing it with the states 
known to underlying proxies. If ORPP knows that n1 is active it should call a 
write or an {{msync()}} on n1. If n1 is not active anymore, then it will throw 
an exception, which triggers failover.
 Your approach mixes two failovers: the underlying proxy failover is used for 
writes, but for reads it uses ORPP failover. This doesn't seem consistent to me.
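
To make the proposal concrete, here is a minimal, self-contained sketch of a provider that keeps its own view of which NameNode is active, sends writes and msync() calls straight to that node, and runs its own failover only when that node rejects the call. Everything in it is an assumption for illustration: FakeNameNode, ProviderSketch, the activeIndex field and the use of IllegalStateException are hypothetical stand-ins, not the Hadoop classes or the code in the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a NameNode; not a Hadoop class. */
class FakeNameNode {
  final String name;
  boolean active;

  FakeNameNode(String name, boolean active) {
    this.name = name;
    this.active = active;
  }

  /** Models a write or msync(): only the active node accepts the call. */
  String msync() {
    if (!active) {
      throw new IllegalStateException(name + " is not active");
    }
    return name;
  }
}

/** Toy provider that tracks the active node itself (assumed design). */
class ProviderSketch {
  private final List<FakeNameNode> nodes;
  private int activeIndex;                        // provider's own knowledge of the ANN
  final List<String> failovers = new ArrayList<>(); // nodes we failed away from

  ProviderSketch(List<FakeNameNode> nodes, int activeIndex) {
    this.nodes = nodes;
    this.activeIndex = activeIndex;
  }

  /** Calls the node believed to be active; fails over only when it refuses. */
  String msync() {
    for (int attempt = 0; attempt < nodes.size(); attempt++) {
      FakeNameNode target = nodes.get(activeIndex);
      try {
        return target.msync();
      } catch (IllegalStateException e) {
        performFailover();                        // single failover path for all calls
      }
    }
    throw new IllegalStateException("no active NameNode found");
  }

  /** Moves the provider's view of the active node to the next candidate. */
  void performFailover() {
    failovers.add(nodes.get(activeIndex).name);
    activeIndex = (activeIndex + 1) % nodes.size();
  }
}
```

In this toy model, a provider created with n1 active returns "n1" from msync(). If n1 then becomes standby and n2 becomes active, the next msync() fails on n1, records one failover, and returns "n2". Reads and writes would share that one failover path.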

> [SBN read] ObserverReadProxyProvider should sync with active txnID on startup
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-14272
>                 URL: https://issues.apache.org/jira/browse/HDFS-14272
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>         Environment: CDH6.1 (Hadoop 3.0.x) + Consistency Reads from Standby + 
> SSL + Kerberos + RPC encryption
>            Reporter: Wei-Chiu Chuang
>            Assignee: Erik Krogen
>            Priority: Major
>         Attachments: HDFS-14272.000.patch, HDFS-14272.001.patch, 
> HDFS-14272.002.patch
>
>
> It is typical for integration tests to create some files and then check their 
> existence. For example, like the following simple bash script:
> {code:java}
> # hdfs dfs -touchz /tmp/abc
> # hdfs dfs -ls /tmp/abc
> {code}
> The test executes HDFS bash commands sequentially, but it may fail with 
> Consistent Standby Read because the -ls does not find the file.
> Analysis: the second bash command, while launched sequentially after the 
> first one, is not aware of the state id returned from the first bash command. 
> So ObserverNode wouldn't wait for the edits to get propagated, and thus 
> fails.
> I've got a cluster where the Observer has tens of seconds of RPC latency, and 
> this becomes very annoying. (I am still trying to figure out why this 
> Observer has such a long RPC latency. But that's another story.)
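
The analysis in the description can be illustrated with a small model of state ids. This is only a sketch under stated assumptions: ActiveSketch, ObserverSketch, ReadResult and the transaction counters are hypothetical names, and a real Observer would delay the call rather than return a MUST_WAIT value.

```java
import java.util.HashSet;
import java.util.Set;

/** Possible outcomes of a read in this toy model. */
enum ReadResult { FOUND, NOT_FOUND, MUST_WAIT }

/** Hypothetical active NameNode: every write bumps the transaction id. */
class ActiveSketch {
  long txId = 0;
  final Set<String> files = new HashSet<>();

  /** Creates a file and returns the state id the client would remember. */
  long touch(String path) {
    files.add(path);
    return ++txId;
  }
}

/** Hypothetical Observer that lags behind until it applies the edits. */
class ObserverSketch {
  long appliedTxId = 0;
  final Set<String> files = new HashSet<>();

  /** Models edits being propagated from the active node. */
  void catchUp(ActiveSketch active) {
    files.clear();
    files.addAll(active.files);
    appliedTxId = active.txId;
  }

  /** Serves the read only if it has applied everything the client has seen. */
  ReadResult exists(String path, long clientStateId) {
    if (appliedTxId < clientStateId) {
      return ReadResult.MUST_WAIT;   // a real Observer would hold the call here
    }
    return files.contains(path) ? ReadResult.FOUND : ReadResult.NOT_FOUND;
  }
}
```

A second process that starts with state id 0 gets NOT_FOUND from the lagging Observer right after the touch. If it first learned the active node's current state id (1 in this model), the same read would be held until the Observer caught up and would then return FOUND.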



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

