[jira] [Commented] (HDFS-14272) [SBN read] ObserverReadProxyProvider should sync with active txnID on startup

Konstantin Shvachko (JIRA) Wed, 20 Feb 2019 17:00:03 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-14272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773554#comment-16773554
 ]


Konstantin Shvachko commented on HDFS-14272:
--------------------------------------------

Yes, I also thought the new client would always sync up with the active. I see it 
is not the case.
Erik, it looks like your approach is to call {{msync()}} on the active NN if it is a 
new client. But that may not be possible, since a new client does not yet know 
where the active node is. The current code in {{changeProxy()}} changes to the next 
proxy and calls {{getHAServiceState()}} for it. This works fine when we already 
know (cache) the states of all NNs, but for new clients we don't. I propose to 
go over all NNs and collect {{getHAServiceState()}} for all of them for a new 
client, that is, when a client tries to make its first RPC call (HDFS-13779, 
includes HDFS-13780).
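
To make the proposal concrete, here is a rough sketch of probing every NN once before the first RPC. It is not the actual {{ObserverReadProxyProvider}} code: {{HAState}}, {{NameNodeHandle}}, {{FixedStateNameNode}} and {{NameNodeStateCache}} are hypothetical stand-ins, and only the idea of calling {{getHAServiceState()}} on each NN comes from the discussion above. It also assumes that an NN which fails to answer is recorded as unknown rather than aborting the probe.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-ins for illustration; these are NOT the Hadoop classes.
enum HAState { ACTIVE, STANDBY, OBSERVER, UNKNOWN }

interface NameNodeHandle {
    String address();
    HAState getHAServiceState() throws Exception; // models the state RPC
}

// Test double that always reports a fixed state.
final class FixedStateNameNode implements NameNodeHandle {
    private final String address;
    private final HAState state;

    FixedStateNameNode(String address, HAState state) {
        this.address = address;
        this.state = state;
    }

    @Override public String address() { return address; }
    @Override public HAState getHAServiceState() { return state; }
}

final class NameNodeStateCache {
    private final Map<String, HAState> states = new LinkedHashMap<>();
    private boolean probed = false;

    // For a new client: ask every NN for its state once, before the first RPC.
    // Later calls are no-ops because the states are already cached.
    void probeAllIfNew(List<NameNodeHandle> nameNodes) {
        if (probed) {
            return;
        }
        for (NameNodeHandle nn : nameNodes) {
            HAState state;
            try {
                state = nn.getHAServiceState();
            } catch (Exception e) {
                state = HAState.UNKNOWN; // assumption: unreachable NN is skipped, not fatal
            }
            states.put(nn.address(), state);
        }
        probed = true;
    }

    // Address of the first NN cached as ACTIVE, if any.
    Optional<String> findActive() {
        for (Map.Entry<String, HAState> e : states.entrySet()) {
            if (e.getValue() == HAState.ACTIVE) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }

    // Addresses of all NNs cached as OBSERVER, in probe order.
    List<String> findObservers() {
        List<String> observers = new ArrayList<>();
        for (Map.Entry<String, HAState> e : states.entrySet()) {
            if (e.getValue() == HAState.OBSERVER) {
                observers.add(e.getKey());
            }
        }
        return observers;
    }
}

final class ProbeDemo {
    public static void main(String[] args) {
        List<NameNodeHandle> nns = new ArrayList<>();
        nns.add(new FixedStateNameNode("nn1", HAState.STANDBY));
        nns.add(new FixedStateNameNode("nn2", HAState.ACTIVE));
        nns.add(new FixedStateNameNode("nn3", HAState.OBSERVER));

        NameNodeStateCache cache = new NameNodeStateCache();
        cache.probeAllIfNew(nns);
        System.out.println("active=" + cache.findActive() + " observers=" + cache.findObservers());
    }
}
```

With the active known up front, a new client would have a concrete target for the initial {{msync()}}; the cost is one extra state call per NN on startup.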

On a side note, we should also fix the write path in 
{{ObserverReadInvocationHandler.invoke()}}. It currently uses {{failoverProxy}} 
as the active NN, which may not work if {{observerReadEnabled == false}}. We 
should find the active in the cache and use it for writes.
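
A rough sketch of that routing rule, assuming the NN states are already cached. {{NodeRole}} and {{CallRouter}} are hypothetical names, not Hadoop classes, and sending reads to the active when observer reads are disabled is my assumption about the intended behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical types for illustration; these are NOT the Hadoop classes.
enum NodeRole { ACTIVE, STANDBY, OBSERVER }

final class CallRouter {
    private final Map<String, NodeRole> cachedRoles;
    private final boolean observerReadEnabled;

    CallRouter(Map<String, NodeRole> cachedRoles, boolean observerReadEnabled) {
        this.cachedRoles = new LinkedHashMap<>(cachedRoles); // defensive copy
        this.observerReadEnabled = observerReadEnabled;
    }

    // First cached address with the given role, if any.
    private Optional<String> firstWithRole(NodeRole role) {
        for (Map.Entry<String, NodeRole> e : cachedRoles.entrySet()) {
            if (e.getValue() == role) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }

    // Reads go to an observer only when observer reads are enabled and an
    // observer is cached. Everything else, including all writes, goes to the
    // NN cached as ACTIVE instead of a separately tracked failover proxy.
    String route(boolean isRead) {
        if (isRead && observerReadEnabled) {
            Optional<String> observer = firstWithRole(NodeRole.OBSERVER);
            if (observer.isPresent()) {
                return observer.get();
            }
        }
        return firstWithRole(NodeRole.ACTIVE)
            .orElseThrow(() -> new IllegalStateException("no active NameNode in cache"));
    }
}

final class RouterDemo {
    public static void main(String[] args) {
        Map<String, NodeRole> roles = new LinkedHashMap<>();
        roles.put("nn1", NodeRole.STANDBY);
        roles.put("nn2", NodeRole.ACTIVE);
        roles.put("nn3", NodeRole.OBSERVER);

        CallRouter router = new CallRouter(roles, true);
        System.out.println("write -> " + router.route(false));
        System.out.println("read  -> " + router.route(true));
    }
}
```

The point is only that the write target comes from the same cache as the read target, so it stays correct regardless of the {{observerReadEnabled}} setting.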

> [SBN read] ObserverReadProxyProvider should sync with active txnID on startup
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-14272
>                 URL: https://issues.apache.org/jira/browse/HDFS-14272
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>         Environment: CDH6.1 (Hadoop 3.0.x) + Consistency Reads from Standby + 
> SSL + Kerberos + RPC encryption
>            Reporter: Wei-Chiu Chuang
>            Assignee: Erik Krogen
>            Priority: Major
>         Attachments: HDFS-14272.000.patch, HDFS-14272.001.patch
>
>
> It is typical for integration tests to create some files and then check their 
> existence. For example, like the following simple bash script:
> {code:java}
> # hdfs dfs -touchz /tmp/abc
> # hdfs dfs -ls /tmp/abc
> {code}
> The test executes HDFS bash commands sequentially, but it may fail with 
> Consistent Standby Read because the -ls does not find the file.
> Analysis: the second bash command, while launched sequentially after the 
> first one, is not aware of the state id returned from the first bash command. 
> So the ObserverNode wouldn't wait for the edits to get propagated, and thus 
> the -ls fails.
> I've got a cluster where the Observer has tens of seconds of RPC latency, and 
> this becomes very annoying. (I am still trying to figure out why this 
> Observer has such a long RPC latency. But that's another story.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

