[
https://issues.apache.org/jira/browse/HDFS-14272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773554#comment-16773554
]
Konstantin Shvachko commented on HDFS-14272:
--------------------------------------------
Yes, I also thought the new client would always sync up with the active. I see
that is not the case.
Erik, it looks like your approach is to call {{msync()}} on the active NN if it
is a new client. But that may not be possible, since a new client does not yet
know where the active node is. Current code in {{changeProxy()}} changes to the
next proxy and calls {{getHAServiceState()}} for it. This works fine when we
already know (cache) the states of all NNs, but for new clients we don't. I
propose to go over all NNs and collect {{getHAServiceState()}} for all of them
for a new client, that is, when a client tries to make its first RPC call
(HDFS-13779, includes HDFS-13780).
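To make the proposal concrete, here is a minimal sketch of probing every NN
once and caching the result. It is an illustration under stated assumptions,
not the actual patch: {{StateProbeSketch}}, its {{NameNode}} interface, the
{{State}} enum, and the helpers {{probeAll}}, {{findActive}} and {{fixed}} are
all hypothetical stand-ins rather than Hadoop classes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; these types are stand-ins, not the Hadoop classes. */
class StateProbeSketch {
    /** Stand-in for the HA states a NameNode can report. */
    enum State { ACTIVE, STANDBY, OBSERVER }

    /** Hypothetical minimal proxy: an address plus a state query that may fail. */
    interface NameNode {
        String address();
        State getHAServiceState() throws Exception;
    }

    /** Query every NameNode once and cache its state; unreachable nodes are skipped. */
    static Map<String, State> probeAll(List<NameNode> nameNodes) {
        Map<String, State> cache = new LinkedHashMap<>();
        for (NameNode nn : nameNodes) {
            try {
                cache.put(nn.address(), nn.getHAServiceState());
            } catch (Exception e) {
                // Leave the node out of the cache; a later call can retry it.
            }
        }
        return cache;
    }

    /** Return the address of the first node cached as ACTIVE, or null if none is known. */
    static String findActive(Map<String, State> cache) {
        for (Map.Entry<String, State> entry : cache.entrySet()) {
            if (entry.getValue() == State.ACTIVE) {
                return entry.getKey();
            }
        }
        return null;
    }

    /** Helper that builds a node with a fixed state, for examples and tests. */
    static NameNode fixed(final String address, final State state) {
        return new NameNode() {
            public String address() { return address; }
            public State getHAServiceState() { return state; }
        };
    }
}
```

With a cache like this filled in on the first RPC call, a new client would
know which node to {{msync()}} against before issuing its first read.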
On a side note, we should also fix the write path in
{{ObserverReadInvocationHandler.invoke()}}. It currently uses {{failoverProxy}}
as the active NN, which may not work if {{observerReadEnabled == false}}. We
should find the active in the cache and use it for writes.
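As a rough sketch of that idea, assuming a cache of NN states keyed by
address: writes (and reads when observer reads are disabled) look up the
active in the cache instead of relying on a fixed proxy. The names
{{WriteRoutingSketch}}, {{State}} and {{chooseTarget}} are hypothetical and
exist only for this illustration.

```java
import java.util.Map;

/** Illustrative sketch only; not the ObserverReadProxyProvider implementation. */
class WriteRoutingSketch {
    /** Stand-in for the HA states a NameNode can report. */
    enum State { ACTIVE, STANDBY, OBSERVER }

    /**
     * Pick the address to send a call to, based on cached states.
     * Reads go to an observer only when observer reads are enabled;
     * everything else goes to the node cached as ACTIVE.
     */
    static String chooseTarget(boolean isRead, boolean observerReadEnabled,
                               Map<String, State> cachedStates) {
        State wanted = (isRead && observerReadEnabled) ? State.OBSERVER : State.ACTIVE;
        for (Map.Entry<String, State> entry : cachedStates.entrySet()) {
            if (entry.getValue() == wanted) {
                return entry.getKey();
            }
        }
        if (wanted == State.OBSERVER) {
            // No observer known: fall back to the active for this read.
            return chooseTarget(false, observerReadEnabled, cachedStates);
        }
        throw new IllegalStateException("No active NameNode in the cache");
    }
}
```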
> [SBN read] ObserverReadProxyProvider should sync with active txnID on startup
> -----------------------------------------------------------------------------
>
> Key: HDFS-14272
> URL: https://issues.apache.org/jira/browse/HDFS-14272
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: tools
> Environment: CDH6.1 (Hadoop 3.0.x) + Consistency Reads from Standby +
> SSL + Kerberos + RPC encryption
> Reporter: Wei-Chiu Chuang
> Assignee: Erik Krogen
> Priority: Major
> Attachments: HDFS-14272.000.patch, HDFS-14272.001.patch
>
>
> It is typical for integration tests to create some files and then check their
> existence. For example, like the following simple bash script:
> {code:java}
> # hdfs dfs -touchz /tmp/abc
> # hdfs dfs -ls /tmp/abc
> {code}
> The test executes HDFS bash commands sequentially, but it may fail with
> Consistent Standby Read because the -ls does not find the file.
> Analysis: the second bash command, while launched sequentially after the
> first one, is not aware of the state id returned from the first bash command.
> So ObserverNode wouldn't wait for the edits to get propagated, and thus
> fails.
> I've got a cluster where the Observer has tens of seconds of RPC latency, and
> this becomes very annoying. (I am still trying to figure out why this
> Observer has such a long RPC latency. But that's another story.)
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)