[jira] [Commented] (HDFS-14272) [SBN read] ObserverReadProxyProvider should sync with active txnID on startup

Erik Krogen (JIRA) Thu, 21 Feb 2019 08:28:23 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-14272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774265#comment-16774265
 ]


Erik Krogen commented on HDFS-14272:
------------------------------------

{quote}
msync() on active NN if it is a new client. But it may not be possible, since 
new client does not know yet where the active node is.
{quote}
If {{msync()}} is called on an observer or standby NN, the node will reject the 
request. Thus calling {{msync()}} on the {{failoverProxy}} is guaranteed to 
target the active, properly syncing the state. In general, we cannot rely on 
the cached state being correct, since the NNs can swap states at any time. 
IIUC, this is why existing proxy providers rely on the server to enforce 
whether or not it can serve a request; only the servers know definitively 
whether or not they are the active.
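
As a rough illustration of the server-side enforcement described above, here is a minimal Java sketch. It uses toy stand-in classes, not the real Hadoop code: {{ToyNameNode}}, {{NotActiveException}}, and {{msyncViaFailover}} are hypothetical names, and the loop only approximates what a failover proxy does.

```java
import java.util.Arrays;
import java.util.List;

// Toy model for illustration only; none of these types are real Hadoop classes.
class MsyncFailoverSketch {
    enum HAState { ACTIVE, STANDBY, OBSERVER }

    /** Signals that a node refused a call because it is not the active. */
    static class NotActiveException extends Exception {
        NotActiveException(String message) { super(message); }
    }

    /** A NameNode stand-in that checks its own state before serving msync(). */
    static class ToyNameNode {
        final String name;
        HAState state;
        long lastTxId;

        ToyNameNode(String name, HAState state, long lastTxId) {
            this.name = name;
            this.state = state;
            this.lastTxId = lastTxId;
        }

        long msync() throws NotActiveException {
            if (state != HAState.ACTIVE) {
                throw new NotActiveException(name + " is " + state);
            }
            return lastTxId;
        }
    }

    /** Tries each node in turn; only the active one can answer. */
    static long msyncViaFailover(List<ToyNameNode> nodes) {
        for (ToyNameNode nn : nodes) {
            try {
                return nn.msync();
            } catch (NotActiveException e) {
                // Rejected by the server: move on to the next candidate.
            }
        }
        throw new IllegalStateException("no active NameNode reachable");
    }

    public static void main(String[] args) {
        List<ToyNameNode> nodes = Arrays.asList(
            new ToyNameNode("nn1", HAState.OBSERVER, 90),
            new ToyNameNode("nn2", HAState.ACTIVE, 120),
            new ToyNameNode("nn3", HAState.STANDBY, 118));
        System.out.println(msyncViaFailover(nodes));
    }
}
```

The client never consults a cached notion of which node is active; whichever node accepts the call is the active by definition, so the returned transaction ID is the one to sync against.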

{quote}
On a side note we should also fix the right path in 
ObserverReadInvocationHandler.invoke(). It currently uses failoverProxy as an 
ActiveNN, which may not work if observerReadEnabled == false. We should find 
the active in the cache and use it for writes.
{quote}
I don't understand your comment about {{observerReadEnabled}}. Writes are not 
affected at all by this flag. Its only effect is to skip the attempt to read 
from observers; with it disabled, both reads and writes go to 
{{failoverProxy}} immediately.
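
The routing described above can be sketched roughly as follows. This is a simplified illustration and not the actual handler code: {{firstTarget}} and the {{Target}} enum are hypothetical names, and the sketch only models where a call is sent first, leaving out any fallback when no observer can serve a read.

```java
// Simplified model for illustration; not the real invocation handler.
class ObserverFlagSketch {
    enum Target { OBSERVER, FAILOVER_PROXY }

    /**
     * Picks where a call is sent first.
     * Writes ignore the flag entirely; reads try an observer only when it is set.
     */
    static Target firstTarget(boolean isRead, boolean observerReadEnabled) {
        if (isRead && observerReadEnabled) {
            return Target.OBSERVER;
        }
        return Target.FAILOVER_PROXY;
    }

    public static void main(String[] args) {
        boolean[] flags = {true, false};
        for (boolean enabled : flags) {
            System.out.println("observerReadEnabled=" + enabled
                + " read->" + firstTarget(true, enabled)
                + " write->" + firstTarget(false, enabled));
        }
    }
}
```

In this model the write column is identical for both flag values, which is the point being made: the flag changes only the read path.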

> [SBN read] ObserverReadProxyProvider should sync with active txnID on startup
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-14272
>                 URL: https://issues.apache.org/jira/browse/HDFS-14272
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>         Environment: CDH6.1 (Hadoop 3.0.x) + Consistency Reads from Standby + 
> SSL + Kerberos + RPC encryption
>            Reporter: Wei-Chiu Chuang
>            Assignee: Erik Krogen
>            Priority: Major
>         Attachments: HDFS-14272.000.patch, HDFS-14272.001.patch
>
>
> It is typical for integration tests to create some files and then check their 
> existence. For example, like the following simple bash script:
> {code:java}
> # hdfs dfs -touchz /tmp/abc
> # hdfs dfs -ls /tmp/abc
> {code}
> The test executes the HDFS bash commands sequentially, but it may fail with 
> Consistent Standby Read because the -ls does not find the file.
> Analysis: the second bash command, while launched sequentially after the 
> first one, is not aware of the state id returned from the first bash command. 
> So ObserverNode wouldn't wait for the edits to get propagated, and thus 
> fails.
> I've got a cluster where the Observer has tens of seconds of RPC latency, and 
> this becomes very annoying. (I am still trying to figure out why this 
> Observer has such a long RPC latency. But that's another story.)
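
The failure mode in the description can be modeled with a small Java sketch. It is a toy model under stated assumptions, not Hadoop code: {{readSeesFile}} is a hypothetical helper, the transaction IDs are made-up numbers, and the observer is assumed to serve a read as soon as it has applied edits up to the state id the client presents.

```java
// Toy model for illustration; the txids below are invented for the example.
class StateIdSketch {
    /**
     * Models an observer that waits until it has applied edits up to the
     * client's last seen state id, then serves the read at that point.
     * Returns true if the file created at fileTxId is visible to the read.
     */
    static boolean readSeesFile(long clientStateId, long observerAppliedTxId, long fileTxId) {
        // The observer only catches up as far as the client asks it to.
        long servedAtTxId = Math.max(observerAppliedTxId, clientStateId);
        return servedAtTxId >= fileTxId;
    }

    public static void main(String[] args) {
        long fileTxId = 100;        // txid of the -touchz on the active
        long observerApplied = 90;  // observer is lagging behind the active

        // A brand-new client (the second bash command) has no state id yet.
        System.out.println("fresh client sees file: "
            + readSeesFile(0, observerApplied, fileTxId));

        // A client that first synced with the active carries its txid.
        System.out.println("synced client sees file: "
            + readSeesFile(fileTxId, observerApplied, fileTxId));
    }
}
```

Under these assumptions the fresh client is served from the lagging observer and misses the file, while a client that first obtained the active's state id forces the observer to catch up before answering.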



