[
https://issues.apache.org/jira/browse/HDFS-13688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545720#comment-16545720
]
Chen Liang commented on HDFS-13688:
-----------------------------------
Thanks [~linyiqun], [~zero45] for the comments! Sorry for the late response,
just got back from vacation.
On [~linyiqun]'s comment:
bq. the benefit of the second approach is that we can make client logic more
simple and don't need to hold the state id. ...
Thanks for sharing the thoughts [~linyiqun]! I'm not sure, though, whether there
currently is a protocol for SbN to make RPC calls to ANN, and it seems like
overkill to add a whole new protocol just for this. Even with the second
approach, I think the client still needs to hold a state id, because the msync
call lets SbN catch up to a state id given by the client, not necessarily to
the most recent ANN state id. So the client still needs to present a state id
for SbN to check how far it needs to catch up.
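To illustrate the point about catching up to the client's state id rather than the newest one, here is a minimal sketch. It is a toy model, not code from the patch: {{ToyStandby}}, {{applyUpTo}} and the timeout parameter are all made up for illustration.

```java
import java.util.concurrent.TimeUnit;

/** Toy stand-in for a standby/observer node. Hypothetical; not an HDFS class. */
class ToyStandby {
    private long appliedStateId = 0; // guarded by this

    /** Called as edits are applied; wakes up any waiting msync callers. */
    synchronized void applyUpTo(long stateId) {
        if (stateId > appliedStateId) {
            appliedStateId = stateId;
            notifyAll();
        }
    }

    /**
     * Waits until this node has applied at least clientStateId.
     * The target is the id presented by the client, not the newest ANN id.
     * Returns false if the timeout elapses or the thread is interrupted.
     */
    synchronized boolean msync(long clientStateId, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (appliedStateId < clientStateId) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) {
                return false;
            }
            try {
                wait(remainingMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

A client that last saw state id 3 returns immediately from a standby at 5, no matter how far ahead the ANN is. A client that presents 7 waits until the standby reaches 7.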
On [~zero45]'s comments:
bq. client should instead learn the stateID when it eventually decides to do
something, like a read or a write.
You are right that when the fresh client does a read/write, the client will
have a state id regardless. The targeted issue here is this: if the fresh
client is making a write call, that is fine because the client gets the ANN
state id. But if the fresh client is making a read call, which goes to an
observer, there is no guarantee on what this state id will be, potentially
causing an issue for the "Third-party communication" part mentioned in the
design doc. So for a fresh client's read, we need a way for it to catch up to
the ANN state id, and this is where msync comes in. Namely, the fresh client
can make an msync call first, then proceed to read the recent update. This is
the use case where a fresh client may start by making an msync call.
bq. Observer and SNN also provide stateIDs from reads. Is there a reason you
need the stateID from the ANN?
Same as above: msync needs to catch up to the most recent ANN state id. Getting
the state id from an Observer does not guarantee this.
bq. Have we considered having the txid to wait for be a parameter to msync?
Something like msync(long txidToWaitFor)?...
In the current WIP patch, the msync in the NameNode protocol does take a
parameter, which is the txid to sync on/wait for. It is DFSClient that has a
wrapper msync without this parameter: for a fresh client, DFSClient gets the
txid first, then passes it to the NameNode protocol's msync. It looks like
removing the wrapper logic and exposing the parameterized msync instead would
be what you suggested? It seems a good idea to separate out the issue of how to
get the right state id. But it would make no sense to have DFSClient expose an
msync API that requires a state id, as I don't think any layer above DFSClient
would/should understand state ids and be able to provide one.
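To make the two layers concrete, here is a rough sketch of the split described above. It is not the WIP patch: {{ToyNameNodeProtocol}}, {{getActiveStateId}}, {{MsyncClientSketch}} and {{observeStateId}} are hypothetical names, and how the real client obtains the ANN state id is an assumption here.

```java
/** Hypothetical stand-in for the NameNode-facing protocol; not the real one. */
interface ToyNameNodeProtocol {
    /** Assumed call: returns the latest state id known to the active NameNode. */
    long getActiveStateId();

    /** Parameterized msync: returns once the serving node has reached txidToWaitFor. */
    void msync(long txidToWaitFor);
}

/** Hypothetical client-side wrapper that keeps state ids out of the public API. */
class MsyncClientSketch {
    private static final long UNKNOWN = -1L;

    private final ToyNameNodeProtocol namenode;
    private long lastSeenStateId = UNKNOWN;

    MsyncClientSketch(ToyNameNodeProtocol namenode) {
        this.namenode = namenode;
    }

    /** Records a state id carried back by an earlier read or write response. */
    void observeStateId(long stateId) {
        lastSeenStateId = Math.max(lastSeenStateId, stateId);
    }

    /**
     * Parameterless msync exposed to upper layers. A fresh client has no
     * state id yet, so it first fetches one, then calls the parameterized msync.
     */
    void msync() {
        if (lastSeenStateId == UNKNOWN) {
            lastSeenStateId = namenode.getActiveStateId();
        }
        namenode.msync(lastSeenStateId);
    }
}
```

Callers above the client only ever see {{msync()}}. The state id stays an internal detail of the client and the protocol.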
As for implementing the blocking wait: in the WIP patch, this happens on the
server side (i.e. it is a blocking call from the client's perspective). It
currently uses the deferred response feature from HADOOP-11552 along with a
dedicated thread pool. Fundamentally, this is the same as naive blocking,
except that the wait is handed over to the thread pool so that handler threads
are released.
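The hand-off can be sketched roughly as below. This does not use the actual HADOOP-11552 API: a {{CompletableFuture}} stands in for the deferred RPC response, and {{DeferredMsyncSketch}}, {{msyncDeferred}} and the pool size are assumptions for illustration only.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Toy model of a deferred msync response. Hypothetical; not an HDFS class. */
class DeferredMsyncSketch {
    // Dedicated pool that absorbs the waiting, so handler threads stay free.
    private final ExecutorService waitPool = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r, "msync-wait");
        t.setDaemon(true);
        return t;
    });

    private long appliedTxid = 0; // guarded by this

    /** Called as edits are applied; wakes up any pending waits. */
    synchronized void applyUpTo(long txid) {
        if (txid > appliedTxid) {
            appliedTxid = txid;
            notifyAll();
        }
    }

    /** Blocks the calling (pool) thread until txid has been applied. */
    private synchronized long awaitTxid(long txid) throws InterruptedException {
        while (appliedTxid < txid) {
            wait();
        }
        return appliedTxid;
    }

    /**
     * Handler-side entry point: returns right away with a pending response.
     * The wait itself runs on waitPool and completes the response when done.
     */
    CompletableFuture<Long> msyncDeferred(long txidToWaitFor) {
        CompletableFuture<Long> response = new CompletableFuture<>();
        waitPool.execute(() -> {
            try {
                response.complete(awaitTxid(txidToWaitFor));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                response.completeExceptionally(e);
            }
        });
        return response;
    }

    void shutdown() {
        waitPool.shutdownNow();
    }
}
```

From the client's side the call still looks blocking, since the response only arrives once the txid is reached. The sketch also shows one limitation of this approach: each pending msync occupies a pool thread while it waits.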
> Introduce msync API call
> ------------------------
>
> Key: HDFS-13688
> URL: https://issues.apache.org/jira/browse/HDFS-13688
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Chen Liang
> Assignee: Chen Liang
> Priority: Major
> Attachments: HDFS-13688-HDFS-12943.WIP.002.patch,
> HDFS-13688-HDFS-12943.WIP.patch
>
>
> As mentioned in the design doc in HDFS-12943, to ensure consistent read, we
> need to introduce an RPC call {{msync}}. Specifically, client can issue a
> msync call to Observer node along with a transactionID. The msync will only
> return when the Observer's transactionID has caught up to the given ID. This
> JIRA is to add this API.