[
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571458#comment-17571458
]
ZanderXu commented on HDFS-13522:
---------------------------------
[~zhengchenyu][~simbadzina] Thanks for your good work.
Maybe we should first agree on which design to use, and then move on to the
detailed implementation.
bq. Design A : Propagate the last seen state ID for all namespaces to the
client in rpc responses, and receive it from the client in requests.
bq. Design B: Before a router sends each read to the Observer, it fetches the
last seen state ID from the corresponding active namenode.
In our prod environment, we used Design B. Here are several main considerations,
for reference:
* Design A is expensive because it needs a large RPC header.
* In most scenarios, ObserverRead needs to solve the problem of async client
reads, that is, the client needs to actively msync and store the state IDs of
all NSs. But RBF may use only one NS's state ID when handling a request, so all
the others are useless. With more and more downstream NSs, this problem
becomes more and more obvious. We have 100+ downstream NSs, so we dropped
Design A.
* Design B is transparent to the client, which does not need to be aware of it,
and it is relatively easy to release. It is also very controllable for admins;
for example, we can enable or disable ObserverRead for any NS.
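To make the header-size concern for Design A concrete, here is a rough sketch of how the per-request payload grows with the number of namespaces. The encoding (UTF-8 nsId plus a fixed 8-byte state ID) and the names {{StateIdHeaderSizeSketch}}, {{estimateBytes}} and "ns0".."nsN" are assumptions for illustration only, not the actual RPC header format.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical estimate only; this is not the real Hadoop RPC header encoding. */
final class StateIdHeaderSizeSketch {

    /**
     * Approximate bytes needed to carry one (nsId, stateId) pair per namespace,
     * assuming nsIds named "ns0", "ns1", ... and a fixed 8-byte state ID each.
     */
    static long estimateBytes(int nsCount) {
        long total = 0;
        for (int i = 0; i < nsCount; i++) {
            String nsId = "ns" + i; // assumed naming scheme
            total += nsId.getBytes(StandardCharsets.UTF_8).length + Long.BYTES;
        }
        return total;
    }

    public static void main(String[] args) {
        // Design B style: only the one namespace the request targets matters.
        System.out.println("1 NS:   " + estimateBytes(1) + " bytes");
        // Design A style: every request and response carries all namespaces.
        System.out.println("100 NS: " + estimateBytes(100) + " bytes");
    }
}
```

Under these assumptions the cost is linear in the number of namespaces and is paid on every request and response, even though a single call usually touches only one NS.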
But Design B also has some issues that need to be resolved.
* We support a new RPC server in the NameNode that handles msync separately, to
reduce the latency of the msync RPC.
* We just call the msync RPC periodically in RBF, for example every 100ms.
* If some end users want strong consistency, our client supports carrying sync
flags at the connection level or the RPC level, for example in CallerContext.
* When RBF forwards requests from a client, it combines the sync flag in
the request with the msync period in RBF to determine whether an msync is
required.
* We also made some changes to the RPC handler in the Observer NameNode for the
case where the server's txId is smaller than the router's.
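A minimal sketch of how the sync flag and the msync period could be combined on the router side. This is only one possible reading of the points above: the class and method names ({{MsyncDecisionSketch}}, {{needsMsync}}, {{recordMsync}}) are hypothetical and do not come from the Hadoop code base, and the real msync RPC is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical router-side helper; not part of the Hadoop code base. */
final class MsyncDecisionSketch {

    /** Time (ms) of the last msync the router performed for each namespace. */
    private final Map<String, Long> lastMsyncMillis = new ConcurrentHashMap<>();

    /** Assumed msync period, for example 100 ms. */
    private final long msyncPeriodMillis;

    MsyncDecisionSketch(long msyncPeriodMillis) {
        this.msyncPeriodMillis = msyncPeriodMillis;
    }

    /**
     * Returns true if the router should msync against the active NameNode
     * before forwarding this read to an Observer.
     */
    boolean needsMsync(String nsId, boolean clientSyncFlag, long nowMillis) {
        if (clientSyncFlag) {
            return true; // the caller asked for strong consistency
        }
        Long last = lastMsyncMillis.get(nsId);
        // msync if we never synced this NS or the last sync is older than the period
        return last == null || nowMillis - last >= msyncPeriodMillis;
    }

    /** Records that an msync for the namespace completed at the given time. */
    void recordMsync(String nsId, long nowMillis) {
        lastMsyncMillis.put(nsId, nowMillis);
    }

    public static void main(String[] args) {
        MsyncDecisionSketch router = new MsyncDecisionSketch(100);
        System.out.println(router.needsMsync("ns1", false, 0));   // never synced
        router.recordMsync("ns1", 0);
        System.out.println(router.needsMsync("ns1", false, 50));  // within the period
        System.out.println(router.needsMsync("ns1", true, 50));   // sync flag set
        System.out.println(router.needsMsync("ns1", false, 100)); // period elapsed
    }
}
```

Keeping the state per namespace means a request that touches one NS never pays for the others, and the sync flag lets the few callers that need strong consistency bypass the period.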
In our prod environment, only a very small number of business processes require
strong consistency.
As above, after comprehensive consideration, we decided to use Design B.
Thanks again [~simbadzina][~zhengchenyu] for your good work. I strongly
recommend that we first decide which design to use.
If anyone has any good ideas, please share them. This is a very helpful feature
for end users, so I hope we can push it forward with high priority.
cc [~hexiaoqiao][~elgoiri][~ayushtkn] [~ferhui]
> RBF: Support observer node from Router-Based Federation
> -------------------------------------------------------
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: federation, namenode
> Reporter: Erik Krogen
> Assignee: Simbarashe Dzinamarira
> Priority: Major
> Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch,
> HDFS-13522_WIP.patch, HDFS-13522_proposal_zhengchenyu_v1.pdf, RBF_ Observer
> support.pdf, Router+Observer RPC clogging.png,
> ShortTerm-Routers+Observer.png,
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf,
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
> Time Spent: 20h
> Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state,
> e.g. {{FederationNamenodeServiceState}}.