[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571458#comment-17571458
 ] 

ZanderXu commented on HDFS-13522:
---------------------------------

[~zhengchenyu][~simbadzina] Thanks for your good work.

Maybe we first need to agree on which design we plan to use, and then move on 
to the detailed implementation.

bq. Design A: Propagate the last seen state ID for all namespaces to the 
client in RPC responses, and receive it from the client in requests.
bq. Design B: Before a router sends each read to the Observer, it fetches the 
last seen state ID from the corresponding active namenode.
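
To make Design B concrete, here is a minimal sketch of the flow, not the actual patch. All class and method names are hypothetical and are not Hadoop APIs; the active namenode is reduced to a supplier of its last seen state ID.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch of Design B; none of these names are real Hadoop APIs. */
final class DesignBSketch {

    /** A read tagged with the state ID the Observer must have reached. */
    static final class ReadRequest {
        final String path;
        final long requiredStateId;

        ReadRequest(String path, long requiredStateId) {
            this.path = path;
            this.requiredStateId = requiredStateId;
        }
    }

    /** Router side: ask the active namenode for its state ID before each read. */
    static ReadRequest prepareRead(String path, LongSupplier activeLastSeenStateId) {
        return new ReadRequest(path, activeLastSeenStateId.getAsLong());
    }

    /** Observer side: the read is safe only once the Observer has caught up. */
    static boolean observerCanServe(ReadRequest request, long observerAppliedStateId) {
        return observerAppliedStateId >= request.requiredStateId;
    }
}
```

In this sketch the client never sees a state ID; the router obtains it and attaches it to the read on the client's behalf.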

In our prod environment, we chose Design B. Here are several main 
considerations for reference:
* Design A is expensive because of the large RPC header.
* In most scenarios, ObserverRead needs to solve the problem of async client 
reads; that is, the client needs to actively msync and store the state IDs of 
all NSs. But RBF may use just one NS's state ID when handling a request, and 
all the others are useless. With more and more downstream NSs, this problem 
becomes more and more obvious. We have 100+ downstream NSs, so we dropped 
Design A. 
* Design B is transparent to the client and relatively easy to roll out. It is 
also very controllable for admins; for example, we can enable or disable 
ObserverRead for any NS.
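
As a rough illustration of the header cost in Design A, the sketch below estimates how many bytes the per-namespace state IDs would add to a request. The encoding is an assumption made only for this example (UTF-8 namespace name plus an 8-byte state ID per entry), not the real RPC header format, and the namespace names are made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical size estimate; this is not the real RPC header encoding. */
final class StateIdHeaderEstimate {

    /** Assumed cost per entry: UTF-8 bytes of the NS name plus an 8-byte state ID. */
    static int estimateBytes(Map<String, Long> stateIds) {
        int total = 0;
        for (String ns : stateIds.keySet()) {
            total += ns.getBytes(StandardCharsets.UTF_8).length + Long.BYTES;
        }
        return total;
    }

    /** Builds state IDs for namespaces named ns0..ns(count-1) (made-up names). */
    static Map<String, Long> stateIdsFor(int count) {
        Map<String, Long> ids = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            ids.put("ns" + i, 1000L + i);
        }
        return ids;
    }
}
```

Under these assumptions, 100 namespaces add 1190 bytes to every request, while the single entry that RBF actually needs for one request is about a dozen bytes.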

But Design B also has some issues that need to be resolved. This is how we 
handled them:
* We added a new RPC server in the NameNode to handle msync separately, to 
reduce the latency of the msync RPC.
* RBF just calls the msync RPC periodically, for example every 100ms.
* If some end users want strong consistency, our client supports carrying a 
sync flag at the connection level or the RPC level, for example in 
CallerContext. 
* When RBF forwards a request from a client, it combines the sync flag in the 
request with the msync period in RBF to determine whether an msync is required.
* We also made some changes in the RPC handler of the Observer NameNode for 
the case where the server's txId is smaller than the router's.
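
The decision of whether a forwarded request needs an msync could look roughly like the sketch below. The class name, the method signature, and the exact rule (a sync flag always forces an msync; otherwise the periodic msync is enough while it is fresh) are assumptions for illustration, not our actual code.

```java
/** Hypothetical msync policy; names and semantics are assumptions, not real RBF code. */
final class MsyncPolicySketch {
    private final long msyncPeriodMs;

    MsyncPolicySketch(long msyncPeriodMs) {
        this.msyncPeriodMs = msyncPeriodMs;
    }

    /**
     * A request with the sync flag always triggers an msync; otherwise the router
     * only syncs when the last msync is at least one period old.
     */
    boolean needsMsync(boolean syncFlag, long nowMs, long lastMsyncMs) {
        return syncFlag || nowMs - lastMsyncMs >= msyncPeriodMs;
    }
}
```

With a 100ms period, a request without the flag that arrives 50ms after the last msync is forwarded directly, while the same request carrying the flag forces an msync first.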

In our prod environment, only a very small number of business processes 
require strong consistency.

As above, after comprehensive consideration, we decided to use Design B.

Thanks again [~simbadzina][~zhengchenyu] for your good work. I strongly 
recommend that we first decide which design to use.

If anyone has any good ideas, please share them. This is a very helpful feature 
for end users, so I hope we can push it forward with high priority.
cc [~hexiaoqiao][~elgoiri][~ayushtkn] [~ferhui]

> RBF: Support observer node from Router-Based Federation
> -------------------------------------------------------
>
>                 Key: HDFS-13522
>                 URL: https://issues.apache.org/jira/browse/HDFS-13522
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: federation, namenode
>            Reporter: Erik Krogen
>            Assignee: Simbarashe Dzinamarira
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, HDFS-13522_proposal_zhengchenyu_v1.pdf, RBF_ Observer 
> support.pdf, Router+Observer RPC clogging.png, 
> ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>          Time Spent: 20h
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
