[ 
https://issues.apache.org/jira/browse/HDFS-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332455#comment-15332455
 ] 

Inigo Goiri commented on HDFS-10467:
------------------------------------

[~zhz], thanks for the feedback. Some clarifications to your comments:
# We considered modifying ViewFs to check a remote and centralized mount table 
(I think this option is pretty much what you propose, right?). We didn't go 
this route for a couple of reasons: (1) it requires modifications to the 
client, and (2) it makes rebalancing challenging. In addition, our approach 
brings some side advantages like a unified view of the federation, a Router to 
isolate the NameNodes from the clients, and better HA management.
# We haven't gone into hard-linking of DNs but that could be an improvement to 
the DistCp approach. We are open to improvements there but it might imply 
changes to the NNs.
# Our current implementation of the Subcluster Rebalancer is a tool similar to 
the regular Rebalancer and is also manually triggered. Right now, it's in a 
separate package; I can post a patch just for it. Our ultimate goal is to have 
some service that monitors the subclusters and triggers the proper Subcluster 
Rebalancer operation (this is work in progress).
# In our environment, we co-locate with other services (related to YARN-5215). 
I think this is orthogonal to the rebalancing but we can always go into that.
# The rebalancing itself is the most open part at this point. We've been 
targeting a tool that supports as many options as possible and lets the admin 
decide. For now, we support both locking and not locking.
# At some point we considered NN level locking. Actually, [~jira.shegalov] had 
a couple proposals for this based on permissions. We can refine this over time 
and maybe even implement locking at NN level.
# Regarding the rebalancing protocol, as I said, we are aiming to make it as 
broad as possible and allow the admin to pick their options.
  * I think it'd be better to support rebalancing of different subtrees at the 
same time. Only rebalancing within a subtree that is under rebalancing would be 
disallowed. We can always add options for that.
  * Again, this is an option we added based on internal feedback: the Subcluster 
Rebalancer has an option to wait or not. The Router membership is in the State 
Store and it's done by the Router; this is already in the PoC patch. And yes, 
the main reason to do this is the caching of the mount table. Having the Router 
membership is also useful from an administration point of view to see the whole 
status of the federation.
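
To make the subtree rule above concrete, here is a rough sketch of how 
concurrent rebalancing requests could be checked against each other. This is 
only an illustration, not code from the PoC patch: the class 
RebalanceRegistry and its methods are hypothetical names, and rejecting a 
request for an ancestor of an active subtree is an extra assumption that goes 
beyond what is stated above.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch; not part of the HDFS-10467 patches. */
class RebalanceRegistry {
  // Roots of the subtrees currently being rebalanced (normalized paths).
  private final Set<String> active = new HashSet<>();

  /** Strips trailing slashes so "/a/" and "/a" compare equal; "/" stays "/". */
  static String normalize(String path) {
    String p = path;
    while (p.length() > 1 && p.endsWith("/")) {
      p = p.substring(0, p.length() - 1);
    }
    return p;
  }

  /** Returns true if child is the same path as ancestor or lies beneath it. */
  static boolean isUnder(String child, String ancestor) {
    String c = normalize(child);
    String a = normalize(ancestor);
    if (a.equals("/")) {
      return true;
    }
    // Append "/" so that "/data/logs2" is not treated as under "/data/logs".
    return c.equals(a) || c.startsWith(a + "/");
  }

  /**
   * Registers root for rebalancing unless it overlaps an active subtree.
   * Nested requests are rejected as described above; rejecting ancestors
   * of an active subtree as well is an assumption of this sketch.
   */
  public synchronized boolean tryStart(String root) {
    for (String r : active) {
      if (isUnder(root, r) || isUnder(r, root)) {
        return false;
      }
    }
    active.add(normalize(root));
    return true;
  }

  /** Marks the rebalancing of root as finished. */
  public synchronized void finish(String root) {
    active.remove(normalize(root));
  }
}
```

With this, /data/logs and /data/users could be rebalanced at the same time, 
while a request for /data/logs/2016 would be refused until the /data/logs 
operation finishes.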

In general, I think we should start a separate effort for the Subcluster 
Rebalancer as it has many design choices that can be changed. Obviously, we also 
need to turn this into an umbrella; right now it is too big. If people are 
positive about this effort, we should start discussing ways to split it.

> Router-based HDFS federation
> ----------------------------
>
>                 Key: HDFS-10467
>                 URL: https://issues.apache.org/jira/browse/HDFS-10467
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: fs
>    Affects Versions: 2.7.2
>            Reporter: Inigo Goiri
>         Attachments: HDFS Router Federation.pdf, HDFS-10467.PoC.patch, 
> HDFS-Router-Federation-Prototype.patch
>
>
> Add a Router to provide a federated view of multiple HDFS clusters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]