[ 
https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982196#comment-13982196
 ] 

Junping Du commented on HDFS-6261:
----------------------------------

Thanks [~decster] for working on this effort.
I quickly went through it and have a few comments below (not complete):
bq. The original Hadoop topology supports a 3-layer topology looks like 
following:
I think it is better to say: previously, Hadoop only supported a 2-layer 
topology: rack and host. Mentioning the datacenter layer will confuse users, as 
it has never worked, even now. For the same reason, we should mention that we 
now support a 3-layer topology/locality: rack, nodegroup and host.
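
To make the 2-layer vs. 3-layer distinction concrete, here is a minimal sketch 
(illustration only, not Hadoop code). The path format "/rack/nodegroup/host" 
and the names Location and parse_location are assumptions made up for this 
example.

```python
from typing import NamedTuple, Optional


class Location(NamedTuple):
    """Hypothetical view of where a node sits in the topology."""
    rack: str
    nodegroup: Optional[str]  # None for the plain rack/host topology
    host: str


def parse_location(path: str) -> Location:
    """Split an assumed '/rack/host' or '/rack/nodegroup/host' path into layers."""
    parts = [p for p in path.split("/") if p]
    if len(parts) == 2:
        rack, host = parts
        return Location(rack, None, host)
    if len(parts) == 3:
        rack, nodegroup, host = parts
        return Location(rack, nodegroup, host)
    raise ValueError(f"expected 2 or 3 layers, got {len(parts)}: {path!r}")
```

With this sketch, "/rack1/host1" has no nodegroup, while "/rack1/ng1/host1" 
carries the extra layer between rack and host.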

bq. This network topology is designed and work well for Hadoop cluster running  
on physical server farms. However, for Hadoop running on virtualized platform, 
we have additional "hypervisor" layer, and its characteristics include:...
I think the use case of the NodeGroup layer is even broader than virtualization 
and covers any sub-dependency of nodes between the rack and host layers. So it 
could be better to say something like "This network topology is designed to 
work well on Hadoop clusters that only have rack (switch or power) failure 
dependencies among nodes. However, for other cases, such as Hadoop nodes 
running on a virtualized platform, we have an additional "hypervisor" layer, 
and its characteristics include ..."

bq. Due to above characteristics in performance and reliability, this layer is 
not transparent for Hadoop...
Reliability is more important here, so it would be better to say "Due to the 
above characteristics in reliability and performance, this layer shouldn't be 
transparent to Hadoop..."

bq. 1st replica is on the local node or local node group of the writer
To be more precise, we may say something like: "The 1st replica is placed on 
the node nearest to the writer in the topology. In most cases, it should be on 
the same node as the writer, but it could be on another node in the same 
nodegroup or rack if the writer's node is not qualified (e.g. no local datanode 
or disk is full) to place the replica."
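
As a rough illustration of the suggested wording, the sketch below picks the 
nearest qualified node to the writer. It is not the actual HDFS placement code; 
Node, distance_rank, choose_first_replica and the "qualified" flag are 
hypothetical names introduced only for this example.

```python
from typing import NamedTuple, Optional, Sequence


class Node(NamedTuple):
    rack: str
    nodegroup: str
    host: str
    # Assumed flag: the node runs a datanode and has free disk space.
    qualified: bool = True


def distance_rank(writer: Node, candidate: Node) -> int:
    """Smaller is nearer: 0 same host, 1 same nodegroup, 2 same rack, 3 off-rack."""
    if candidate.rack != writer.rack:
        return 3
    if candidate.nodegroup != writer.nodegroup:
        return 2
    if candidate.host != writer.host:
        return 1
    return 0


def choose_first_replica(writer: Node, candidates: Sequence[Node]) -> Optional[Node]:
    """Return the qualified candidate nearest to the writer, or None if none qualify."""
    qualified = [n for n in candidates if n.qualified]
    if not qualified:
        return None
    return min(qualified, key=lambda n: distance_rank(writer, n))
```

For example, if the writer's own node has a full disk, the sketch falls back 
to another node in the same nodegroup before considering the rest of the rack.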

It would be better for the diagram to omit the "datacenter" layer, per the 
comments above, and for the red "S1" layer to be renamed "NG1" to reflect the 
NodeGroup layer.

> Add document for enabling node group layer in HDFS
> --------------------------------------------------
>
>                 Key: HDFS-6261
>                 URL: https://issues.apache.org/jira/browse/HDFS-6261
>             Project: Hadoop HDFS
>          Issue Type: Task
>          Components: documentation
>            Reporter: Wenwu Peng
>            Assignee: Binglin Chang
>              Labels: documentation
>         Attachments: 3layer-topology.png, 4layer-topology.png, 
> HDFS-6261.v1.patch, HDFS-6261.v1.patch
>
>
> Most of the patches from umbrella JIRA HADOOP-8468 have been committed. 
> However, there is no site documentation that introduces NodeGroup awareness 
> (Hadoop Virtualization Extensions) and how to configure it, so we need to 
> document it.
> 1.  Document NodeGroup awareness in http://hadoop.apache.org/docs/current 
> 2.  Document the NodeGroup-aware properties in core-default.xml.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
