[ http://issues.apache.org/jira/browse/HADOOP-692?page=comments#action_12448637 ]

Konstantin Shvachko commented on HADOOP-692:
--------------------------------------------

* Node locality
Maybe we should build the cluster topology map using network hops as the 
measure of node locality, rather than specifying externally which rack each 
node belongs to.
The map is built at startup. It takes O(#racks * #nodes) communications to 
partition all nodes into racks.
The topology map can be persistent, which will make it easier to restart the 
cluster.
We should keep the topology map separate so that e.g. map-reduce can use it.
If a client runs on one of the cluster machines, then it should know its rack 
for both reads and writes.
If not, then rack locality is not applicable.
I like the idea of a hierarchical topology map, e.g. with datacenters (2-3 
hops). What is the distance between datacenters in hops?
But the hierarchy should probably not be too deep. The rest may just be 
considered relatively far away from each other.
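
A minimal sketch of the partitioning step, assuming some hop-count probe is 
available. The names HopTopologySketch, partitionIntoRacks and sameRackHops 
are hypothetical, and so is the rule that two nodes share a rack when they are 
at most sameRackHops apart. Each node is probed against one representative per 
known rack, which is where the O(#racks * #nodes) communications come from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntBiFunction;

/** Hypothetical sketch: groups nodes into racks using only a hop-count probe. */
class HopTopologySketch {

    /**
     * Partitions nodes into racks. Assumption: two nodes are on the same rack
     * when the probe reports at most sameRackHops hops between them.
     */
    static List<List<String>> partitionIntoRacks(List<String> nodes,
                                                 ToIntBiFunction<String, String> hops,
                                                 int sameRackHops) {
        List<List<String>> racks = new ArrayList<>();
        for (String node : nodes) {
            List<String> home = null;
            for (List<String> rack : racks) {
                // One probe per known rack: compare against its first member.
                if (hops.applyAsInt(rack.get(0), node) <= sameRackHops) {
                    home = rack;
                    break;
                }
            }
            if (home == null) {
                // No known rack is close enough, so this node starts a new one.
                home = new ArrayList<>();
                racks.add(home);
            }
            home.add(node);
        }
        return racks;
    }

    public static void main(String[] args) {
        // Stand-in probe for the demo: the name prefix encodes the rack.
        // A real probe would measure hops on the network instead.
        ToIntBiFunction<String, String> fakeHops =
            (a, b) -> a.substring(0, 2).equals(b.substring(0, 2)) ? 1 : 3;
        List<String> nodes = Arrays.asList("r1-a", "r2-a", "r1-b", "r2-b", "r3-a");
        System.out.println(partitionIntoRacks(nodes, fakeHops, 1));
    }
}
```

The result could be written to disk after the first startup, so that a restart 
only needs to probe nodes that are not yet in the map.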

* Replica placement
Is it fair to say that the replica placement strategy is:
- place the first replica locally if possible (the client runs on a cluster 
machine); if not, on an arbitrary node.
- place second replica on a different node on the same rack as the first node.
- place third replica on a different rack but in the same datacenter as the 
second node.
If we don't have a further hierarchical level, then
- place the 4th replica and all subsequent ones randomly on any node not yet 
selected.
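
The rules above can be sketched as follows. This is only an illustration: 
PlacementSketch, Node and chooseTargets are hypothetical names, rack ids are 
assumed to be unique across datacenters, and falling back to an arbitrary 
unused node when a rule cannot be satisfied is my assumption, since the rules 
do not say what should happen in that case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

/** Hypothetical sketch of the placement rules listed above. */
class PlacementSketch {

    /** Hypothetical node descriptor; rack ids are assumed globally unique. */
    static final class Node {
        final String name, rack, datacenter;
        Node(String name, String rack, String datacenter) {
            this.name = name;
            this.rack = rack;
            this.datacenter = datacenter;
        }
    }

    /** writer is the client's node, or null if the client is off-cluster. */
    static List<Node> chooseTargets(List<Node> cluster, Node writer,
                                    int replicas, Random rnd) {
        List<Node> chosen = new ArrayList<>();
        for (int i = 0; i < replicas; i++) {
            Predicate<Node> rule;
            if (i == 0) {
                rule = n -> n == writer;              // 1st: the local node
            } else if (i == 1) {
                Node first = chosen.get(0);           // 2nd: same rack as 1st
                rule = n -> n.rack.equals(first.rack);
            } else if (i == 2) {
                Node second = chosen.get(1);          // 3rd: other rack, same datacenter
                rule = n -> !n.rack.equals(second.rack)
                         && n.datacenter.equals(second.datacenter);
            } else {
                rule = n -> true;                     // 4th and later: any unused node
            }
            Node next = pick(cluster, chosen, rule, rnd);
            if (next == null) {
                // Assumed fallback: any node that has not been selected yet.
                next = pick(cluster, chosen, n -> true, rnd);
            }
            if (next == null) {
                break;                                // every node already holds a replica
            }
            chosen.add(next);
        }
        return chosen;
    }

    /** Returns a random unused node that satisfies the rule, or null if none does. */
    private static Node pick(List<Node> cluster, List<Node> chosen,
                             Predicate<Node> rule, Random rnd) {
        List<Node> candidates = new ArrayList<>();
        for (Node n : cluster) {
            if (!chosen.contains(n) && rule.test(n)) {
                candidates.add(n);
            }
        }
        return candidates.isEmpty() ? null : candidates.get(rnd.nextInt(candidates.size()));
    }
}
```

Passing null as the writer models a client that runs outside the cluster, in 
which case the first replica lands on an arbitrary node.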

* Different placement strategies
I think we should define an interface responsible for block placement:
interface BlockReplicator {  // just an example
    chooseTargets();
    getTopology();
    distance(d1,d2);
    ............
}
So that we could implement different replication strategies and use them 
interchangeably, if not at runtime then at least at compile time.
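
To illustrate what "interchangeably" could look like, here is a rough typed 
variant of the interface with two toy strategies behind it. Everything beyond 
the three method names is an assumption: the parameter and return types, the 
"datacenter/rack" location strings, the tree-edge distance (which counts 
levels in the hierarchy, not measured network hops), and the classes 
PathReplicator, InOrderReplicator and SpreadReplicator.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Typed variant of the example interface; all signatures are assumptions. */
interface BlockReplicator {
    List<String> chooseTargets(int replicas);
    Map<String, String> getTopology();      // node name -> "datacenter/rack"
    int distance(String d1, String d2);
}

/** Shared topology handling; distance counts edges in the location tree. */
abstract class PathReplicator implements BlockReplicator {
    protected final Map<String, String> topology;

    PathReplicator(Map<String, String> topology) {
        this.topology = new LinkedHashMap<>(topology);
    }

    public Map<String, String> getTopology() {
        return topology;
    }

    public int distance(String d1, String d2) {
        if (d1.equals(d2)) {
            return 0;
        }
        String[] p1 = topology.get(d1).split("/");
        String[] p2 = topology.get(d2).split("/");
        int common = 0;
        while (common < p1.length && common < p2.length
                && p1[common].equals(p2[common])) {
            common++;
        }
        // Edges from each node up to the deepest shared level.
        return (p1.length - common + 1) + (p2.length - common + 1);
    }
}

/** Toy strategy: takes nodes in the order they were registered. */
class InOrderReplicator extends PathReplicator {
    InOrderReplicator(Map<String, String> topology) { super(topology); }

    public List<String> chooseTargets(int replicas) {
        List<String> all = new ArrayList<>(topology.keySet());
        return new ArrayList<>(all.subList(0, Math.min(replicas, all.size())));
    }
}

/** Toy strategy: greedily picks the node farthest from those already chosen. */
class SpreadReplicator extends PathReplicator {
    SpreadReplicator(Map<String, String> topology) { super(topology); }

    public List<String> chooseTargets(int replicas) {
        List<String> chosen = new ArrayList<>();
        List<String> remaining = new ArrayList<>(topology.keySet());
        while (chosen.size() < replicas && !remaining.isEmpty()) {
            String best = remaining.get(0);
            int bestScore = -1;
            for (String candidate : remaining) {
                // Score is the distance to the nearest already-chosen node.
                int score = Integer.MAX_VALUE;
                for (String c : chosen) {
                    score = Math.min(score, distance(candidate, c));
                }
                if (score > bestScore) {
                    best = candidate;
                    bestScore = score;
                }
            }
            chosen.add(best);
            remaining.remove(best);
        }
        return chosen;
    }
}
```

Code that holds only a BlockReplicator reference can then switch strategies by 
constructing a different implementation, without any other change.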

> Rack-aware Replica Placement
> ----------------------------
>
>                 Key: HADOOP-692
>                 URL: http://issues.apache.org/jira/browse/HADOOP-692
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.8.0
>            Reporter: Hairong Kuang
>         Assigned To: Hairong Kuang
>             Fix For: 0.9.0
>
>
> This issue assumes that HDFS runs on a cluster of computers that spread 
> across many racks. Communication between two nodes on different racks needs 
> to go through switches. Bandwidth in/out of a rack may be less than the total 
> bandwidth of machines in the rack. The purpose of rack-aware replica 
> placement is to improve data reliability, availability, and network bandwidth 
> utilization. The basic idea is that each data node determines to which rack 
> it belongs at the startup time and notifies the name node of the rack id upon 
> registration. The name node maintains a rackid-to-datanode map and tries to 
> place replicas across racks.


        
