Hello xu,

On Thu, May 5, 2011 at 12:24 PM, cheng xu <xcheng...@gmail.com> wrote:
> Hi all! We know that HDFS divides a large file into several blocks (64 MB
> each, 3 replications by default), and once the metadata in the NameNode is
> modified, a DataStreamer thread transports the blocks to the DataNodes.
> For each block, the client sends the block to the 3 DataNodes through a
> pipeline.
The writing process originally writes only a single copy. The replication is
done by the NN later on (as part of DN commands sent via heartbeats).

> I am just wondering how the cluster chooses which DataNodes store the
> blocks. What policy?
> And as we know there may be plenty of blocks for a file. What is the
> sequence in which these blocks are transported? From what I read in the
> code, there is only one thread doing this from the client to the DataNodes.

A suitable set of DNs for streaming every block is chosen using the client's
node name (hostname) via a mapping maintained on the NameNode (Host2NodesMap),
for the best locality. In case there are multiple matches (possibly multiple
DNs per hostname), a random one is chosen out of them. This is repeated in
sequence for every block write request.

In case none could be determined from the client hostname, one is chosen using
the ReplicationTargetChooser's algorithm (which would choose a 'random' node
for the first replica if it is unable to determine a local node).

--
Harsh J
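To make that local-first selection concrete, below is a minimal Java sketch of
the decision for the first replica only. The class and member names here
(TargetChoiceSketch, pickFirstTarget, hostToNodes) are invented for
illustration and are not the actual Host2NodesMap or ReplicationTargetChooser
APIs; the sketch only models "prefer a DN on the client's own host, otherwise
fall back to a random live node."

import java.util.List;
import java.util.Map;
import java.util.Random;

// Toy model of the first-replica target choice described above.
// Not the real NameNode code; names are stand-ins for illustration.
public class TargetChoiceSketch {

  private final Random random = new Random();

  // clientHost:  hostname of the writing client
  // hostToNodes: hostname -> DNs running on that host (stand-in for Host2NodesMap)
  // allNodes:    every live DN in the cluster
  String pickFirstTarget(String clientHost,
                         Map<String, List<String>> hostToNodes,
                         List<String> allNodes) {
    List<String> local = hostToNodes.get(clientHost);
    if (local != null && !local.isEmpty()) {
      // Several DNs may share the client's hostname; pick one of them at random.
      return local.get(random.nextInt(local.size()));
    }
    // No DN co-located with the client: fall back to a random live node,
    // roughly what the chooser does for the first replica in that case.
    return allNodes.get(random.nextInt(allNodes.size()));
  }
}

The same choice is then made again for each subsequent block the client
streams, which is why the selection repeats per block write request.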