Hi all! We know that HDFS divides a large file into several blocks (64 MB each, with 3 replicas by default). Once the metadata on the NameNode has been modified, a DataStreamer thread transports the blocks to the DataNodes; for each block, the client sends it to the 3 DataNodes through a pipeline.
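To make the setup concrete, here is a minimal write sketch using the public FileSystem API (the path /user/xu/big.dat is made up, and fs.defaultFS must point at a running cluster):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WriteSketch {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // 64 MB blocks, replication 3 (the defaults described above),
            // set explicitly per file; /user/xu/big.dat is a made-up path
            FSDataOutputStream out = fs.create(new Path("/user/xu/big.dat"),
                    true, 4096, (short) 3, 64L * 1024 * 1024);
            byte[] buf = new byte[4096];
            for (int i = 0; i < 100000; i++) {   // ~400 MB -> several blocks
                out.write(buf);                  // DataStreamer pushes packets down
            }                                    // the pipeline behind the scenes
            out.close();
        }
    }

Writing about 400 MB this way produces six or seven 64 MB blocks, which is the situation my questions below are about.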
For context, this is the client-side path I am looking at in DFSClient:

    dfsClient.namenode.create(src, masked, dfsClient.clientName,
        new EnumSetWritable<CreateFlag>(flag), createParent, replication, blockSize);
    streamer = new DataStreamer();
    streamer.start();

I am just wondering how the cluster chooses which DataNodes store the blocks; what is the policy? Also, a file may consist of many blocks: in what sequence are these blocks transported? From what I read in the code, there is only one thread doing this from the client to the DataNodes.
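In the meantime I can at least observe where the replicas end up; a minimal sketch, again assuming the made-up file path from above:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShowBlockLocations {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            FileStatus st = fs.getFileStatus(new Path("/user/xu/big.dat"));
            BlockLocation[] blocks = fs.getFileBlockLocations(st, 0, st.getLen());
            for (int i = 0; i < blocks.length; i++) {
                // each entry lists the hosts holding the replicas of block i
                System.out.println("block " + i + " -> "
                        + java.util.Arrays.toString(blocks[i].getHosts()));
            }
        }
    }

This shows me the resulting placement, but not the policy behind it. Any answer or URL is appreciated. Thanks!

Best regards,
xu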