I need to identify the bottleneck of my current cluster when running IO-bound benchmarks. I ran a test with 4 nodes: 1 node as JobTracker and NameNode, and 3 nodes as TaskTrackers and DataNodes. I ran RandomWriter to generate 30GB of data with 15 mappers, and then ran Sort on the generated data with 15 reducers. Replication is 3. I added logging code to HDFS and then analyzed the generated logs for the RandomWriter period and the Sort period. The results are below. I measured the following values:
1) average block preparation time: the time DFSClient spends generating all packets for a block.
2) average block writing time: from the moment DFSClient gets an allocated block from the NameNode in nextBlockOutputStream() until all acks are received.
3) average network receiving time and average disk writing time per block on the first, second, and third datanodes in the pipeline.
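To compute these per-block averages, the instrumented log can be aggregated with a small script. This is a minimal sketch, assuming a hypothetical log format of `<blockID> <event> <timestamp_ms>` per line; the event names and the format are illustrative, not actual HDFS log output:

```python
import re

# Hypothetical instrumented-log format (one event per line):
#   <blockID> <event> <timestamp_ms>
# e.g. "blk_42 write_start 1000" / "blk_42 ack_done 12900"
LINE_RE = re.compile(r"^(\S+)\s+(\S+)\s+(\d+)$")

def average_duration(lines, start_event, end_event):
    """Average (end - start) in ms per block, across all blocks seen."""
    starts, durations = {}, []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        block, event, ts = m.group(1), m.group(2), int(m.group(3))
        if event == start_event:
            starts[block] = ts
        elif event == end_event and block in starts:
            durations.append(ts - starts.pop(block))
    return sum(durations) / len(durations) if durations else 0.0

log = [
    "blk_1 write_start 0",   "blk_1 ack_done 11900",
    "blk_2 write_start 100", "blk_2 ack_done 12100",
]
print(average_duration(log, "write_start", "ack_done"))  # 11950.0
```

The same pairing of start/end events can be reused for preparation time, network receive time, and disk write time, as long as each probe logs a block ID and a timestamp.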
RandomWriter: 528 blocks total, total size 34063336366 bytes, full blocks (64MB): 506
  Average block preparation time by client: 11456.21
  Average writing time for one block (64MB): 11931.49
  No.0 target datanode: network receiving time 112.44, disk writing time 3035.04
  No.1 target datanode: network receiving time 3337.68, disk writing time 2950.74
  No.2 target datanode: network receiving time 3171.18, disk writing time 2646.38

Sort: 494 blocks total, total size 32318504139 bytes, full blocks (64MB): 479
  Average block preparation time by client: 16237.59
  Average writing time for one block (64MB): 16642.67
  No.0 target datanode: network receiving time 164.28, disk writing time 3331.50
  No.1 target datanode: network receiving time 2125.62, disk writing time 3436.32
  No.2 target datanode: network receiving time 2856.56, disk writing time 3426.04

My question is: why is the network receiving time on the second and third datanodes in the pipeline so much larger than on the first? Another question: how do I identify the bottleneck? Or, what other values should I collect? Thanks in advance.

--
View this message in context: http://www.nabble.com/Question-on-HDFS-write-performance-tp23814528p23814528.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
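One way to start identifying the bottleneck from numbers like these is to convert per-block times into effective throughput. A rough calculation, assuming the logged times are in milliseconds (an assumption; the units are not stated in the post):

```python
# Rough bottleneck arithmetic for one 64 MB block, assuming the logged
# times are milliseconds (an assumption; units are not stated above).
BLOCK_MB = 64.0

def throughput_mb_s(time_ms):
    """Effective MB/s implied by handling one 64 MB block in time_ms."""
    return BLOCK_MB / (time_ms / 1000.0)

# RandomWriter figures from the measurements above:
print(round(throughput_mb_s(11931.49), 1))  # end-to-end block write: ~5.4 MB/s
print(round(throughput_mb_s(3035.04), 1))   # disk write on No.0:    ~21.1 MB/s
```

If the disk can absorb a block at roughly 21 MB/s but the end-to-end write only achieves about 5 MB/s, most of the block's lifetime is spent somewhere other than raw disk transfer (client-side packet preparation, pipeline forwarding, ack latency), which suggests where to place additional timing probes.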