Hi everyone, I'm looking for some recommendations for how to get our Hadoop cluster to do faster I/O.
Currently, our lab cluster is 8 worker nodes and 1 master node (with NameNode and JobTracker). Each worker node has: - 48 GB RAM - 16 processors (Intel Xeon E5630 @ 2.53 GHz) - 1 Gb Ethernet connection Due to company policy, we have to keep the HDFS storage on a disk array. Our SAS connected array is capable of 6 Gb (768 MB) for each of the 8 hosts. So, theoretically, we should be able to get a max of 6 GB simultaneous reads across the 8 nodes if we benchmark it. Our disk array is presenting each of the 8 nodes with a 21 TB LUN. The LUN is RAID-5 across 12 disks on the array. That LUN is partitioned on the server into 6 different devices like this: >> df -h Filesystem Size Used Avail Use% Mounted on /dev/sdg1 3.5T 445G 2.9T 14% /data2/d1 /dev/sdg2 3.5T 439G 2.9T 14% /data2/d2 /dev/sdg3 3.5T 436G 2.9T 13% /data2/d3 /dev/sdg4 3.5T 435G 2.9T 13% /data2/d4 /dev/sdg5 3.5T 434G 2.9T 13% /data2/d5 /dev/sdg6 3.5T 431G 2.9T 13% /data2/d6 The file system type is ext3. So, when we run TestDFSIO, here are the results: *++ Write ++* >> hadoop jar /usr/lib/hadoop/hadoop-test-0.20.2-CDH3B4.jar TestDFSIO -write -nrFiles 80 -fileSize 10000 11/09/27 18:54:53 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write 11/09/27 18:54:53 INFO fs.TestDFSIO: Date & time: Tue Sep 27 18:54:53 EDT 2011 11/09/27 18:54:53 INFO fs.TestDFSIO: Number of files: 80 11/09/27 18:54:53 INFO fs.TestDFSIO: Total MBytes processed: 800000 11/09/27 18:54:53 INFO fs.TestDFSIO: Throughput mb/sec: 8.2742240008678 11/09/27 18:54:53 INFO fs.TestDFSIO: Average IO rate mb/sec: 8.288116455078125 11/09/27 18:54:53 INFO fs.TestDFSIO: IO rate std deviation: 0.3435565217052116 11/09/27 18:54:53 INFO fs.TestDFSIO: Test exec time sec: 1427.856 So, throughput across all 8 nodes is 8.27 * 80 = 661 MB per second. *++ Read ++* >> hadoop jar /usr/lib/hadoop/hadoop-test-0.20.2-CDH3B4.jar TestDFSIO -read -nrFiles 80 -fileSize 10000 11/09/27 19:43:12 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read 11/09/27 19:43:12 INFO fs.TestDFSIO: Date & time: Tue Sep 27 19:43:12 EDT 2011 11/09/27 19:43:12 INFO fs.TestDFSIO: Number of files: 80 11/09/27 19:43:12 INFO fs.TestDFSIO: Total MBytes processed: 800000 11/09/27 19:43:12 INFO fs.TestDFSIO: Throughput mb/sec: 5.854318503905489 11/09/27 19:43:12 INFO fs.TestDFSIO: Average IO rate mb/sec: 5.96372652053833 11/09/27 19:43:12 INFO fs.TestDFSIO: IO rate std deviation: 0.9885505979030621 11/09/27 19:43:12 INFO fs.TestDFSIO: Test exec time sec: 2055.465 So, throughput across all 8 nodes is 5.85 * 80 = 468 MB per second. *Question 1:* Why are the reads and writes so much slower than expected? Any suggestions about what can be changed? I understand that RAID-5 backed disks are an unorthodox configuration for HDFS, but has anybody successfully done this? If so, what kind of results did you see? Also, we detached the 8 nodes from the disk array and connected each of them to 6 local hard drives for testing (w/ ext4 file system). Then we ran the same read TestDFSIO and saw this: 11/09/26 20:24:09 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read 11/09/26 20:24:09 INFO fs.TestDFSIO: Date & time: Mon Sep 26 20:24:09 EDT 2011 11/09/26 20:24:09 INFO fs.TestDFSIO: Number of files: 80 11/09/26 20:24:09 INFO fs.TestDFSIO: Total MBytes processed: 800000 11/09/26 20:24:09 INFO fs.TestDFSIO: Throughput mb/sec: 13.065623285187982 11/09/26 20:24:09 INFO fs.TestDFSIO: Average IO rate mb/sec: 15.160531997680664 11/09/26 20:24:09 INFO fs.TestDFSIO: IO rate std deviation: 8.000530562022949 11/09/26 20:24:09 INFO fs.TestDFSIO: Test exec time sec: 1123.447 So, with local disks, reads are about 1 GB per second across the 8 nodes. Much faster! With 6 local disks, writes performed the same though: 11/09/26 19:49:58 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write 11/09/26 19:49:58 INFO fs.TestDFSIO: Date & time: Mon Sep 26 19:49:58 EDT 2011 11/09/26 19:49:58 INFO fs.TestDFSIO: Number of files: 80 11/09/26 19:49:58 INFO fs.TestDFSIO: Total MBytes processed: 800000 11/09/26 19:49:58 INFO fs.TestDFSIO: Throughput mb/sec: 8.573949802610528 11/09/26 19:49:58 INFO fs.TestDFSIO: Average IO rate mb/sec: 8.588902473449707 11/09/26 19:49:58 INFO fs.TestDFSIO: IO rate std deviation: 0.3639466752546032 11/09/26 19:49:58 INFO fs.TestDFSIO: Test exec time sec: 1383.734 Write throughput across the cluster was 685 MB per second. By the way, our HDFS file system is healthy: Status: HEALTHY Total size: 9018951544337 B Total dirs: 24230 Total files: 1032578 Total blocks (validated): 1139580 (avg. block size 7914276 B) Minimally replicated blocks: 1139580 (100.0 %) Over-replicated blocks: 1 (8.775163E-5 %) Under-replicated blocks: 16 (0.001404026 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor: 2 Average block replication: 2.0122387 Corrupt blocks: 0 Missing replicas: 32 (0.0013954865 %) Number of data-nodes: 8 Number of racks: 1 FSCK ended at Tue Sep 27 18:57:23 EDT 2011 in 7453 milliseconds The filesystem under path '/' is HEALTHY - Sameer
