Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The following page has been changed by EdwardCapriolo:
http://wiki.apache.org/hadoop/DiskSetup

------------------------------------------------------------------------------
  
  == Hardware ==
  
- You don't need RAID disk controllers for Hadoop, as it copies data across 
multiple machines instead. This increase the likelihood that there is a free 
task slot near that data, and if the servers are on different PSUs and 
switches, eliminates some more points of failure in the datacenter.
+ You don't need RAID disk controllers for a Hadoop Data Node, as Hadoop copies 
data across multiple machines instead. This increases the likelihood that there 
is a free task slot near that data and, if the servers are on different PSUs and 
switches, eliminates some more points of failure in the data center.
+ 
+ While the Hadoop Name Node and Secondary Name Node can write to a list of 
drive locations, they will stop functioning if they cannot write to ALL of the 
locations. In this case a mirrored RAID is a good idea for higher availability.
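
A minimal pre-flight sketch of the point above, assuming you want to confirm that every configured name directory is writable before starting the Name Node. This is not how Hadoop itself performs the check; the function name and the directory paths are hypothetical.

```python
import os
import tempfile


def unwritable_dirs(directories):
    """Return the directories in which a temporary file cannot be created."""
    failed = []
    for directory in directories:
        try:
            # Creating (and immediately discarding) a file proves write access.
            with tempfile.TemporaryFile(dir=directory):
                pass
        except OSError:
            # Covers missing directories, permission errors, read-only mounts.
            failed.append(directory)
    return failed


if __name__ == "__main__":
    # Hypothetical name directory list, one entry per physical disk.
    name_dirs = ["/data/1/dfs/name", "/data/2/dfs/name"]
    print("unwritable:", unwritable_dirs(name_dirs))
```

An empty result means every location accepted a write. Any entry in the list is a location worth fixing before relying on it.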
  
  Having lots of disks per server gives you more raw IO bandwidth than having 
one or two big disks. If you have enough that different tasks can be using 
different disks for input and output, disk seeking is minimized, which is one 
of the big disk performance killers. That said: more disks have a higher power 
budget; if you are power limited, you may want fewer but larger disks.
  
@@ -22, +24 @@

  
  If you mount the disks as noatime, then the file access times aren't written 
back; this speeds up reads. There is also relatime, which stores some access 
time information, but is not as slow as the classic atime attribute. Remember 
that any access time information kept by Hadoop is independent of the atime 
attribute of individual blocks, so Hadoop does not care what your settings are 
here. If you are mounting disks purely for Hadoop, use noatime.
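
To confirm which data disks are actually mounted with noatime, something like the following sketch could be used. It assumes the Linux /proc/mounts layout (device, mount point, filesystem type, options, ...). The devices and mount points in the sample are hypothetical, and the helper name is illustrative only.

```python
def mount_options(mounts_text, mount_point):
    """Return the option set of the first entry for mount_point, or None.

    mounts_text is expected in the /proc/mounts layout:
    device mount_point fstype options dump pass
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            return set(fields[3].split(","))
    return None


# Hypothetical sample; on a real Linux host use open("/proc/mounts").read()
SAMPLE = """\
/dev/sda1 / ext3 rw,relatime 0 0
/dev/sdb1 /data/1 ext3 rw,noatime 0 0
/dev/sdc1 /data/2 ext3 rw 0 0
"""

for point in ("/data/1", "/data/2"):
    options = mount_options(SAMPLE, point)
    print(point, "noatime" if "noatime" in options else "atime updates enabled")
```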
  
+ Formatting and tuning options are important. Using tunefs (tune2fs on 
ext2/ext3) to set the reserved space to zero percent can save you over 25 
gigabytes on a 1 terabyte disk. Also, because the underlying file system is 
going to hold many large files, you can get more space by lowering the number 
of inodes at format time.
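
As a rough check on these figures, the sketch below estimates the space taken by the reserved-block percentage and by the inode tables. The 5% reserve, the 16 KiB bytes-per-inode ratio and the 256-byte inode size are assumptions (values commonly seen with mke2fs, but check your own system), and the 1 MiB ratio is a hypothetical choice for large files. None of these numbers come from the page.

```python
def reserved_bytes(disk_bytes, reserve_percent):
    """Space held back by the reserved-block percentage."""
    return disk_bytes * reserve_percent // 100


def inode_table_bytes(disk_bytes, bytes_per_inode, inode_size):
    """Approximate space used by inode tables for a given inode ratio."""
    inode_count = disk_bytes // bytes_per_inode
    return inode_count * inode_size


DISK = 10**12  # a 1 TB disk (decimal terabyte)

# Assumed values, not taken from the wiki page
ASSUMED_RESERVE_PERCENT = 5
ASSUMED_BYTES_PER_INODE = 16 * 1024
LARGE_FILE_BYTES_PER_INODE = 1024 * 1024  # hypothetical ratio for big files
ASSUMED_INODE_SIZE = 256

saved_by_reserve = (reserved_bytes(DISK, ASSUMED_RESERVE_PERCENT)
                    - reserved_bytes(DISK, 0))
saved_by_inodes = (
    inode_table_bytes(DISK, ASSUMED_BYTES_PER_INODE, ASSUMED_INODE_SIZE)
    - inode_table_bytes(DISK, LARGE_FILE_BYTES_PER_INODE, ASSUMED_INODE_SIZE)
)

print(f"saved by zero reserve:  {saved_by_reserve / 1e9:.1f} GB")
print(f"saved by fewer inodes:  {saved_by_inodes / 1e9:.1f} GB")
```

Under these assumptions the reserve dominates, which is consistent with the "over 25 gigabytes" figure; the inode saving is smaller but still measured in gigabytes.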
  === Ext3 ===
  
  Yahoo! has publicly stated they use ext3. Regardless of the merits of the 
filesystem, that means that HDFS-on-ext3 has been publicly tested at a bigger 
scale than any other underlying filesystem.
