Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "HadoopIsNot" page has been changed by SteveLoughran.
The comment on this change is: More details on what you need to know before you 
get started.
http://wiki.apache.org/hadoop/HadoopIsNot?action=diff&rev1=4&rev2=5

--------------------------------------------------

  
  == Hadoop clusters are not a place to learn Unix/Linux system administration 
==
  
- You need to know your way round a Unix/Linux system. How to install it, what 
the various files in /etc/ are for, how to set up networking, what is a good 
hosts table, debug DNS problems, why to keep logs on a separate disk from the 
root disk, etc. If you cannot look after a single machine, you aren't going to 
be able to handle a cluster of 80 of them. That said, don't try maintaining 
those 80+ boxes using the same technique of hand-editing files lile 
[[/etc/hosts]], because it doesn't scale.
+ You need to know your way round a Unix/Linux system. How to install it, what 
the various files in /etc/ are for, how to set up networking, what is a good 
hosts table, how to debug DNS problems, why to keep logs on a separate disk 
from the root disk, etc. If you cannot look after a single machine, you aren't 
going to be able to handle a cluster of 80 of them. That said, don't try 
maintaining those 80+ boxes using the same technique of hand-editing files like 
[[/etc/hosts]], because it doesn't scale.
+ 
+ Things you need to know
+ 
+  * SSH: what it is, how to set up authorized_keys, how to use ssh and scp
+  * ifconfig, nslookup and other network config/diagnostics tools
+  * How your platform keeps itself up to date
+  * Which log files your machine generates, and what they mean
+  * How to set up native filesystems and mount them
+ 
+ This is important. If you don't know these, you are out of your depth and 
should not start installing Hadoop until you have the basics of a couple of 
Linux systems up and running: you can ssh in to each of them without entering 
a password, and each machine knows the others' hostnames. The Hadoop 
installation documents all assume you can do these things, and do not explain 
them.
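
As a rough illustration of the "good hosts table" and "know each other's hostname" prerequisites above, here is a minimal Python sketch that parses hosts-table text and flags cluster hostnames that are missing, map to more than one address, or map to a loopback address (a commonly reported source of trouble when nodes try to reach each other). The function names `parse_hosts` and `find_problems`, the sample hostnames, and the IPv4-only assumption are all hypothetical and are not taken from the Hadoop documentation.

```python
import ipaddress


def parse_hosts(text):
    """Parse hosts-table text into {hostname: [ip, ...]}."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, []).append(ip)
    return table


def find_problems(table, cluster_hosts):
    """Return one human-readable message per suspicious cluster hostname."""
    problems = []
    for host in cluster_hosts:
        ips = table.get(host, [])
        if not ips:
            problems.append(f"{host}: no entry")
        elif len(set(ips)) > 1:
            # Assumption: an IPv4-only table, so one name means one address.
            problems.append(f"{host}: multiple addresses {sorted(set(ips))}")
        elif ipaddress.ip_address(ips[0]).is_loopback:
            problems.append(f"{host}: maps to loopback {ips[0]}")
    return problems


if __name__ == "__main__":
    # Hypothetical hosts-table content; substitute the text of your own file.
    sample = """
    127.0.0.1   localhost
    127.0.1.1   node1
    10.0.0.2    node2 node2.example.com  # worker
    """
    for message in find_problems(parse_hosts(sample), ["node1", "node2", "node3"]):
        print(message)
```

A check like this only inspects one machine's table. On a real cluster the same table (or consistent DNS) has to be in place on every node, which is one reason hand-editing it per machine does not scale.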
  
  == Hadoop Filesystem is not a substitute for a High Availability SAN-hosted 
FS ==
  
