Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The "HadoopIsNot" page has been changed by SteveLoughran.
The comment on this change is: More details on what you need to know before you get started.
http://wiki.apache.org/hadoop/HadoopIsNot?action=diff&rev1=4&rev2=5

--------------------------------------------------

  == Hadoop clusters are not a place to learn Unix/Linux system administration ==
- You need to know your way round a Unix/Linux system. How to install it, what the various files in /etc/ are for, how to set up networking, what is a good hosts table, debug DNS problems, why to keep logs on a separate disk from the root disk, etc. If you cannot look after a single machine, you aren't going to be able to handle a cluster of 80 of them. That said, don't try maintaining those 80+ boxes using the same technique of hand-editing files lile [[/etc/hosts]], because it doesn't scale.
+ You need to know your way round a Unix/Linux system. How to install it, what the various files in /etc/ are for, how to set up networking, what is a good hosts table, how to debug DNS problems, why to keep logs on a separate disk from the root disk, etc. If you cannot look after a single machine, you aren't going to be able to handle a cluster of 80 of them. That said, don't try maintaining those 80+ boxes using the same technique of hand-editing files like [[/etc/hosts]], because it doesn't scale.
+
+ Things you need to know
+
+ * SSH, what it is, how to set up authorized_keys, how to use ssh and scp
+ * ifconfig, nslookup and other network config/diagnostics tools
+ * How your platform keeps itself up to date
+ * What the various log files your machine generates, and what they mean
+ * How to set up native filesystems and mount them
+
+ This is important. If you don't know these, you are out of your depth and should not start installing Hadoop until you have the basics of a couple of linux systems up and running, letting you ssh in to each of them without entering a password, know each other's hostname and such like.
The Hadoop installation documents all assume you can do these things, and aren't going to bother explaining about them.
  == Hadoop Filesystem is not a substitute for a High Availability SAN-hosted FS ==
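
The prerequisites named in the change above (machines that know each other's hostnames and that accept ssh logins without a password prompt) could be checked before installing anything. The sketch below is only an illustration and is not part of the wiki page: the function names are hypothetical, and it assumes an OpenSSH client on the PATH, whose `BatchMode=yes` option makes ssh fail rather than prompt for a password.

```python
import socket
import subprocess
import sys


def resolves(hostname):
    """Return True if this machine can resolve hostname to an address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except (OSError, UnicodeError):
        return False


def ssh_check_command(hostname):
    """Build an ssh command that runs `true` remotely and never prompts."""
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
            hostname, "true"]


def passwordless_ssh_works(hostname):
    """Return True if a key-based ssh login to hostname succeeds."""
    try:
        result = subprocess.run(
            ssh_check_command(hostname),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh is missing from the PATH, or the connection hung.
        return False
    return result.returncode == 0


def check_hosts(hostnames):
    """Map each hostname to its name-resolution and ssh status."""
    report = {}
    for host in hostnames:
        dns_ok = resolves(host)
        # Only attempt ssh when the name resolves at all.
        report[host] = {
            "resolves": dns_ok,
            "ssh": dns_ok and passwordless_ssh_works(host),
        }
    return report


if __name__ == "__main__":
    # Hostnames are taken from the command line.
    for host, status in check_hosts(sys.argv[1:]).items():
        print(host, status)
```

Run from one machine with the names of the others as arguments; a host that reports `False` for either check probably needs its hosts table, DNS, or authorized_keys looked at before Hadoop goes on it.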
