Hi David,

I don't know if you've seen this already, but this might be of some help:
http://hadoop.apache.org/core/docs/r0.18.3/cluster_setup.html

Near the bottom, there is a section called "Real-World Cluster
Configurations" with some sample configuration parameters that were used to
run a very large sort benchmark.

All the best,
-SM

On Thu, Mar 5, 2009 at 10:20 AM, David Ritch <[email protected]> wrote:
> Are there any published guidelines on system configuration for Hadoop?
>
> I've seen hardware suggestions, but I'm really interested in
> recommendations on disk layout and partitioning. The defaults, as shipped
> and defined in hadoop-default.xml, may be appropriate for testing, but are
> not really appropriate for sustained use. For example, data and metadata
> are both stored in /tmp. In typical use on a cluster with a couple hundred
> nodes, the NameNode can generate 3-5GB of logs per day. If you configure
> your namenode host badly, it's easy to fill up the partition used by dfs
> for metadata, and clobber your dfs filesystem. I would think that
> thresholding logs on WARN would be preferable to INFO.
>
> On a datanode, we would like to reserve as much space as we can for data,
> but we know that map-reduce jobs need some local storage. How do people
> generally estimate the amount of space required for temporary storage? I
> would assume that it would be good to partition it from data storage, to
> prevent running out of temp space on some nodes. I would also think that
> it would be preferable for performance to have temp space on a different
> spindle, so it and hdfs data can be accessed independently.
>
> I would be interested to know how other sites configure their systems,
> and I would love to see some guidelines for system configuration for
> Hadoop.
>
> Thank you!
>
> David
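
To make the disk-layout concern in the quoted message concrete, here is a
minimal sketch of a hadoop-site.xml that moves metadata, block data, and
map-reduce scratch space out of /tmp and onto separate mounts. The property
names are the ones defined in hadoop-default.xml for the 0.18 line; every
path and the reserved-space figure are hypothetical placeholders, not
recommendations or values taken from the linked page.

```xml
<?xml version="1.0"?>
<!-- Sketch only: all paths and sizes below are assumed for illustration. -->
<configuration>
  <!-- NameNode metadata: a comma-separated list keeps a copy in each directory. -->
  <property>
    <name>dfs.name.dir</name>
    <value>/hadoop/name1,/hadoop/name2</value>
  </property>

  <!-- DataNode block storage: one directory per data disk. -->
  <property>
    <name>dfs.data.dir</name>
    <value>/data1/dfs/data,/data2/dfs/data</value>
  </property>

  <!-- Map-reduce intermediate files, kept on their own mount. -->
  <property>
    <name>mapred.local.dir</name>
    <value>/scratch1/mapred/local</value>
  </property>

  <!-- Bytes per volume the DataNode leaves free for non-HDFS use (10 GB assumed). -->
  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>10737418240</value>
  </property>

  <!-- Base for any remaining defaults, so nothing falls back to /tmp. -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/tmp</value>
  </property>
</configuration>
```

Whether scratch space belongs on a dedicated spindle, and how much to
reserve, depends on the job mix, so treat the layout above as a starting
point to measure against rather than a guideline.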
