Thank you - that certainly is useful, and I would love to see more information and discussion on that sort of thing. However, I'm also looking for some lower-level configuration, such as disk partitioning.
David

On Thu, Mar 5, 2009 at 11:36 AM, Sandy <[email protected]> wrote:

> Hi David,
>
> I don't know if you've seen this already, but this might be of some help:
> http://hadoop.apache.org/core/docs/r0.18.3/cluster_setup.html
>
> Near the bottom, there is a section called "Real-World Cluster
> Configurations" with some sample configuration parameters that were used to
> run a very large sort benchmark.
>
> All the best,
> -SM
>
> On Thu, Mar 5, 2009 at 10:20 AM, David Ritch <[email protected]> wrote:
>
> > Are there any published guidelines on system configuration for Hadoop?
> >
> > I've seen hardware suggestions, but I'm really interested in
> > recommendations on disk layout and partitioning. The defaults, as shipped
> > and defined in hadoop-default.xml, may be appropriate for testing, but are
> > not really appropriate for sustained use. For example, data and metadata
> > are both stored in /tmp. In typical use on a cluster with a couple hundred
> > nodes, the NameNode can generate 3-5GB of logs per day. If you configure
> > your namenode host badly, it's easy to fill up the partition used by dfs
> > for metadata, and clobber your dfs filesystem. I would think that
> > thresholding logs on WARN would be preferable to INFO.
> >
> > On a datanode, we would like to reserve as much space as we can for data,
> > but we know that map-reduce jobs need some local storage. How do people
> > generally estimate the amount of space required for temporary storage? I
> > would assume that it would be good to partition it from data storage, to
> > prevent running out of temp space on some nodes. I would also think that
> > it would be preferable for performance to have temp space on a different
> > spindle, so it and hdfs data can be accessed independently.
> >
> > I would be interested to know how other sites configure their systems,
> > and I would love to see some guidelines for system configuration for
> > Hadoop.
> >
> > Thank you!
> >
> > David
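
To put the log-volume concern quoted above in numbers, here is a rough sketch of how quickly a partition shared by NameNode logs and metadata could fill at the quoted 3-5GB per day. The 50 GB partition size, the 10 GB already in use, and the function name days_until_full are all hypothetical, chosen only for illustration; real figures depend on the cluster and its log settings.

```python
def days_until_full(partition_gb, used_gb, log_gb_per_day):
    """Days before a partition fills, assuming a steady log rate and no cleanup."""
    if log_gb_per_day <= 0:
        raise ValueError("log_gb_per_day must be positive")
    # Space still available; never negative if the partition is already over-full.
    free_gb = max(partition_gb - used_gb, 0)
    return free_gb / log_gb_per_day


# Hypothetical 50 GB partition with 10 GB already in use.
for rate in (3, 5):  # GB/day, the range mentioned in the original question
    print(f"{rate} GB/day -> {days_until_full(50, 10, rate):.1f} days")
```

Under these assumed numbers the partition lasts between one and two weeks without log rotation, which is one argument for keeping logs and dfs metadata on separate partitions.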
