Thank you - that certainly is useful, and I would love to see more
information and discussion on that sort of thing.  However, I'm also looking
for guidance on lower-level configuration, such as disk partitioning.

David

On Thu, Mar 5, 2009 at 11:36 AM, Sandy <[email protected]> wrote:

> Hi David,
>
> I don't know if you've seen this already, but this might be of some help:
> http://hadoop.apache.org/core/docs/r0.18.3/cluster_setup.html
>
> Near the bottom, there is a section called "Real-World Cluster
> Configurations" with some sample configuration parameters that were used to
> run a very large sort benchmark.
>
> All the best,
> -SM
>
> On Thu, Mar 5, 2009 at 10:20 AM, David Ritch <[email protected]> wrote:
>
> > Are there any published guidelines on system configuration for Hadoop?
> >
> > I've seen hardware suggestions, but I'm really interested in
> > recommendations on disk layout and partitioning.  The defaults, as shipped
> > and defined in hadoop-default.xml, may be appropriate for testing, but are
> > not really appropriate for sustained use.  For example, data and metadata
> > are both stored under /tmp.  In typical use on a cluster with a couple
> > hundred nodes, the NameNode can generate 3-5 GB of logs per day.  If you
> > configure your NameNode host badly, it's easy to fill up the partition HDFS
> > uses for metadata and clobber your HDFS filesystem.  I would think that
> > setting the log threshold to WARN rather than INFO would be preferable.
> >
> > On a datanode, we would like to reserve as much space as we can for data,
> > but we know that map-reduce jobs need some local storage.  How do people
> > generally estimate the amount of space required for temporary storage?  I
> > would assume that it is good to put it on a partition separate from data
> > storage, to prevent running out of temp space on some nodes.  I would also
> > think that it would be better for performance to have temp space on a
> > different spindle, so that it and HDFS data can be accessed independently.
> >
> > I would be interested to know how other sites configure their systems, and
> > I would love to see some guidelines for system configuration for Hadoop.
> >
> > Thank you!
> >
> > David
> >
>
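
For illustration only: the separation described above (metadata, block data, and map-reduce scratch space each kept off /tmp and on their own partitions or spindles) comes down to a handful of site properties. The sketch below writes a hadoop-site.xml body using 0.18-era property names (hadoop.tmp.dir, dfs.name.dir, dfs.data.dir, mapred.local.dir, dfs.datanode.du.reserved). The mount points and the 10 GB reservation are hypothetical placeholders, not recommendations from this thread.

```python
import xml.etree.ElementTree as ET
from typing import Mapping

# HYPOTHETICAL layout: every path and the reserved size are placeholders.
SITE_PROPERTIES = {
    # Base for Hadoop's temporary files; the shipped default lives under /tmp.
    "hadoop.tmp.dir": "/var/hadoop/tmp",
    # NameNode metadata on its own partition, so logs cannot fill it up.
    "dfs.name.dir": "/hadoop/name",
    # DataNode block storage, one directory per data disk.
    "dfs.data.dir": "/data1/dfs/data,/data2/dfs/data",
    # Map-reduce intermediate output on a separate spindle from HDFS data.
    "mapred.local.dir": "/scratch1/mapred/local",
    # Bytes per volume that HDFS leaves free for non-DFS use (10 GB here).
    "dfs.datanode.du.reserved": str(10 * 1024**3),
}


def build_site_xml(props: Mapping[str, str]) -> str:
    """Render name/value pairs as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_site_xml(SITE_PROPERTIES))
```

Generating the file from a single table like this is just one way to keep the layout consistent across a couple hundred nodes. The actual split between dfs.data.dir and mapred.local.dir still depends on how much intermediate output your jobs produce.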
