Hi David,

I don't know if you've seen this already, but this might be of some help:
http://hadoop.apache.org/core/docs/r0.18.3/cluster_setup.html

Near the bottom, there is a section called "Real-World Cluster
Configurations" with some sample configuration parameters that were used to
run a very large sort benchmark.
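
As a rough illustration of the kind of overrides involved, here is a minimal
hadoop-site.xml sketch that moves metadata, block data, and map-reduce scratch
space out of /tmp. The property names are the ones defined in
hadoop-default.xml for the 0.18 line; every path and the reserved-space figure
are made-up placeholders, not recommendations, so adjust them to your own disk
layout.

```xml
<?xml version="1.0"?>
<!-- hadoop-site.xml: site-specific overrides of hadoop-default.xml.
     All paths and sizes below are hypothetical examples. -->
<configuration>
  <!-- NameNode metadata; a comma-separated list keeps a copy in each directory -->
  <property>
    <name>dfs.name.dir</name>
    <value>/data/1/dfs/name,/data/2/dfs/name</value>
  </property>
  <!-- DataNode block storage; one directory per data disk -->
  <property>
    <name>dfs.data.dir</name>
    <value>/data/1/dfs/data,/data/2/dfs/data</value>
  </property>
  <!-- Map-reduce intermediate files, kept on a separate partition from HDFS data -->
  <property>
    <name>mapred.local.dir</name>
    <value>/scratch/mapred/local</value>
  </property>
  <!-- Bytes per volume the DataNode leaves free for non-HDFS use (10 GB here) -->
  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>10737418240</value>
  </property>
</configuration>
```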

All the best,
-SM

On Thu, Mar 5, 2009 at 10:20 AM, David Ritch <[email protected]> wrote:

> Are there any published guidelines on system configuration for Hadoop?
>
> I've seen hardware suggestions, but I'm really interested in recommendations
> on disk layout and partitioning.  The defaults, as shipped and defined in
> hadoop-default.xml, may be appropriate for testing, but are not really
> appropriate for sustained use.  For example, data and metadata are both
> stored in /tmp.  In typical use on a cluster with a couple hundred nodes,
> the NameNode can generate 3-5GB of logs per day.  If you configure your
> namenode host badly, it's easy to fill up the partition used by dfs for
> metadata, and clobber your dfs filesystem.  I would think that thresholding
> logs on WARN would be preferable to INFO.
>
> On a datanode, we would like to reserve as much space as we can for data,
> but we know that map-reduce jobs need some local storage.  How do people
> generally estimate the amount of space required for temporary storage?  I
> would assume that it would be good to partition it from data storage, to
> prevent running out of temp space on some nodes.  I would also think that it
> would be preferable for performance to have temp space on a different
> spindle, so it and hdfs data can be accessed independently.
>
> I would be interested to know how other sites configure their systems, and I
> would love to see some guidelines for system configuration for Hadoop.
>
> Thank you!
>
> David
>
