Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for 
change notification.

The following page has been changed by OwenOMalley:
http://wiki.apache.org/lucene-hadoop/HowToConfigure

New page:
= How To Configure Hadoop =

Hadoop is configured with a set of files. The files are loaded in the order listed
below, with files lower in the table overriding the values set by the ones above
them (the shared file format is sketched after the table):

|| '''Filename''' || '''Description''' ||
|| hadoop-default.xml || Generic default values ||
|| mapred-default.xml || Site-specific default values ||
|| job.xml || Configuration for a specific map/reduce job ||
|| hadoop-site.xml || Site-specific values that cannot be modified by the job ||
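
All of these files share the same XML property format. A minimal sketch of the
precedence rule, assuming hadoop-site.xml redefines a value shipped in
hadoop-default.xml (the numbers are illustrative, not the shipped defaults):

{{{
<!-- hadoop-default.xml: loaded first -->
<configuration>
  <property>
    <name>io.sort.mb</name>
    <value>100</value>
  </property>
</configuration>

<!-- hadoop-site.xml: loaded last, so its value wins -->
<configuration>
  <property>
    <name>io.sort.mb</name>
    <value>200</value>
  </property>
</configuration>
}}}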

== Lookup path ==

Configuration files are found via Java's classpath, and only the first instance of 
each file is used. The bin/hadoop script adds $HADOOP_CONF_DIR to the front of 
the classpath. When installing Hadoop on a cluster, it is best to keep the conf 
directory outside of the distribution, so that you can update the release on the 
cluster without clobbering your configuration by mistake.

== hadoop-default.xml ==

This file holds the default values for the many configuration variables used by 
Hadoop. It should never be placed in $HADOOP_CONF_DIR, so that the version 
inside the hadoop-*.jar is the one that is used. (Otherwise, when a variable is 
added to this file in a new release, your stale copy would hide it and the 
variable would be left undefined.)
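
Entries in hadoop-default.xml also carry a <description> element documenting each
variable. A representative entry, with the value shown as an illustration rather
than a guaranteed shipped default:

{{{
<property>
  <name>dfs.replication</name>
  <value>3</value>
  <description>Default block replication. The actual number of
  replications can be specified when the file is created.</description>
</property>
}}}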

== mapred-default.xml ==

This file should contain the majority of your customizations of Hadoop. Useful 
variables include (a sample file follows the table):

|| '''Name''' || '''Meaning''' ||
|| dfs.block.size || size in bytes of each data block in DFS ||
|| io.sort.factor || number of input files to each level in the merge sort ||
|| io.sort.mb || size, in megabytes, of the buffer used to sort reduce inputs ||
|| io.file.buffer.size || number of bytes used for buffering I/O files ||
|| mapred.reduce.parallel.copies || number of threads that fetch map outputs for each reduce ||
|| dfs.replication || number of replicas for each DFS block ||
|| mapred.child.java.opts || options passed to child task JVMs ||
|| mapred.min.split.size || minimum number of bytes in a map input split ||
|| mapred.output.compress || should the reduce outputs be compressed? ||
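
A minimal sketch of a mapred-default.xml that overrides two of the variables
above (the values are examples, not recommendations; tune them for your cluster):

{{{
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>10</value>
  </property>
  <property>
    <name>dfs.block.size</name>
    <!-- 128 MB, expressed in bytes -->
    <value>134217728</value>
  </property>
</configuration>
}}}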



== job.xml ==

This file is never created explicitly by the user. The map/reduce application 
creates a JobConf, which is serialized to job.xml when the job is submitted.
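
The serialized job.xml uses the same XML property format as the other files. A
hypothetical fragment for a job that requested seven reduces and compressed
output might read:

{{{
<property>
  <name>mapred.reduce.tasks</name>
  <value>7</value>
</property>
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
}}}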

== hadoop-site.xml ==

This file overrides any settings in job.xml and therefore should be very 
minimal. Usually it contains just the addresses of the NameNode and JobTracker, 
and the ports and working directories for the various servers.
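
A minimal sketch of such a hadoop-site.xml, with the host names, ports, and
directory as placeholders for your own cluster:

{{{
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>namenode.example.com:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:9001</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop/tmp</value>
  </property>
</configuration>
}}}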
