Author: ddas
Date: Tue Feb 19 21:17:48 2008
New Revision: 629361
URL: http://svn.apache.org/viewvc?rev=629361&view=rev
Log:
HADOOP-2730. HOD documentation update. Contributed by Vinod Kumar Vavilapalli.
Added:
hadoop/core/trunk/docs/hod_admin_guide.html
hadoop/core/trunk/docs/hod_admin_guide.pdf
hadoop/core/trunk/docs/hod_config_guide.html
hadoop/core/trunk/docs/hod_config_guide.pdf
hadoop/core/trunk/docs/hod_user_guide.html
hadoop/core/trunk/docs/hod_user_guide.pdf
hadoop/core/trunk/src/docs/src/documentation/content/xdocs/hod_admin_guide.xml
hadoop/core/trunk/src/docs/src/documentation/content/xdocs/hod_config_guide.xml
hadoop/core/trunk/src/docs/src/documentation/content/xdocs/hod_user_guide.xml
Modified:
hadoop/core/trunk/CHANGES.txt
hadoop/core/trunk/docs/hod.html
hadoop/core/trunk/docs/hod.pdf
hadoop/core/trunk/docs/linkmap.html
hadoop/core/trunk/src/docs/src/documentation/content/xdocs/hod.xml
hadoop/core/trunk/src/docs/src/documentation/content/xdocs/site.xml
Modified: hadoop/core/trunk/CHANGES.txt
URL:
http://svn.apache.org/viewvc/hadoop/core/trunk/CHANGES.txt?rev=629361&r1=629360&r2=629361&view=diff
==============================================================================
--- hadoop/core/trunk/CHANGES.txt (original)
+++ hadoop/core/trunk/CHANGES.txt Tue Feb 19 21:17:48 2008
@@ -62,6 +62,9 @@
HADOOP-2371. User guide for file permissions in HDFS.
(Robert Chansler via rangadi)
+
+ HADOOP-2730. HOD documentation update.
+ (Vinod Kumar Vavilapalli via ddas)
BUG FIXES
Modified: hadoop/core/trunk/docs/hod.html
URL:
http://svn.apache.org/viewvc/hadoop/core/trunk/docs/hod.html?rev=629361&r1=629360&r2=629361&view=diff
==============================================================================
--- hadoop/core/trunk/docs/hod.html (original)
+++ hadoop/core/trunk/docs/hod.html Tue Feb 19 21:17:48 2008
@@ -177,122 +177,7 @@
<a href="#Introduction"> Introduction </a>
</li>
<li>
-<a href="#Feature+List"> Feature List </a>
-<ul class="minitoc">
-<li>
-<a href="#Simplified+Interface+for+Provisioning+Hadoop+Clusters"> Simplified
Interface for Provisioning Hadoop Clusters </a>
-</li>
-<li>
-<a href="#Automatic+installation+of+Hadoop"> Automatic installation of Hadoop
</a>
-</li>
-<li>
-<a href="#Configuring+Hadoop"> Configuring Hadoop </a>
-</li>
-<li>
-<a href="#Auto-cleanup+of+Unused+Clusters"> Auto-cleanup of Unused Clusters
</a>
-</li>
-<li>
-<a href="#Log+Services"> Log Services </a>
-</li>
-</ul>
-</li>
-<li>
-<a href="#HOD+Components"> HOD Components </a>
-<ul class="minitoc">
-<li>
-<a href="#HOD+Client"> HOD Client </a>
-</li>
-<li>
-<a href="#RingMaster"> RingMaster </a>
-</li>
-<li>
-<a href="#HodRing"> HodRing </a>
-</li>
-<li>
-<a href="#Hodrc+%2F+HOD+configuration+file"> Hodrc / HOD configuration file
</a>
-</li>
-<li>
-<a href="#Submit+Nodes+and+Compute+Nodes"> Submit Nodes and Compute Nodes </a>
-</li>
-</ul>
-</li>
-<li>
-<a href="#Getting+Started+with+HOD"> Getting Started with HOD </a>
-<ul class="minitoc">
-<li>
-<a href="#Pre-Requisites"> Pre-Requisites </a>
-<ul class="minitoc">
-<li>
-<a href="#Hardware"> Hardware </a>
-</li>
-<li>
-<a href="#Software"> Software </a>
-</li>
-<li>
-<a href="#Resource+Manager+Configuration+Pre-requisites">Resource Manager
Configuration Pre-requisites</a>
-</li>
-</ul>
-</li>
-<li>
-<a href="#Setting+up+HOD">Setting up HOD</a>
-</li>
-</ul>
-</li>
-<li>
-<a href="#Running+HOD">Running HOD</a>
-<ul class="minitoc">
-<li>
-<a href="#Overview">Overview</a>
-<ul class="minitoc">
-<li>
-<a href="#Operation+allocate">Operation allocate</a>
-</li>
-<li>
-<a href="#Running+Hadoop+jobs+using+the+allocated+cluster">Running Hadoop jobs
using the allocated cluster</a>
-</li>
-<li>
-<a href="#Operation+deallocate">Operation deallocate</a>
-</li>
-</ul>
-</li>
-<li>
-<a href="#Command+Line+Options">Command Line Options</a>
-</li>
-</ul>
-</li>
-<li>
-<a href="#HOD+Configuration"> HOD Configuration </a>
-<ul class="minitoc">
-<li>
-<a href="#Introduction+to+HOD+Configuration"> Introduction to HOD
Configuration </a>
-</li>
-<li>
-<a href="#Categories+%2F+Sections+in+HOD+Configuration"> Categories / Sections
in HOD Configuration </a>
-</li>
-<li>
-<a href="#Important+and+Commonly+Used+Configuration+Options"> Important and
Commonly Used Configuration Options </a>
-<ul class="minitoc">
-<li>
-<a href="#Common+configuration+options"> Common configuration options </a>
-</li>
-<li>
-<a href="#hod+options"> hod options </a>
-</li>
-<li>
-<a href="#resource_manager+options"> resource_manager options </a>
-</li>
-<li>
-<a href="#ringmaster+options"> ringmaster options </a>
-</li>
-<li>
-<a href="#gridservice-hdfs+options"> gridservice-hdfs options </a>
-</li>
-<li>
-<a href="#gridservice-mapred+options"> gridservice-mapred options </a>
-</li>
-</ul>
-</li>
-</ul>
+<a href="#Documentation">Documentation</a>
</li>
</ul>
</div>
@@ -301,810 +186,26 @@
<h2 class="h3"> Introduction </h2>
<div class="section">
<p>
- The Hadoop On Demand (<acronym title="Hadoop On Demand">HOD</acronym>)
project is a system for provisioning and managing independent Hadoop MapReduce
instances on a shared cluster of nodes. HOD uses a resource manager for
allocation. At present it supports <a
href="http://www.clusterresources.com/pages/products/torque-resource-manager.php">Torque</a>
out of the box.
+Hadoop On Demand (HOD) is a system for provisioning virtual Hadoop clusters
over a large physical cluster. It uses the Torque resource manager to do node
allocation. On the allocated nodes, it can start Hadoop Map/Reduce and HDFS
daemons. It automatically generates the appropriate configuration files
(hadoop-site.xml) for the Hadoop daemons and client. HOD also has the
capability to distribute Hadoop to the nodes in the virtual cluster that it
allocates. In short, HOD makes it easy for administrators and users to quickly
set up and use Hadoop. It is also a very useful tool for Hadoop developers and
testers who need to share a physical cluster for testing their own Hadoop
versions.
</p>
</div>
-
-
-<a name="N1001F"></a><a name="Feature+List"></a>
-<h2 class="h3"> Feature List </h2>
-<div class="section">
-<a name="N10025"></a><a
name="Simplified+Interface+for+Provisioning+Hadoop+Clusters"></a>
-<h3 class="h4"> Simplified Interface for Provisioning Hadoop Clusters </h3>
-<p>
- By far, the biggest advantage of HOD is to quickly setup a Hadoop
cluster. The user interacts with the cluster through a simple command line
interface, the HOD client. HOD brings up a virtual MapReduce cluster with the
required number of nodes, which the user can use for running Hadoop jobs. When
done, HOD will automatically clean up the resources and make the nodes
available again.
- </p>
-<a name="N1002F"></a><a name="Automatic+installation+of+Hadoop"></a>
-<h3 class="h4"> Automatic installation of Hadoop </h3>
-<p>
- With HOD, Hadoop does not need to be even installed on the cluster.
The user can provide a Hadoop tarball that HOD will automatically distribute to
all the nodes in the cluster.
- </p>
-<a name="N10039"></a><a name="Configuring+Hadoop"></a>
-<h3 class="h4"> Configuring Hadoop </h3>
-<p>
- Dynamic parameters of Hadoop configuration, such as the NameNode and
JobTracker addresses and ports, and file system temporary directories are
generated and distributed by HOD automatically to all nodes in the cluster. In
addition, HOD allows the user to configure Hadoop parameters at both the server
(for e.g. JobTracker) and client (for e.g. JobClient) level, including 'final'
parameters, that were introduced with Hadoop 0.15.
- </p>
-<a name="N10043"></a><a name="Auto-cleanup+of+Unused+Clusters"></a>
-<h3 class="h4"> Auto-cleanup of Unused Clusters </h3>
-<p>
- HOD has an automatic timeout so that users cannot misuse resources
they aren't using. The timeout applies only when there is no MapReduce job
running.
- </p>
-<a name="N1004D"></a><a name="Log+Services"></a>
-<h3 class="h4"> Log Services </h3>
-<p>
- HOD can be used to collect all MapReduce logs to a central location
for archiving and inspection after the job is completed.
- </p>
-</div>
-
-
-<a name="N10058"></a><a name="HOD+Components"></a>
-<h2 class="h3"> HOD Components </h2>
-<div class="section">
-<p>
- This is a brief overview of the various components of HOD and how they
interact to provision Hadoop.
- </p>
-<a name="N10061"></a><a name="HOD+Client"></a>
-<h3 class="h4"> HOD Client </h3>
-<p>
- The HOD client is a Unix command that users use to allocate Hadoop
MapReduce clusters. The command provides other options to list allocated
clusters and deallocate them. The HOD client generates the
<em>hadoop-site.xml</em> in a user specified directory. The user can point to
this configuration file while running Map/Reduce jobs on the allocated cluster.
- </p>
-<p>
- The nodes from where the HOD Client is run are called <em>submit
nodes</em> because jobs are submitted to the resource manager system for
allocating and running clusters from these nodes.
- </p>
-<a name="N10074"></a><a name="RingMaster"></a>
-<h3 class="h4"> RingMaster </h3>
-<p>
- The RingMaster is a HOD process that is started on one node per every
allocated cluster. It is submitted as a 'job' to the resource manager by the
HOD client. It controls which Hadoop daemons start on which nodes. It provides
this information to other HOD processes, such as the HOD client, so users can
also determine this information. The RingMaster is responsible for hosting and
distributing the Hadoop tarball to all nodes in the cluster. It also
automatically cleans up unused clusters.
- </p>
-<p>
-
-</p>
-<a name="N10081"></a><a name="HodRing"></a>
-<h3 class="h4"> HodRing </h3>
-<p>
- The HodRing is a HOD process that runs on every allocated node in the
cluster. These processes are run by the RingMaster through the resource
manager, using a facility of parallel execution. The HodRings are responsible
for launching Hadoop commands on the nodes to bring up the Hadoop daemons. They
get the command to launch from the RingMaster.
- </p>
-<a name="N1008B"></a><a name="Hodrc+%2F+HOD+configuration+file"></a>
-<h3 class="h4"> Hodrc / HOD configuration file </h3>
-<p>
- An INI style configuration file where the users configure various
options for the HOD system, including install locations of different software,
resource manager parameters, log and temp file directories, parameters for
their MapReduce jobs, etc.
- </p>
-<a name="N10095"></a><a name="Submit+Nodes+and+Compute+Nodes"></a>
-<h3 class="h4"> Submit Nodes and Compute Nodes </h3>
-<p>
- The nodes from where the <em>HOD Client</em> is run are referred as
<em>submit nodes</em> because jobs are submitted to the resource manager system
for allocating and running clusters from these nodes.
- </p>
-<p>
- The nodes where the <em>Ringmaster</em> and <em>HodRings</em> run are
called the Compute nodes. These are the nodes that get allocated by a resource
manager, and on which the Hadoop daemons are provisioned and started.
- </p>
-</div>
-
-
-<a name="N100AF"></a><a name="Getting+Started+with+HOD"></a>
-<h2 class="h3"> Getting Started with HOD </h2>
+
+<a name="N10017"></a><a name="Documentation"></a>
+<h2 class="h3">Documentation</h2>
<div class="section">
-<a name="N100B5"></a><a name="Pre-Requisites"></a>
-<h3 class="h4"> Pre-Requisites </h3>
-<a name="N100BB"></a><a name="Hardware"></a>
-<h4> Hardware </h4>
-<p>
- HOD requires a minimum of 3 nodes configured through a resource
manager.
- </p>
-<a name="N100C5"></a><a name="Software"></a>
-<h4> Software </h4>
-<p>
- The following components are assumed to be installed before using
HOD:
- </p>
+<p>Please go through the following documents to learn more about using HOD:</p>
<ul>
-
-<li>
-
-<em>Torque:</em> Currently HOD supports Torque out of the box. We assume that
you are familiar with configuring Torque. You can get information about this
from <a
href="http://www.clusterresources.com/wiki/doku.php?id=torque:torque_wiki">here</a>.
- </li>
-
-<li>
-
-<em>Python:</em> We require version 2.5.1, which can be downloaded from <a
href="http://www.python.org/">here</a>.
- </li>
-
-</ul>
-<p>
- The following components can be optionally installed for getting
better functionality from HOD:
- </p>
-<ul>
-
-<li>
-
-<em>Twisted Python:</em> This can be used for improving the scalability of
HOD. Twisted Python is available <a
href="http://twistedmatrix.com/trac/">here</a>.
- </li>
-
-<li>
-
-<em>Hadoop:</em> HOD can automatically distribute Hadoop to all nodes in the
cluster. However, it can also use a pre-installed version of Hadoop, if it is
available on all nodes in the cluster. HOD currently supports Hadoop 0.15 and
above.
- </li>
-
-</ul>
-<p>
- HOD configuration requires the location of installs of these
components to be the same on all nodes in the cluster. It will also make the
configuration simpler to have the same location on the submit nodes.
- </p>
-<a name="N100FF"></a><a
name="Resource+Manager+Configuration+Pre-requisites"></a>
-<h4>Resource Manager Configuration Pre-requisites</h4>
-<p>
- For using HOD with Torque:
- </p>
-<ul>
-
-<li>
- Install Torque components: pbs_server on a head node, pbs_moms on
all compute nodes, and PBS client tools on all compute nodes and submit nodes.
- </li>
-
-<li>
- Create a queue for submitting jobs on the pbs_server.
- </li>
-
-<li>
- Specify a name for all nodes in the cluster, by setting a 'node
property' to all the nodes. This can be done by using the 'qmgr' command. For
example:
- <em>qmgr -c "set node node properties=cluster-name"</em>
-
-</li>
-
-<li>
- Ensure that jobs can be submitted to the nodes. This can be done
by using the 'qsub' command. For example:
- <em>echo "sleep 30" | qsub -l nodes=3</em>
-
-</li>
-
-</ul>
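As an illustration of the Torque steps above, the node-property and test-submission commands might look like the following on the head node (the node name node01 and the property value cluster-name are placeholders):

# Tag each compute node with a node property; repeat for every node in the cluster.
$ qmgr -c "set node node01 properties = cluster-name"

# Verify that a simple job can be submitted to three nodes.
$ echo "sleep 30" | qsub -l nodes=3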
-<p>
- More information about setting up Torque can be found by referring
to the documentation <a
href="http://www.clusterresources.com/pages/products/torque-resource-manager.php">here.</a>
-
-</p>
-<a name="N10126"></a><a name="Setting+up+HOD"></a>
-<h3 class="h4">Setting up HOD</h3>
-<ul>
-
-<li>
- HOD is available in the 'contrib' section of Hadoop under the root
directory 'hod'. Distribute the files under this directory to all the nodes in
the cluster.
- </li>
-
-<li>
- On the node from where you want to run hod, edit the file hodrc
which can be found in the <em>install dir/conf</em> directory. This file
contains the minimal set of values required for running hod.
- </li>
-
+
<li>
- Specify values suitable to your environment for the following
variables defined in the configuration file. Note that some of these variables
are defined at more than one place in the file.
- </li>
-
-</ul>
-<table class="ForrestTable" cellspacing="1" cellpadding="4">
-
-<tr>
-
-<th colspan="1" rowspan="1"> Variable Name </th>
- <th colspan="1" rowspan="1"> Meaning </th>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1"> ${JAVA_HOME} </td>
- <td colspan="1" rowspan="1"> Location of Java for Hadoop. Hadoop
supports Sun JDK 1.5.x </td>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1"> ${CLUSTER_NAME} </td>
- <td colspan="1" rowspan="1"> Name of the cluster which is
specified in the 'node property' as mentioned in resource manager
configuration. </td>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1"> ${HADOOP_HOME} </td>
- <td colspan="1" rowspan="1"> Location of Hadoop installation on
the compute and submit nodes. </td>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1"> ${RM_QUEUE} </td>
-         <td colspan="1" rowspan="1"> Queue configured for submitting jobs
in the resource manager configuration. </td>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1"> ${RM_HOME} </td>
- <td colspan="1" rowspan="1"> Location of the resource manager
installation on the compute and submit nodes. </td>
-
-</tr>
+<a href="hod_admin_guide.html">HOD Admin Guide</a> : This guide walks you
through an overview of the HOD architecture, the prerequisites, installing the
various components and dependent software, and configuring HOD to get it up and
running.</li>
-</table>
-<ul>
-
<li>
- The following environment variables *may* need to be set depending
on your environment. These variables must be defined where you run the HOD
client, and also be specified in the HOD configuration file as the value of the
key resource_manager.env-vars. Multiple variables can be specified as a comma
separated list of key=value pairs.
- </li>
+<a href="hod_user_guide.html">HOD User Guide</a> : This guide covers how to
get started with running HOD, its various features and command line options,
and detailed help on troubleshooting.</li>
+<li>
+<a href="hod_config_guide.html">HOD Configuration Guide</a> : This guide
discusses configuring HOD, describing the various configuration sections and
parameters and their purpose in detail.</li>
+
</ul>
-<table class="ForrestTable" cellspacing="1" cellpadding="4">
-
-<tr>
-
-<th colspan="1" rowspan="1"> Variable Name </th>
- <th colspan="1" rowspan="1"> Meaning </th>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">HOD_PYTHON_HOME</td>
- <td colspan="1" rowspan="1">
- If you install python to a non-default location of the compute
nodes, or submit nodes, then, this variable must be defined to point to the
python executable in the non-standard location.
- </td>
-
-</tr>
-
-</table>
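For example (the Python install path below is hypothetical), the variable would be exported on the node running the HOD client and mirrored in the hodrc via the resource_manager.env-vars key described above:

# On the submit node, before invoking the hod client.
$ export HOD_PYTHON_HOME=/opt/python-2.5.1/bin/python

# And in the hodrc, under the resource_manager section, something like:
#   env-vars = HOD_PYTHON_HOME=/opt/python-2.5.1/bin/python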
-<p>
- You can also review other configuration options in the file and modify
them to suit your needs. Refer to the the section on configuration below for
information about the HOD configuration.
- </p>
-</div>
-
-
-<a name="N101B4"></a><a name="Running+HOD"></a>
-<h2 class="h3">Running HOD</h2>
-<div class="section">
-<a name="N101BA"></a><a name="Overview"></a>
-<h3 class="h4">Overview</h3>
-<p>
-      A typical session of HOD will involve at least three steps: allocate,
run Hadoop jobs, deallocate.
- </p>
-<a name="N101C3"></a><a name="Operation+allocate"></a>
-<h4>Operation allocate</h4>
-<p>
- The allocate operation is used to allocate a set of nodes and
install and provision Hadoop on them. It has the following syntax:
- </p>
-<table class="ForrestTable" cellspacing="1" cellpadding="4">
-
-<tr>
-
-<td colspan="1" rowspan="1">hod -c config_file -t hadoop_tarball_location -o
"allocate cluster_dir number_of_nodes"</td>
-
-</tr>
-
-</table>
-<p>
- The hadoop_tarball_location must be a location on a shared file
system accessible from all nodes in the cluster. Note that the cluster_dir must
exist before running the command. If the command completes successfully then
cluster_dir/hadoop-site.xml will be generated and will contain information
about the allocated cluster's JobTracker and NameNode.
- </p>
-<p>
- For example, the following command uses a hodrc file in
~/hod-config/hodrc and allocates Hadoop (provided by the tarball
~/share/hadoop.tar.gz) on 10 nodes, storing the generated Hadoop configuration
in a directory named <em>~/hadoop-cluster</em>:
- </p>
-<table class="ForrestTable" cellspacing="1" cellpadding="4">
-
-<tr>
-
-<td colspan="1" rowspan="1">$ hod -c ~/hod-config/hodrc -t
~/share/hadoop.tar.gz -o "allocate ~/hadoop-cluster 10"</td>
-
-</tr>
-
-</table>
-<p>
- HOD also supports an environment variable called
<em>HOD_CONF_DIR</em>. If this is defined, HOD will look for a default hodrc
file at $HOD_CONF_DIR/hodrc. Defining this allows the above command to also be
run as follows:
- </p>
-<table class="ForrestTable" cellspacing="1" cellpadding="4">
-
-<tr>
-
-<td colspan="1" rowspan="1">
-
-<p>$ export HOD_CONF_DIR=~/hod-config</p>
-
-<p>$ hod -t ~/share/hadoop.tar.gz -o "allocate ~/hadoop-cluster 10"</p>
-
-</td>
-
-</tr>
-
-</table>
-<a name="N10203"></a><a
name="Running+Hadoop+jobs+using+the+allocated+cluster"></a>
-<h4>Running Hadoop jobs using the allocated cluster</h4>
-<p>
- Now, one can run Hadoop jobs using the allocated cluster in the
usual manner:
- </p>
-<table class="ForrestTable" cellspacing="1" cellpadding="4">
-
-<tr>
-
-<td colspan="1" rowspan="1">hadoop --config cluster_dir hadoop_command
hadoop_command_args</td>
-
-</tr>
-
-</table>
-<p>
- Continuing our example, the following command will run a wordcount
example on the allocated cluster:
- </p>
-<table class="ForrestTable" cellspacing="1" cellpadding="4">
-
-<tr>
-
-<td colspan="1" rowspan="1">$ hadoop --config ~/hadoop-cluster jar
/path/to/hadoop/hadoop-examples.jar wordcount /path/to/input
/path/to/output</td>
-
-</tr>
-
-</table>
-<a name="N10226"></a><a name="Operation+deallocate"></a>
-<h4>Operation deallocate</h4>
-<p>
- The deallocate operation is used to release an allocated cluster.
When finished with a cluster, deallocate must be run so that the nodes become
free for others to use. The deallocate operation has the following syntax:
- </p>
-<table class="ForrestTable" cellspacing="1" cellpadding="4">
-
-<tr>
-
-<td colspan="1" rowspan="1">hod -o "deallocate cluster_dir"</td>
-
-</tr>
-
-</table>
-<p>
- Continuing our example, the following command will deallocate the
cluster:
- </p>
-<table class="ForrestTable" cellspacing="1" cellpadding="4">
-
-<tr>
-
-<td colspan="1" rowspan="1">$ hod -o "deallocate ~/hadoop-cluster"</td>
-
-</tr>
-
-</table>
-<a name="N1024A"></a><a name="Command+Line+Options"></a>
-<h3 class="h4">Command Line Options</h3>
-<p>
- This section covers the major command line options available via the
hod command:
- </p>
-<p>
-
-<em>--help</em>
-
-</p>
-<p>
- Prints out the help message to see the basic options.
- </p>
-<p>
-
-<em>--verbose-help</em>
-
-</p>
-<p>
- All configuration options provided in the hodrc file can be passed on
the command line, using the syntax --section_name.option_name[=value]. When
provided this way, the value provided on command line overrides the option
provided in hodrc. The verbose-help command lists all the available options in
the hodrc file. This is also a nice way to see the meaning of the configuration
options.
- </p>
-<p>
-
-<em>-c config_file</em>
-
-</p>
-<p>
- Provides the configuration file to use. Can be used with all other
options of HOD. Alternatively, the HOD_CONF_DIR environment variable can be
defined to specify a directory that contains a file named hodrc, alleviating
the need to specify the configuration file in each HOD command.
- </p>
-<p>
-
-<em>-b 1|2|3|4</em>
-
-</p>
-<p>
- Enables the given debug level. Can be used with all other options of
HOD. 4 is most verbose.
- </p>
-<p>
-
-<em>-o "help"</em>
-
-</p>
-<p>
- Lists the operations available in the operation mode.
- </p>
-<p>
-
-<em>-o "allocate cluster_dir number_of_nodes"</em>
-
-</p>
-<p>
- Allocates a cluster on the given number of cluster nodes, and store
the allocation information in cluster_dir for use with subsequent hadoop
commands. Note that the cluster_dir must exist before running the command.
- </p>
-<p>
-
-<em>-o "list"</em>
-
-</p>
-<p>
- Lists the clusters allocated by this user. Information provided
includes the Torque job id corresponding to the cluster, the cluster directory
where the allocation information is stored, and whether the Map/Reduce daemon
is still active or not.
- </p>
-<p>
-
-<em>-o "info cluster_dir"</em>
-
-</p>
-<p>
- Lists information about the cluster whose allocation information is
stored in the specified cluster directory.
- </p>
-<p>
-
-<em>-o "deallocate cluster_dir"</em>
-
-</p>
-<p>
- Deallocates the cluster whose allocation information is stored in the
specified cluster directory.
- </p>
-<p>
-
-<em>-t hadoop_tarball</em>
-
-</p>
-<p>
- Provisions Hadoop from the given tar.gz file. This option is only
applicable to the allocate operation. For better distribution performance it is
recommended that the Hadoop tarball contain only the libraries and binaries,
and not the source or documentation.
- </p>
-<p>
-
-<em>-Mkey1=value1 -Mkey2=value2</em>
-
-</p>
-<p>
- Provides configuration parameters for the provisioned Map/Reduce
daemons (JobTracker and TaskTrackers). A hadoop-site.xml is generated with
these values on the cluster nodes
- </p>
-<p>
-
-<em>-Hkey1=value1 -Hkey2=value2</em>
-
-</p>
-<p>
- Provides configuration parameters for the provisioned HDFS daemons
(NameNode and DataNodes). A hadoop-site.xml is generated with these values on
the cluster nodes
- </p>
-<p>
-
-<em>-Ckey1=value1 -Ckey2=value2</em>
-
-</p>
-<p>
- Provides configuration parameters for the client from where jobs can
be submitted. A hadoop-site.xml is generated with these values on the submit
node.
- </p>
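As a hypothetical illustration of combining these options with an allocate operation (the Hadoop parameter names and values below are examples only, not recommendations):

# -M sets parameters for the provisioned Map/Reduce daemons, -H for the HDFS
# daemons, and -C for the client-side hadoop-site.xml on the submit node.
$ hod -c ~/hod-config/hodrc -t ~/share/hadoop.tar.gz \
      -Mmapred.reduce.tasks=2 -Hdfs.replication=2 -Cmapred.reduce.tasks=2 \
      -o "allocate ~/hadoop-cluster 10"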
-</div>
-
-<a name="N102CA"></a><a name="HOD+Configuration"></a>
-<h2 class="h3"> HOD Configuration </h2>
-<div class="section">
-<a name="N102D0"></a><a name="Introduction+to+HOD+Configuration"></a>
-<h3 class="h4"> Introduction to HOD Configuration </h3>
-<p>
- Configuration options for HOD are organized as sections and options
within them. They can be specified in two ways: a configuration file in the INI
format, and as command line options to the HOD shell, specified in the format
--section.option[=value]. If the same option is specified in both places, the
value specified on the command line overrides the value in the configuration
file.
- </p>
-<p>
- To get a simple description of all configuration options, you can type
<em>hod --verbose-help</em>
-
-</p>
-<p>
- This section explains some of the most important or commonly used
configuration options in some more detail.
- </p>
-<a name="N102E3"></a><a
name="Categories+%2F+Sections+in+HOD+Configuration"></a>
-<h3 class="h4"> Categories / Sections in HOD Configuration </h3>
-<p>
- The following are the various sections in the HOD configuration:
- </p>
-<table class="ForrestTable" cellspacing="1" cellpadding="4">
-
-<tr>
-
-<th colspan="1" rowspan="1"> Section Name </th>
- <th colspan="1" rowspan="1"> Description </th>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">hod</td>
- <td colspan="1" rowspan="1">Options for the HOD client</td>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">resource_manager</td>
- <td colspan="1" rowspan="1">Options for specifying which resource
manager to use, and other parameters for using that resource manager</td>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">ringmaster</td>
- <td colspan="1" rowspan="1">Options for the RingMaster process</td>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">hodring</td>
- <td colspan="1" rowspan="1">Options for the HodRing process</td>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">gridservice-mapred</td>
- <td colspan="1" rowspan="1">Options for the MapReduce daemons</td>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">gridservice-hdfs</td>
- <td colspan="1" rowspan="1">Options for the HDFS daemons</td>
-
-</tr>
-
-</table>
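As a rough, hypothetical sketch of how these sections might fit together in an INI-style hodrc (the section and option names come from the tables in this document; all values are placeholders, and this is not a complete working file):

[hod]
cluster       = cluster-name
client-params = mapred.reduce.tasks=2

[resource_manager]
queue      = batch-queue
batch-home = /usr/torque
env-vars   = HOD_PYTHON_HOME=/opt/python-2.5.1/bin/python

[ringmaster]
work-dirs  = /disk1/hod,/disk2/hod

[hodring]
temp-dir   = /tmp/hod

[gridservice-hdfs]
external   = false

[gridservice-mapred]
external   = false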
-<a name="N1034B"></a><a
name="Important+and+Commonly+Used+Configuration+Options"></a>
-<h3 class="h4"> Important and Commonly Used Configuration Options </h3>
-<a name="N10351"></a><a name="Common+configuration+options"></a>
-<h4> Common configuration options </h4>
-<p>
- Certain configuration options are defined in most of the sections of
the HOD configuration. Options defined in a section, are used by the process
for which that section applies. These options have the same meaning, but can
have different values in each section.
- </p>
-<table class="ForrestTable" cellspacing="1" cellpadding="4">
-
-<tr>
-
-<th colspan="1" rowspan="1"> Option Name </th>
- <th colspan="1" rowspan="1"> Description </th>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">temp-dir</td>
- <td colspan="1" rowspan="1">Temporary directory for usage by the
HOD processes. Make sure that the users who will run hod have rights to create
directories under the directory specified here.</td>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">debug</td>
- <td colspan="1" rowspan="1">A numeric value from 1-4. 4 produces
the most log information, and 1 the least.</td>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">log-dir</td>
- <td colspan="1" rowspan="1">Directory where log files are
stored. By default, this is <em>install-location/logs/</em>. The restrictions
and notes for the temp-dir variable apply here too.</td>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">xrs-port-range</td>
- <td colspan="1" rowspan="1">A range of ports, among which an
available port shall be picked for use to run any XML-RPC based server daemon
processes of HOD.</td>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">http-port-range</td>
- <td colspan="1" rowspan="1">A range of ports, among which an
available port shall be picked for use to run any HTTP based server daemon
processes of HOD.</td>
-
-</tr>
-
-</table>
-<a name="N103AF"></a><a name="hod+options"></a>
-<h4> hod options </h4>
-<table class="ForrestTable" cellspacing="1" cellpadding="4">
-
-<tr>
-
-<th colspan="1" rowspan="1"> Option Name </th>
- <th colspan="1" rowspan="1"> Description </th>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">cluster</td>
- <td colspan="1" rowspan="1">A descriptive name given to the
cluster. For Torque, this is specified as a 'Node property' for every node in
the cluster. HOD uses this value to compute the number of available nodes.</td>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">client-params</td>
- <td colspan="1" rowspan="1">A comma-separated list of hadoop
config parameters specified as key-value pairs. These will be used to generate
a hadoop-site.xml on the submit node that should be used for running MapReduce
jobs.</td>
-
-</tr>
-
-</table>
-<a name="N103E0"></a><a name="resource_manager+options"></a>
-<h4> resource_manager options </h4>
-<table class="ForrestTable" cellspacing="1" cellpadding="4">
-
-<tr>
-
-<th colspan="1" rowspan="1"> Option Name </th>
- <th colspan="1" rowspan="1"> Description </th>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">queue</td>
- <td colspan="1" rowspan="1">Name of the queue configured in the
resource manager to which jobs are to be submitted.</td>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">batch-home</td>
- <td colspan="1" rowspan="1">Install directory to which 'bin' is
appended and under which the executables of the resource manager can be found.
</td>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">env-vars</td>
- <td colspan="1" rowspan="1">This is a comma separated list of
key-value pairs, expressed as key=value, which would be passed to the jobs
launched on the compute nodes. For example, if the python installation is in a
non-standard location, one can set the environment variable 'HOD_PYTHON_HOME'
to the path to the python executable. The HOD processes launched on the compute
nodes can then use this variable.</td>
-
-</tr>
-
-</table>
-<a name="N1041E"></a><a name="ringmaster+options"></a>
-<h4> ringmaster options </h4>
-<table class="ForrestTable" cellspacing="1" cellpadding="4">
-
-<tr>
-
-<th colspan="1" rowspan="1"> Option Name </th>
- <th colspan="1" rowspan="1"> Description </th>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">work-dirs</td>
- <td colspan="1" rowspan="1">These are a list of comma separated
paths that will serve as the root for directories that HOD generates and passes
to Hadoop for use to store DFS / MapReduce data. For e.g. this is where DFS
data blocks will be stored. Typically, as many paths are specified as there are
disks available to ensure all disks are being utilized. The restrictions and
notes for the temp-dir variable apply here too.</td>
-
-</tr>
-
-</table>
-<a name="N10442"></a><a name="gridservice-hdfs+options"></a>
-<h4> gridservice-hdfs options </h4>
-<table class="ForrestTable" cellspacing="1" cellpadding="4">
-
-<tr>
-
-<th colspan="1" rowspan="1"> Option Name </th>
- <th colspan="1" rowspan="1"> Description </th>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">external</td>
- <td colspan="1" rowspan="1">
-
-<p> If false, this indicates that an HDFS cluster must be brought up by the HOD
system on the nodes which it allocates via the allocate command. Note that in
that case, when the cluster is de-allocated, it will bring down the HDFS
cluster, and all the data will be lost. If true, it will try and connect to an
externally configured HDFS system. </p>
-
-<p>Typically, because input for jobs are placed into HDFS before jobs are run,
and also the output from jobs in HDFS is required to be persistent, an internal
HDFS cluster is of little value in a production system. However, it allows for
quick testing.</p>
-
-</td>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">host</td>
- <td colspan="1" rowspan="1">Hostname of the externally
configured NameNode, if any.</td>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">fs_port</td>
- <td colspan="1" rowspan="1">Port to which NameNode RPC server is
bound.</td>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">info_port</td>
- <td colspan="1" rowspan="1">Port to which the NameNode web UI
server is bound.</td>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">pkgs</td>
- <td colspan="1" rowspan="1">Installation directory, under which
bin/hadoop executable is located. This can be used to use a pre-installed
version of Hadoop on the cluster.</td>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">server-params</td>
- <td colspan="1" rowspan="1">A comma-separated list of hadoop
config parameters specified as key-value pairs. These will be used to generate a
hadoop-site.xml that will be used by the NameNode and DataNodes.</td>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">final-server-params</td>
- <td colspan="1" rowspan="1">Same as above, except they will be
marked final.</td>
-
-</tr>
-
-</table>
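For example, pointing HOD at an externally managed HDFS rather than provisioning one could be sketched as follows in the gridservice-hdfs section of the hodrc (the hostname and port numbers are placeholders, not defaults):

[gridservice-hdfs]
external  = true
host      = namenode.example.com
fs_port   = 9000
info_port = 50070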
-<a name="N104BA"></a><a name="gridservice-mapred+options"></a>
-<h4> gridservice-mapred options </h4>
-<table class="ForrestTable" cellspacing="1" cellpadding="4">
-
-<tr>
-
-<th colspan="1" rowspan="1"> Option Name </th>
- <th colspan="1" rowspan="1"> Description </th>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">external</td>
- <td colspan="1" rowspan="1">
-
-<p> If false, this indicates that a MapReduce cluster must be brought up by the
HOD system on the nodes which it allocates via the allocate command. If true,
it will try and connect to an externally configured MapReduce system.</p>
-
-</td>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">host</td>
- <td colspan="1" rowspan="1">Hostname of the externally
configured JobTracker, if any.</td>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">tracker_port</td>
- <td colspan="1" rowspan="1">Port to which the JobTracker RPC
server is bound.</td>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">info_port</td>
- <td colspan="1" rowspan="1">Port to which the JobTracker web UI
server is bound.</td>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">pkgs</td>
- <td colspan="1" rowspan="1">Installation directory, under which
bin/hadoop executable is located. This can be used to use a pre-installed
version of Hadoop on the cluster.</td>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">server-params</td>
- <td colspan="1" rowspan="1">A comma-separated list of hadoop
config parameters specified as key-value pairs. These will be used to generate a
hadoop-site.xml that will be used by the JobTracker and TaskTrackers.</td>
-
-</tr>
-
-<tr>
-
-<td colspan="1" rowspan="1">final-server-params</td>
- <td colspan="1" rowspan="1">Same as above, except they will be
marked final.</td>
-
-</tr>
-
-</table>
</div>
</div>