Modified: hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/hod_admin_guide.xml URL: http://svn.apache.org/viewvc/hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/hod_admin_guide.xml?rev=673920&r1=673919&r2=673920&view=diff ============================================================================== --- hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/hod_admin_guide.xml (original) +++ hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/hod_admin_guide.xml Thu Jul 3 23:39:47 2008 @@ -17,7 +17,8 @@ <title>Overview</title> <p>The Hadoop On Demand (HOD) project is a system for provisioning and -managing independent Hadoop MapReduce and HDFS instances on a shared cluster +managing independent Hadoop Map/Reduce and Hadoop Distributed File System (HDFS) +instances on a shared cluster of nodes. HOD is a tool that makes it easy for administrators and users to quickly setup and use Hadoop. It is also a very useful tool for Hadoop developers and testers who need to share a physical cluster for testing their own Hadoop @@ -30,17 +31,17 @@ </p> <p> -The basic system architecture of HOD includes components from:</p> +The basic system architecture of HOD includes these components:</p> <ul> - <li>A Resource manager (possibly together with a scheduler),</li> - <li>HOD components, and </li> - <li>Hadoop Map/Reduce and HDFS daemons.</li> + <li>A Resource manager (possibly together with a scheduler)</li> + <li>Various HOD components</li> + <li>Hadoop Map/Reduce and HDFS daemons</li> </ul> <p> HOD provisions and maintains Hadoop Map/Reduce and, optionally, HDFS instances through interaction with the above components on a given cluster of nodes. 
A cluster of -nodes can be thought of as comprising of two sets of nodes:</p> +nodes can be thought of as comprising two sets of nodes:</p> <ul> <li>Submit nodes: Users use the HOD client on these nodes to allocate clusters, and then use the Hadoop client to submit Hadoop jobs. </li> @@ -54,18 +55,18 @@ </p> <ul> - <li>The user uses the HOD client on the Submit node to allocate a required number of -cluster nodes, and provision Hadoop on them.</li> - <li>The HOD client uses a Resource Manager interface, (qsub, in Torque), to submit a HOD -process, called the RingMaster, as a Resource Manager job, requesting the user desired number -of nodes. This job is submitted to the central server of the Resource Manager (pbs_server, in Torque).</li> - <li>On the compute nodes, the resource manager slave daemons, (pbs_moms in Torque), accept -and run jobs that they are given by the central server (pbs_server in Torque). The RingMaster + <li>The user uses the HOD client on the Submit node to allocate a desired number of +cluster nodes and to provision Hadoop on them.</li> + <li>The HOD client uses a resource manager interface (qsub, in Torque) to submit a HOD +process, called the RingMaster, as a Resource Manager job, to request the user's desired number +of nodes. This job is submitted to the central server of the resource manager (pbs_server, in Torque).</li> + <li>On the compute nodes, the resource manager slave daemons (pbs_moms in Torque) accept +and run jobs that they are assigned by the central server (pbs_server in Torque). 
The RingMaster process is started on one of the compute nodes (mother superior, in Torque).</li> - <li>The Ringmaster then uses another Resource Manager interface, (pbsdsh, in Torque), to run + <li>The RingMaster then uses another resource manager interface (pbsdsh, in Torque) to run the second HOD component, HodRing, as distributed tasks on each of the compute nodes allocated.</li> - <li>The Hodrings, after initializing, communicate with the Ringmaster to get Hadoop commands, + <li>The HodRings, after initializing, communicate with the RingMaster to get Hadoop commands, and run them accordingly. Once the Hadoop commands are started, they register with the RingMaster, giving information about the daemons.</li> <li>All the configuration files needed for Hadoop instances are generated by HOD itself, @@ -74,24 +75,25 @@ JobTracker and HDFS daemons.</li> </ul> -<p>The rest of the document deals with the steps needed to setup HOD on a physical cluster of nodes.</p> +<p>The rest of this document describes how to set up HOD on a physical cluster of nodes.</p> </section> <section> <title>Pre-requisites</title> - +<p>To use HOD, your system should include the following hardware and software +components.</p> <p>Operating System: HOD is currently tested on RHEL4.<br/> -Nodes : HOD requires a minimum of 3 nodes configured through a resource manager.<br/></p> +Nodes : HOD requires a minimum of three nodes configured through a resource manager.<br/></p> <p> Software </p> -<p>The following components are to be installed on *ALL* the nodes before using HOD:</p> +<p>The following components must be installed on ALL nodes before using HOD:</p> <ul> <li>Torque: Resource manager</li> <li><a href="ext:hod/python">Python</a> : HOD requires version 2.5.1 of Python.</li> </ul> -<p>The following components can be optionally installed for getting better +<p>The following components are optional and can be installed to obtain better functionality from HOD:</p> <ul> <li><a
href="ext:hod/twisted-python">Twisted Python</a>: This can be @@ -129,27 +131,27 @@ href="ext:hod/torque-mailing-list">here</a>. </p> -<p>For using HOD with Torque:</p> +<p>To use HOD with Torque:</p> <ul> - <li>Install Torque components: pbs_server on one node(head node), pbs_mom on all + <li>Install Torque components: pbs_server on one node (head node), pbs_mom on all compute nodes, and PBS client tools on all compute nodes and submit - nodes. Perform atleast a basic configuration so that the Torque system is up and - running i.e pbs_server knows which machines to talk to. Look <a + nodes. Perform at least a basic configuration so that the Torque system is up and + running, that is, pbs_server knows which machines to talk to. Look <a href="ext:hod/torque-basic-config">here</a> for basic configuration. For advanced configuration, see <a href="ext:hod/torque-advanced-config">here</a></li> <li>Create a queue for submitting jobs on the pbs_server. The name of the queue is the - same as the HOD configuration parameter, resource-manager.queue. The Hod client uses this queue to - submit the Ringmaster process as a Torque job.</li> - <li>Specify a 'cluster name' as a 'property' for all nodes in the cluster. - This can be done by using the 'qmgr' command. For example: - qmgr -c "set node node properties=cluster-name". The name of the cluster is the same as + same as the HOD configuration parameter, resource-manager.queue. The HOD client uses this queue to + submit the RingMaster process as a Torque job.</li> + <li>Specify a cluster name as a property for all nodes in the cluster. + This can be done by using the qmgr command. For example: + <code>qmgr -c "set node node properties=cluster-name"</code>. The name of the cluster is the same as the HOD configuration parameter, hod.cluster. </li> - <li>Ensure that jobs can be submitted to the nodes. This can be done by - using the 'qsub' command. 
For example: - echo "sleep 30" | qsub -l nodes=3</li> + <li>Make sure that jobs can be submitted to the nodes. This can be done by + using the qsub command. For example: + <code>echo "sleep 30" | qsub -l nodes=3</code></li> </ul> </section> @@ -157,14 +159,14 @@ <section> <title>Installing HOD</title> -<p>Now that the resource manager set up is done, we proceed on to obtaining and -installing HOD.</p> +<p>Once the resource manager is set up, you can obtain and +install HOD.</p> <ul> - <li>If you are getting HOD from the Hadoop tarball,it is available under the + <li>If you are getting HOD from the Hadoop tarball, it is available under the 'contrib' section of Hadoop, under the root directory 'hod'.</li> <li>If you are building from source, you can run ant tar from the Hadoop root - directory, to generate the Hadoop tarball, and then pick HOD from there, - as described in the point above.</li> + directory to generate the Hadoop tarball, and then get HOD from there, + as described above.</li> <li>Distribute the files under this directory to all the nodes in the cluster. Note that the location where the files are copied should be the same on all the nodes.</li> @@ -176,14 +178,17 @@ <section> <title>Configuring HOD</title> -<p>After HOD installation is done, it has to be configured before we start using -it.</p> +<p>You can configure HOD once it is installed. The minimal configuration needed +to run HOD is described below. More advanced configuration options are discussed +in the HOD Configuration Guide.</p> <section> - <title>Minimal Configuration to get started</title> + <title>Minimal Configuration</title> + <p>To get started using HOD, the following minimal configuration is + required:</p> <ul> - <li>On the node from where you want to run hod, edit the file hodrc - which can be found in the <install dir>/conf directory. 
This file - contains the minimal set of values required for running hod.</li> + <li>On the node from where you want to run HOD, edit the file hodrc + located in the <install dir>/conf directory. This file + contains the minimal set of values required to run hod.</li> <li> <p>Specify values suitable to your environment for the following variables defined in the configuration file. Note that some of these @@ -196,7 +201,7 @@ 'node property' as mentioned in resource manager configuration.</li> <li>${HADOOP_HOME}: Location of Hadoop installation on the compute and submit nodes.</li> - <li>${RM_QUEUE}: Queue configured for submiting jobs in the resource + <li>${RM_QUEUE}: Queue configured for submitting jobs in the resource manager configuration.</li> <li>${RM_HOME}: Location of the resource manager installation on the compute and submit nodes.</li> @@ -204,15 +209,15 @@ </li> <li> -<p>The following environment variables *may* need to be set depending on +<p>The following environment variables may need to be set depending on your environment. These variables must be defined where you run the - HOD client, and also be specified in the HOD configuration file as the + HOD client and must also be specified in the HOD configuration file as the value of the key resource_manager.env-vars. Multiple variables can be specified as a comma separated list of key=value pairs.</p> <ul> <li>HOD_PYTHON_HOME: If you install python to a non-default location - of the compute nodes, or submit nodes, then, this variable must be + of the compute nodes, or submit nodes, then this variable must be defined to point to the python executable in the non-standard location.</li> </ul> @@ -222,38 +227,38 @@ <section> <title>Advanced Configuration</title> - <p> You can review other configuration options in the file and modify them to suit - your needs. Refer to the <a href="hod_config_guide.html">Configuration Guide</a> for information about the HOD - configuration. 
- </p> + <p> You can review and modify other configuration options to suit + your specific needs. Refer to the <a href="hod_config_guide.html">Configuration + Guide</a> for more information.</p> </section> </section> <section> <title>Running HOD</title> - <p>You can now proceed to <a href="hod_user_guide.html">HOD User Guide</a> for information about how to run HOD, - what are the various features, options and for help in trouble-shooting.</p> + <p>You can run HOD once it is configured. Refer to <a + href="hod_user_guide.html">the HOD User Guide</a> for more information.</p> </section> <section> <title>Supporting Tools and Utilities</title> - <p>This section describes certain supporting tools and utilities that can be used in managing HOD deployments.</p> + <p>This section describes supporting tools and utilities that can be used to + manage HOD deployments.</p> <section> - <title>logcondense.py - Tool for removing log files uploaded to DFS</title> - <p>As mentioned in - <a href="hod_user_guide.html#Collecting+and+Viewing+Hadoop+Logs">this section</a> of the - <a href="hod_user_guide.html">HOD User Guide</a>, HOD can be configured to upload + <title>logcondense.py - Manage Log Files</title> + <p>As mentioned in the + <a href="hod_user_guide.html#Collecting+and+Viewing+Hadoop+Logs">HOD User Guide</a>, + HOD can be configured to upload Hadoop logs to a statically configured HDFS. Over time, the number of logs uploaded - to DFS could increase. logcondense.py is a tool that helps administrators to clean-up - the log files older than a certain number of days. </p> + to HDFS could increase. logcondense.py is a tool that helps + administrators to remove log files uploaded to HDFS. </p> <section> <title>Running logcondense.py</title> <p>logcondense.py is available under hod_install_location/support folder. You can either - run it using python, for e.g. 
<em>python logcondense.py</em>, or give execute permissions + run it using python, for example, <em>python logcondense.py</em>, or give execute permissions to the file, and directly run it as <em>logcondense.py</em>. logcondense.py needs to be run by a user who has sufficient permissions to remove files from locations where log - files are uploaded in the DFS, if permissions are enabled. For e.g. as mentioned in the + files are uploaded in the HDFS, if permissions are enabled. For example as mentioned in the <a href="hod_config_guide.html#3.7+hodring+options">configuration guide</a>, the logs could be configured to come under the user's home directory in HDFS. In that case, the user running logcondense.py should have super user privileges to remove the files from under @@ -302,8 +307,9 @@ <td>--dynamicdfs</td> <td>If true, this will indicate that the logcondense.py script should delete HDFS logs in addition to Map/Reduce logs. Otherwise, it only deletes Map/Reduce logs, which is also the - default if this option is not specified. This option is useful if dynamic DFS installations - are being provisioned by HOD, and the static DFS installation is being used only to collect + default if this option is not specified. This option is useful if + dynamic HDFS installations + are being provisioned by HOD, and the static HDFS installation is being used only to collect logs - a scenario that may be common in test clusters.</td> <td>false</td> </tr> @@ -314,14 +320,15 @@ </section> </section> <section> - <title>checklimits.sh - Tool to update torque comment field reflecting resource limits</title> - <p>checklimits is a HOD tool specific to Torque/Maui environment + <title>checklimits.sh - Monitor Resource Limits</title> + <p>checklimits.sh is a HOD tool specific to the Torque/Maui environment (<a href="ext:hod/maui">Maui Cluster Scheduler</a> is an open source job scheduler for clusters and supercomputers, from clusterresources). 
The checklimits.sh script - updates torque comment field when newly submitted job(s) violate/cross + updates the torque comment field when newly submitted job(s) violate or + exceed the user limits set up in the Maui scheduler. It uses qstat, does one pass - over torque job list to find out queued or unfinished jobs, runs Maui + over the torque job-list to determine queued or unfinished jobs, runs Maui tool checkjob on each job to see if user limits are violated and then runs torque's qalter utility to update job attribute 'comment'. Currently it updates the comment as <em>User-limits exceeded. Requested:([0-9]*) @@ -330,16 +337,16 @@ the type of violation.</p> <section> <title>Running checklimits.sh</title> - <p>checklimits.sh is available under hod_install_location/support - folder. This is a shell script and can be run directly as <em>sh + <p>checklimits.sh is available under the hod_install_location/support + folder. This shell script can be run directly as <em>sh checklimits.sh </em>or as <em>./checklimits.sh</em> after enabling execute permissions. Torque and Maui binaries should be available on the machine where the tool is run and should be in the path - of the shell script process. In order for this tool to be able to update - comment field of jobs from different users, it has to be run with - torque administrative privileges. This tool has to be run repeatedly + of the shell script process. To update the + comment field of jobs from different users, this tool must be run with + torque administrative privileges. This tool must be run repeatedly after specific intervals of time to frequently update jobs violating - constraints, for e.g. via cron. Please note that the resource manager + constraints, for example via cron. Please note that the resource manager and scheduler commands used in this script can be expensive and so it is better not to run this inside a tight loop without sleeping.</p> </section>
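Both support tools above are meant to be run periodically rather than in a tight loop, so a scheduled job is the natural fit. The crontab entries below are an illustrative sketch only: the install path /opt/hod and the schedules are assumptions, and logcondense.py takes options (described in the tables above) that you would normally append to its command line.

```text
# Hypothetical crontab entries for the HOD support tools.
# Paths and schedules are placeholders; adjust for your installation.

# Prune old uploaded logs once a day, at 02:00.
0 2 * * *    python /opt/hod/support/logcondense.py

# Refresh the torque comment field every 10 minutes. This must run with
# torque administrative privileges, and the torque and Maui binaries
# must be on the PATH of the cron environment.
*/10 * * * * sh /opt/hod/support/checklimits.sh
```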
Modified: hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/hod_config_guide.xml URL: http://svn.apache.org/viewvc/hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/hod_config_guide.xml?rev=673920&r1=673919&r2=673920&view=diff ============================================================================== --- hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/hod_config_guide.xml (original) +++ hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/hod_config_guide.xml Thu Jul 3 23:39:47 2008 @@ -16,26 +16,26 @@ <section> <title>1. Introduction</title> - <p>Configuration options for HOD are organized as sections and options - within them. They can be specified in two ways: a configuration file + <p>This document explains some of the most important and commonly used + Hadoop On Demand (HOD) configuration options. Configuration options + can be specified in two ways: a configuration file in the INI format, and as command line options to the HOD shell, specified in the format --section.option[=value]. If the same option is specified in both places, the value specified on the command line overrides the value in the configuration file.</p> <p> - To get a simple description of all configuration options, you can type + To get a simple description of all configuration options, type: </p> <table><tr><td><code>$ hod --verbose-help</code></td></tr></table> - <p>This document explains some of the most important or commonly used - configuration options in some more detail.</p> + </section> <section> <title>2. 
Sections</title> - <p>The following are the various sections in the HOD configuration:</p> + <p>HOD organizes configuration options into these sections:</p> <ul> <li> hod: Options for the HOD client</li> @@ -43,19 +43,19 @@ to use, and other parameters for using that resource manager</li> <li> ringmaster: Options for the RingMaster process, </li> <li> hodring: Options for the HodRing processes</li> - <li> gridservice-mapred: Options for the MapReduce daemons</li> + <li> gridservice-mapred: Options for the Map/Reduce daemons</li> <li> gridservice-hdfs: Options for the HDFS daemons.</li> </ul> - - <p>The next section deals with some of the important options in the HOD - configuration.</p> </section> <section> - <title>3. Important / Commonly Used Configuration Options</title> - + <title>3. HOD Configuration Options</title> + <p>The following section describes configuration options common to most + HOD sections followed by sections that describe configuration options + specific to each HOD section.</p> + <section> <title>3.1 Common configuration options</title> @@ -70,7 +70,7 @@ sure that the users who will run hod have rights to create directories under the directory specified here.</li> - <li>debug: A numeric value from 1-4. 4 produces the most log information, + <li>debug: Numeric value from 1-4. 4 produces the most log information, and 1 the least.</li> <li>log-dir: Directory where log files are stored. By default, this is @@ -78,10 +78,10 @@ temp-dir variable apply here too. 
</li> - <li>xrs-port-range: A range of ports, among which an available port shall + <li>xrs-port-range: Range of ports, among which an available port shall be picked for use to run an XML-RPC server.</li> - <li>http-port-range: A range of ports, among which an available port shall + <li>http-port-range: Range of ports, among which an available port shall be picked for use to run an HTTP server.</li> <li>java-home: Location of Java to be used by Hadoop.</li> @@ -96,15 +96,15 @@ <title>3.2 hod options</title> <ul> - <li>cluster: A descriptive name given to the cluster. For Torque, this is + <li>cluster: Descriptive name given to the cluster. For Torque, this is specified as a 'Node property' for every node in the cluster. HOD uses this value to compute the number of available nodes.</li> - <li>client-params: A comma-separated list of hadoop config parameters + <li>client-params: Comma-separated list of hadoop config parameters specified as key-value pairs. These will be used to generate a hadoop-site.xml on the submit node that - should be used for running MapReduce jobs.</li> - <li>job-feasibility-attr: A regular expression string that specifies + should be used for running Map/Reduce jobs.</li> + <li>job-feasibility-attr: Regular expression string that specifies whether and how to check job feasibility - resource manager or scheduler limits. The current implementation corresponds to the torque job @@ -113,16 +113,16 @@ of limit violation is triggered and either deallocates the cluster or stays in queued state according as the request is beyond maximum limits or - the cumulative usage has crossed maxumum limits. + the cumulative usage has crossed maximum limits. The torque comment attribute may be updated - periodically by an external mechanism. For e.g., + periodically by an external mechanism. 
For example, comment attribute can be updated by running <a href= "hod_admin_guide.html#checklimits.sh+-+Tool+to+update+torque+comment+field+reflecting+resource+limits"> checklimits.sh</a> script in hod/support directory, and then setting job-feasibility-attr equal to the - value TORQUE_USER_LIMITS_COMMENT_FIELD i.e + value TORQUE_USER_LIMITS_COMMENT_FIELD, "User-limits exceeded. Requested:([0-9]*) - Used:([0-9]*) MaxLimit:([0-9]*)" will make HOD + Used:([0-9]*) MaxLimit:([0-9]*)", will make HOD behave accordingly. </li> </ul> @@ -139,7 +139,7 @@ which the executables of the resource manager can be found.</li> - <li>env-vars: This is a comma separated list of key-value pairs, + <li>env-vars: Comma-separated list of key-value pairs, expressed as key=value, which would be passed to the jobs launched on the compute nodes. For example, if the python installation is @@ -154,18 +154,18 @@ <title>3.4 ringmaster options</title> <ul> - <li>work-dirs: These are a list of comma separated paths that will serve + <li>work-dirs: Comma-separated list of paths that will serve as the root for directories that HOD generates and passes - to Hadoop for use to store DFS / MapReduce data. For e.g. + to Hadoop for use to store DFS and Map/Reduce data. For example, this is where DFS data blocks will be stored. Typically, as many paths are specified as there are disks available to ensure all disks are being utilized. The restrictions and notes for the temp-dir variable apply here too.</li> - <li>max-master-failures: It defines how many times a hadoop master + <li>max-master-failures: Number of times a hadoop master daemon can fail to launch, beyond which HOD will fail the cluster allocation altogether. In HOD clusters, sometimes there might be a single or few "bad" nodes due - to issues like missing java, missing/incorrect version + to issues like missing java, missing or incorrect version of Hadoop etc.
When this configuration variable is set to a positive integer, the RingMaster returns an error to the client only when the number of times a hadoop @@ -184,7 +184,7 @@ <title>3.5 gridservice-hdfs options</title> <ul> - <li>external: If false, this indicates that a HDFS cluster must be + <li>external: If false, indicates that a HDFS cluster must be bought up by the HOD system, on the nodes which it allocates via the allocate command. Note that in that case, when the cluster is de-allocated, it will bring down the @@ -207,7 +207,7 @@ located. This can be used to use a pre-installed version of Hadoop on the cluster.</li> - <li>server-params: A comma-separated list of hadoop config parameters + <li>server-params: Comma-separated list of hadoop config parameters specified key-value pairs. These will be used to generate a hadoop-site.xml that will be used by the NameNode and DataNodes.</li> @@ -220,11 +220,11 @@ <title>3.6 gridservice-mapred options</title> <ul> - <li>external: If false, this indicates that a MapReduce cluster must be + <li>external: If false, indicates that a Map/Reduce cluster must be bought up by the HOD system on the nodes which it allocates via the allocate command. If true, if will try and connect to an externally - configured MapReduce system.</li> + configured Map/Reduce system.</li> <li>host: Hostname of the externally configured JobTracker, if any</li> @@ -235,7 +235,7 @@ <li>pkgs: Installation directory, under which bin/hadoop executable is located</li> - <li>server-params: A comma-separated list of hadoop config parameters + <li>server-params: Comma-separated list of hadoop config parameters specified key-value pairs. These will be used to generate a hadoop-site.xml that will be used by the JobTracker and TaskTrackers</li> @@ -266,8 +266,8 @@ cluster node's local file path, use the format 'file://path'. When clusters are deallocated by HOD, the hadoop logs will - be deleted as part of HOD's cleanup process. 
In order to - persist these logs, you can use this configuration option. + be deleted as part of HOD's cleanup process. To ensure these + logs persist, you can use this configuration option. The format of the path is value-of-this-option/userid/hod-logs/cluster-id Modified: hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/hod_user_guide.xml URL: http://svn.apache.org/viewvc/hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/hod_user_guide.xml?rev=673920&r1=673919&r2=673920&view=diff ============================================================================== --- hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/hod_user_guide.xml (original) +++ hadoop/core/branches/branch-0.18/src/docs/src/documentation/content/xdocs/hod_user_guide.xml Thu Jul 3 23:39:47 2008 @@ -14,7 +14,7 @@ <title> Introduction </title><anchor id="Introduction"></anchor> <p>Hadoop On Demand (HOD) is a system for provisioning virtual Hadoop clusters over a large physical cluster. It uses the Torque resource manager to do node allocation. On the allocated nodes, it can start Hadoop Map/Reduce and HDFS daemons. It automatically generates the appropriate configuration files (hadoop-site.xml) for the Hadoop daemons and client. HOD also has the capability to distribute Hadoop to the nodes in the virtual cluster that it allocates. In short, HOD makes it easy for administrators and users to quickly setup and use Hadoop. 
It is also a very useful tool for Hadoop developers and testers who need to share a physical cluster for testing their own Hadoop versions.</p> <p>HOD supports Hadoop from version 0.15 onwards.</p> - <p>The rest of the documentation comprises of a quick-start guide that helps you get quickly started with using HOD, a more detailed guide of all HOD features, command line options, known issues and trouble-shooting information.</p> + <p>The rest of this document consists of a quick-start guide that helps you get started with HOD quickly, a more detailed guide of all HOD features, and a trouble-shooting section.</p> </section> <section> <title> Getting Started Using HOD </title><anchor id="Getting_Started_Using_HOD_0_4"></anchor> @@ -110,7 +110,7 @@ <section><title> Provisioning and Managing Hadoop Clusters </title><anchor id="Provisioning_and_Managing_Hadoop"></anchor> <p>The primary feature of HOD is to provision Hadoop Map/Reduce and HDFS clusters. This is described above in the Getting Started section. Also, as long as nodes are available, and organizational policies allow, a user can use HOD to allocate multiple Map/Reduce clusters simultaneously. The user would need to specify different paths for the <code>cluster_dir</code> parameter mentioned above for each cluster he/she allocates. HOD provides the <em>list</em> and the <em>info</em> operations to enable managing multiple clusters.</p> <p><strong> Operation <em>list</em></strong></p><anchor id="Operation_list"></anchor> - <p>The list operation lists all the clusters allocated so far by a user.
The cluster directory where the hadoop-site.xml is stored for the cluster, and its status vis-a-vis connectivity with the JobTracker and/or HDFS is shown. The list operation has the following syntax:</p> <table> <tr> @@ -219,7 +219,7 @@ <table><tr><td><code>log-destination-uri = hdfs://host123:45678/user/hod/logs</code> or</td></tr> <tr><td><code>log-destination-uri = file://path/to/store/log/files</code></td></tr> </table> - <p>Under the root directory specified above in the path, HOD will create a create a path user_name/torque_jobid and store gzipped log files for each node that was part of the job.</p> + <p>Under the root directory specified above in the path, HOD will create a path user_name/torque_jobid and store gzipped log files for each node that was part of the job.</p> <p>Note that to store the files to HDFS, you may need to configure the <code>hodring.pkgs</code> option with the Hadoop version that matches the HDFS mentioned. If not, HOD will try to use the Hadoop version that it is using to provision the Hadoop cluster itself.</p> </section> <section><title> Auto-deallocation of Idle Clusters </title><anchor id="Auto_deallocation_of_Idle_Cluste"></anchor> @@ -242,7 +242,7 @@ <td><code>$ hod allocate -d cluster_dir -n number_of_nodes -N name_of_job</code></td> </tr> </table> - <p><em>Note:</em> Due to restriction in the underlying Torque resource manager, names which do not start with a alphabet or contain a 'space' will cause the job to fail. + <p><em>Note:</em> Due to a restriction in the underlying Torque resource manager, names which do not start with an alphabetic character or contain a 'space' will cause the job to fail.
The failure message points to the problem being in the specified job name.</p> </section> <section><title> Capturing HOD exit codes in Torque </title><anchor id="Capturing_HOD_exit_codes_in_Torq"></anchor> <p>HOD exit codes are captured in the Torque exit_status field. This will help users and system administrators to distinguish successful runs from unsuccessful runs of HOD. The exit codes are 0 if allocation succeeded and all hadoop jobs ran on the allocated cluster correctly. They are non-zero if allocation failed or some of the hadoop jobs failed on the allocated cluster. The exit codes that are possible are mentioned in the table below. <em>Note: Hadoop job status is captured only if the version of Hadoop used is 16 or above.</em></p>
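Tying the allocate syntax and exit codes together, a session might look like the sketch below. The cluster directory, node count, and job name are illustrative, and the exact forms of the list, info, and deallocate operations are assumptions based on the operations this guide describes; check <code>hod --verbose-help</code> for the precise syntax.

```shell
# Allocate a three-node cluster; HOD generates hadoop-site.xml
# under the cluster directory for use by the Hadoop client.
$ hod allocate -d ~/hod-clusters/test -n 3 -N test-job
$ echo $?        # 0 on success (see the exit code table below)

# Manage multiple clusters with the operations described above
# (invocation details may vary; see 'hod --verbose-help'):
$ hod list
$ hod info -d ~/hod-clusters/test

# Deallocate the cluster when done.
$ hod deallocate -d ~/hod-clusters/test
```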
