[08/11] hadoop git commit: HDFS-7668. Backport "Convert site documentation from apt to markdown" to branch-2 (Masatake Iwasaki via Colin P. McCabe)

cmccabe Tue, 24 Feb 2015 16:35:04 -0800

http://git-wip-us.apache.org/repos/asf/hadoop/blob/5c0073e7/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsNfsGateway.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsNfsGateway.apt.vm b/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsNfsGateway.apt.vm
deleted file mode 100644
index 152a985..0000000
--- a/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsNfsGateway.apt.vm
+++ /dev/null
@@ -1,378 +0,0 @@
-
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~   http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
-  ---
-  Hadoop Distributed File System-${project.version} - HDFS NFS Gateway
-  ---
-  ---
-  ${maven.build.timestamp}
-
-HDFS NFS Gateway
-
-%{toc|section=1|fromDepth=0}
-
-* {Overview}
-
-  The NFS Gateway supports NFSv3 and allows HDFS to be mounted as part of the
-  client's local file system.
-  Currently the NFS Gateway supports and enables the following usage patterns:
-
-   * Users can browse the HDFS file system through their local file system
-     on NFSv3 client compatible operating systems.
-
-   * Users can download files from the HDFS file system onto their
-     local file system.
-
-   * Users can upload files from their local file system directly to the
-     HDFS file system.
-
-   * Users can stream data directly to HDFS through the mount point. File
-     append is supported but random write is not supported. 
-
-  The NFS gateway machine needs everything required to run an HDFS client,
-  such as the Hadoop JAR files and the HADOOP_CONF directory.
-  The NFS gateway can be on the same host as a DataNode, the NameNode, or any
-  HDFS client.
-
-
-* {Configuration}
-
-   The NFS gateway uses a proxy user to proxy all the users accessing the NFS
-   mounts. In non-secure mode, the user running the gateway is the proxy user,
-   while in secure mode the user in the Kerberos keytab is the proxy user.
-   Suppose the proxy user is 'nfsserver' and users belonging to the groups
-   'users-group1' and 'users-group2' use the NFS mounts. Then the following two
-   properties must be set in core-site.xml of the NameNode, and only the
-   NameNode needs to be restarted after the configuration change
-   (NOTE: replace the string 'nfsserver' with the proxy user name in your
-   cluster):
-
-----
-<property>
-  <name>hadoop.proxyuser.nfsserver.groups</name>
-  <value>root,users-group1,users-group2</value>
-  <description>
-         The 'nfsserver' user is allowed to proxy all members of the
-         'users-group1' and 'users-group2' groups. Note that in most cases you
-         will need to include the group "root" because the user "root" (which
-         usually belongs to the "root" group) will generally be the user that
-         initially executes the mount on the NFS client system.
-         Set this to '*' to allow the nfsserver user to proxy any group.
-  </description>
-</property>
-----
-
-----
-<property>
-  <name>hadoop.proxyuser.nfsserver.hosts</name>
-  <value>nfs-client-host1.com</value>
-  <description>
-         This is the host where the NFS gateway is running. Set this to '*' to
-         allow requests from any hosts to be proxied.
-  </description>
-</property>
-----
-
-   The above are the only required configurations for the NFS gateway in
-   non-secure mode. For Kerberized Hadoop clusters, the following
-   configurations need to be added to hdfs-site.xml for the gateway (NOTE:
-   replace the string "nfsserver" with the proxy user name and ensure the user
-   contained in the keytab is also the same proxy user):
-
-----
-  <property>
-    <name>nfs.keytab.file</name>
-    <value>/etc/hadoop/conf/nfsserver.keytab</value> <!-- path to the nfs gateway keytab -->
-  </property>
-----
-
-----
-  <property>
-    <name>nfs.kerberos.principal</name>
-    <value>nfsserver/[email protected]</value>
-  </property>
-----
-  
-   The rest of the NFS gateway configurations are optional for both secure and
-   non-secure mode.
-
-   The AIX NFS client has a
-   {{{https://issues.apache.org/jira/browse/HDFS-6549}few known issues}}
-   that prevent it from working correctly by default with the HDFS NFS
-   Gateway. If you want to be able to access the HDFS NFS Gateway from AIX, you
-   should set the following configuration property to enable work-arounds for
-   these issues:
-
-----
-<property>
-  <name>nfs.aix.compatibility.mode.enabled</name>
-  <value>true</value>
-</property>
-----
-
-   Note that regular, non-AIX clients should NOT enable AIX compatibility mode.
-   The work-arounds implemented by AIX compatibility mode effectively disable
-   the safeguards that ensure that listing of directory contents via NFS
-   returns consistent results, and that all data sent to the NFS server can be
-   assured to have been committed.
-
-   Users are strongly encouraged to update a few configuration properties
-   based on their use cases. All of the following configuration properties can
-   be added or updated in hdfs-site.xml.
-  
-   * If the client mounts the export with access time updates allowed, make
-    sure the following property is not disabled in the configuration file.
-    Only the NameNode needs to be restarted after this property is changed. On
-    some Unix systems, the user can disable access time updates by mounting the
-    export with "noatime". If the export is mounted with "noatime", the user
-    doesn't need to change the following property, and thus there is no need
-    to restart the NameNode.
-
-----
-<property>
-  <name>dfs.namenode.accesstime.precision</name>
-  <value>3600000</value>
-  <description>The access time for HDFS file is precise up to this value.
-    The default value is 1 hour. Setting a value of 0 disables
-    access times for HDFS.
-  </description>
-</property>
-----
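As a rough illustration of why the precision setting matters, the sketch below models an access-time update rule. It assumes, as a simplification, that the NameNode records a new access time only when the stored one is older than dfs.namenode.accesstime.precision, and that a precision of 0 disables access times; the class and method names are hypothetical, not Hadoop APIs.

```java
/**
 * Hypothetical model of how an access-time precision throttles updates.
 * Not Hadoop's code; it only illustrates the assumed rule.
 */
final class AccessTimePolicy {
    private final long precisionMs;

    AccessTimePolicy(long precisionMs) {
        this.precisionMs = precisionMs;
    }

    /** A precision of 0 (or less) disables access times entirely. */
    boolean accessTimesEnabled() {
        return precisionMs > 0;
    }

    /**
     * Returns true if a read at nowMs would record a new access time,
     * i.e. the stored access time is older than the precision window.
     */
    boolean shouldUpdate(long storedAccessTimeMs, long nowMs) {
        return accessTimesEnabled() && nowMs > storedAccessTimeMs + precisionMs;
    }
}
```

With the default of 3600000 ms, repeated reads within the same hour would leave the stored access time untouched under this model.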
-
-   * Users are expected to update the file dump directory. NFS clients often
-      reorder writes, so sequential writes can arrive at the NFS gateway in
-      random order. This directory is used to temporarily save out-of-order
-      writes before writing to HDFS. For each file, the out-of-order writes are
-      dumped after they accumulate in memory beyond a certain threshold (e.g.,
-      1MB). Make sure the directory has enough space. For example, if the
-      application uploads 10 files of 100MB each, it is recommended for this
-      directory to have roughly 1GB of space in case a worst-case write reorder
-      happens to every file. Only the NFS gateway needs to be restarted after
-      this property is updated.
-
-----
-  <property>    
-    <name>nfs.dump.dir</name>
-    <value>/tmp/.hdfs-nfs</value>
-  </property>
----- 
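The sizing guideline above is simple arithmetic: in the worst case, every out-of-order write for every file in flight lands in the dump directory, so the directory should hold roughly the total size of the files being written concurrently. A minimal sketch with a hypothetical helper name:

```java
/** Hypothetical helper for estimating worst-case nfs.dump.dir usage. */
final class DumpDirSizing {
    static final long MB = 1024L * 1024L;

    /**
     * Worst case: every write to every file arrives out of order and is
     * dumped, so roughly fileCount * bytesPerFile of space is needed.
     */
    static long worstCaseBytes(int fileCount, long bytesPerFile) {
        if (fileCount < 0 || bytesPerFile < 0) {
            throw new IllegalArgumentException("sizes must be non-negative");
        }
        // multiplyExact fails loudly instead of silently overflowing
        return Math.multiplyExact((long) fileCount, bytesPerFile);
    }
}
```

For the example in the text, worstCaseBytes(10, 100 * DumpDirSizing.MB) gives 1000 MB, which rounds to the suggested 1GB.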
-
-  * By default, the export can be mounted by any client. To better control
-    access, users can update the following property. The value string contains
-    a machine name and an access privilege, separated by whitespace
-    characters. The machine name format can be a single host, a Java regular
-    expression, or an IPv4 address. The access privilege uses rw or ro to
-    specify read/write or read-only access of the machines to exports. If the
-    access privilege is not provided, the default is read-only. Entries are
-    separated by ";".
-    For example: "192.168.0.0/22 rw ; host.*\.example\.com ; host1.test.org ro;".
-    Only the NFS gateway needs to be restarted after this property is updated.
-
-----
-<property>
-  <name>nfs.exports.allowed.hosts</name>
-  <value>* rw</value>
-</property>
-----
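To make the nfs.exports.allowed.hosts format concrete, the following is a minimal parser sketch for the value string described above. It is not the gateway's own parser: the class name is hypothetical, it only splits entries and resolves the rw/ro default, and matching a client against single hosts, regular expressions, or IPv4 ranges is left out.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical parser for values such as "host1 rw ; host2 ; host3 ro;".
 * Entries are separated by ';'; each is "<machine> [rw|ro]", default ro.
 */
final class ExportsParser {
    /** One allowed-host entry; client matching is not modeled here. */
    static final class Entry {
        final String machine;
        final boolean readWrite;

        Entry(String machine, boolean readWrite) {
            this.machine = machine;
            this.readWrite = readWrite;
        }
    }

    static List<Entry> parse(String value) {
        List<Entry> entries = new ArrayList<>();
        for (String raw : value.split(";")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue; // tolerate a trailing ';' or empty segments
            }
            String[] parts = entry.split("\\s+");
            if (parts.length > 2) {
                throw new IllegalArgumentException("bad entry: " + entry);
            }
            boolean readWrite = false; // read-only unless "rw" is given
            if (parts.length == 2) {
                if (parts[1].equalsIgnoreCase("rw")) {
                    readWrite = true;
                } else if (!parts[1].equalsIgnoreCase("ro")) {
                    throw new IllegalArgumentException("bad access privilege: " + parts[1]);
                }
            }
            entries.add(new Entry(parts[0], readWrite));
        }
        return entries;
    }
}
```

Applied to the example value in the text, this yields three entries: 192.168.0.0/22 (read/write), host.*\.example\.com (read-only by default) and host1.test.org (read-only).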
-
-  * JVM and log settings. You can export JVM settings (e.g., heap size and GC
-   log) in HADOOP_NFS3_OPTS. More NFS-related settings can be found in
-   hadoop-env.sh. To get an NFS debug trace, you can edit the log4j.properties
-   file to add the following. Note that the debug trace, especially for ONCRPC,
-   can be very verbose.
-
-    To change logging level:
-
------------------------------------------------ 
-    log4j.logger.org.apache.hadoop.hdfs.nfs=DEBUG
------------------------------------------------ 
-
-    To get more details of ONCRPC requests:
-
------------------------------------------------ 
-    log4j.logger.org.apache.hadoop.oncrpc=DEBUG
------------------------------------------------ 
-
-
-* {Start and stop NFS gateway service}
-
-  Three daemons are required to provide NFS service: rpcbind (or portmap),
-  mountd and nfsd. The NFS gateway process has both nfsd and mountd. It shares
-  the HDFS root "/" as the only export. It is recommended to use the portmap
-  included in the NFS gateway package. Even though the NFS gateway works with
-  the portmap/rpcbind provided by most Linux distributions, the portmap
-  included in the package is needed on some Linux systems such as RHEL 6.2 due
-  to an {{{https://bugzilla.redhat.com/show_bug.cgi?id=731542}rpcbind bug}}.
-  More detailed discussions can be found in
-  {{{https://issues.apache.org/jira/browse/HDFS-4763}HDFS-4763}}.
-
-   [[1]] Stop the nfs/rpcbind/portmap services provided by the platform
-         (commands can be different on various Unix platforms):
-      
--------------------------
-     service nfs stop
-      
-     service rpcbind stop
--------------------------
-
-
-   [[2]] Start the portmap included in the package (needs root privileges):
-
--------------------------
-     hdfs portmap
-  
-     OR
-
-     hadoop-daemon.sh start portmap
--------------------------
-
-   [[3]] Start mountd and nfsd.
-   
-     No root privileges are required for this command. In non-secure mode, the
-     NFS gateway should be started by the proxy user mentioned at the beginning
-     of this user guide. In secure mode, any user can start the NFS gateway
-     as long as the user has read access to the Kerberos keytab defined in
-     "nfs.keytab.file".
-
--------------------------
-     hdfs nfs3
-
-     OR
-
-     hadoop-daemon.sh start nfs3
--------------------------
-
-     Note that if the hadoop-daemon.sh script starts the NFS gateway, its log
-     can be found in the Hadoop log folder.
-
-
-   [[4]] Stop NFS gateway services.
-
--------------------------
-      hadoop-daemon.sh stop nfs3
-
-      hadoop-daemon.sh stop portmap
--------------------------
-
-  Optionally, you can forgo running the Hadoop-provided portmap daemon and
-  instead use the system portmap daemon on all operating systems if you start
-  the NFS Gateway as root. This allows the HDFS NFS Gateway to work around the
-  aforementioned bug and still register using the system portmap daemon. To do
-  so, start the NFS gateway daemon as you normally would, but do so as the
-  "root" user, and also set the "HADOOP_PRIVILEGED_NFS_USER" environment
-  variable to an unprivileged user. In this mode the NFS Gateway will start as
-  root to perform its initial registration with the system portmap, and then
-  will drop privileges back to the user specified by
-  HADOOP_PRIVILEGED_NFS_USER for the rest of the lifetime of the NFS Gateway
-  process. Note that if you choose this route, you should skip steps 1 and 2
-  above.
-
-
-* {Verify validity of NFS related services}
-
-    [[1]] Execute the following command to verify that all the services are up
-          and running:
-
--------------------------
-       rpcinfo -p $nfs_server_ip
--------------------------
-
-     You should see output similar to the following:
-
--------------------------
-       program vers proto   port
-
-       100005    1   tcp   4242  mountd
-
-       100005    2   udp   4242  mountd
-
-       100005    2   tcp   4242  mountd
-
-       100000    2   tcp    111  portmapper
-
-       100000    2   udp    111  portmapper
-
-       100005    3   udp   4242  mountd
-
-       100005    1   udp   4242  mountd
-
-       100003    3   tcp   2049  nfs
-
-       100005    3   tcp   4242  mountd
--------------------------
-
-    [[2]]  Verify that the HDFS namespace is exported and can be mounted.
-
--------------------------
-        showmount -e $nfs_server_ip                         
--------------------------
-
-      You should see output similar to the following:
-     
--------------------------
-        Exports list on $nfs_server_ip :
-
-        / (everyone)
--------------------------
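The first check above can also be scripted. The sketch below scans text shaped like the sample rpcinfo -p output and reports whether portmapper, mountd and NFS version 3 over TCP are registered. It assumes the fifth column carries the service name, as in the sample (on some systems that column may be missing), and the class name is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical check over `rpcinfo -p` style output: confirms that
 * portmapper, mountd and nfs (version 3 over TCP) are all registered.
 */
final class RpcinfoCheck {
    static boolean nfsGatewayReady(String rpcinfoOutput) {
        Set<String> services = new HashSet<>();
        boolean nfs3OverTcp = false;
        for (String line : rpcinfoOutput.split("\\R")) {
            // Expected columns: program vers proto port service
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 5 || !cols[0].matches("\\d+")) {
                continue; // skip the header, blank lines and unnamed rows
            }
            services.add(cols[4]);
            if (cols[4].equals("nfs") && cols[1].equals("3") && cols[2].equals("tcp")) {
                nfs3OverTcp = true;
            }
        }
        return services.contains("portmapper") && services.contains("mountd") && nfs3OverTcp;
    }
}
```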
-
-
-* {Mount the export “/”}
-
-  Currently NFS v3 only uses TCP as the transport protocol.
-  NLM is not supported, so the mount option "nolock" is needed. It's
-  recommended to use a hard mount. This is because, even after the client sends
-  all data to the NFS gateway, it may take the NFS gateway some extra time to
-  transfer data to HDFS when writes were reordered by the NFS client kernel.
- 
-  If a soft mount has to be used, the user should give it a relatively
-  long timeout (no less than the default timeout on the host).
-
-  The users can mount the HDFS namespace as shown below:
-
--------------------------------------------------------------------  
-       mount -t nfs -o vers=3,proto=tcp,nolock,noacl $server:/  $mount_point
--------------------------------------------------------------------
-
-  Then the users can access HDFS as part of the local file system, except that
-  hard links and random writes are not supported yet. To optimize the
-  performance of large file I/O, one can increase the NFS transfer size (rsize
-  and wsize) during mount. By default, the NFS gateway supports 1MB as the
-  maximum transfer size. For a larger data transfer size, one needs to update
-  "nfs.rtmax" and "nfs.wtmax" in hdfs-site.xml.
-
-* {Allow mounts from unprivileged clients}
-
-  In environments where root access on client machines is not generally
-  available, some measure of security can be obtained by ensuring that only NFS
-  clients originating from privileged ports can connect to the NFS server. This
-  feature is referred to as "port monitoring." This feature is not enabled by default
-  in the HDFS NFS Gateway, but can be optionally enabled by setting the
-  following config in hdfs-site.xml on the NFS Gateway machine:
-
--------------------------------------------------------------------
-<property>
-  <name>nfs.port.monitoring.disabled</name>
-  <value>false</value>
-</property>
--------------------------------------------------------------------
-
-* {User authentication and mapping}
-
-  The NFS gateway in this release uses AUTH_UNIX style authentication. When
-  the user on the NFS client accesses the mount point, the NFS client passes
-  the UID to the NFS gateway. The NFS gateway does a lookup to find the user
-  name from the UID, and then passes the username to HDFS along with the HDFS
-  requests. For example, if the current user on the NFS client is "admin",
-  when the user accesses the mounted directory, the NFS gateway will access
-  HDFS as user "admin". To access HDFS as the user "hdfs", one needs to switch
-  the current user to "hdfs" on the client system when accessing the mounted
-  directory.
-
-  The system administrator must ensure that the user on the NFS client host
-  has the same name and UID as that on the NFS gateway host. This is usually
-  not a problem if the same user management system (e.g., LDAP/NIS) is used to
-  create and deploy users on HDFS nodes and the NFS client node. If the user
-  account is created manually on different hosts, one might need to modify the
-  UID (e.g., do "usermod -u 123 myusername") on either the NFS client or the
-  NFS gateway host to make it the same on both sides. More technical details
-  of RPC AUTH_UNIX can be found in the
-  {{{http://tools.ietf.org/html/rfc1057}RPC specification}}.
-
-  Optionally, the system administrator can configure a custom static mapping
-  file in the event one wishes to access the HDFS NFS Gateway from a system with
-  a completely disparate set of UIDs/GIDs. By default this file is located at
-  "/etc/nfs.map", but a custom location can be configured by setting the
-  "static.id.mapping.file" property to the path of the static mapping file.
-  The format of the static mapping file is similar to what is described in the
-  exports(5) manual page, but roughly it is:
-
--------------------------
-# Mapping for clients accessing the NFS gateway
-uid 10 100 # Map the remote UID 10 to the local UID 100
-gid 11 101 # Map the remote GID 11 to the local GID 101
--------------------------
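For illustration, the following sketch parses the mapping format shown above and translates client IDs. It is a hypothetical helper, not the gateway's implementation; in particular, passing unmapped IDs through unchanged is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical parser for lines of the form "uid <remote> <local>" or
 * "gid <remote> <local>", with '#' starting a comment.
 */
final class StaticIdMapping {
    final Map<Integer, Integer> uids = new HashMap<>();
    final Map<Integer, Integer> gids = new HashMap<>();

    static StaticIdMapping parse(String text) {
        StaticIdMapping mapping = new StaticIdMapping();
        for (String rawLine : text.split("\\R")) {
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            if (line.isEmpty()) {
                continue; // blank or comment-only line
            }
            String[] f = line.split("\\s+");
            if (f.length != 3) {
                throw new IllegalArgumentException("bad mapping line: " + rawLine);
            }
            int remote = Integer.parseInt(f[1]);
            int local = Integer.parseInt(f[2]);
            switch (f[0]) {
                case "uid": mapping.uids.put(remote, local); break;
                case "gid": mapping.gids.put(remote, local); break;
                default: throw new IllegalArgumentException("unknown type: " + f[0]);
            }
        }
        return mapping;
    }

    /** Translates a client UID; unmapped IDs pass through (assumption). */
    int localUid(int remoteUid) {
        return uids.getOrDefault(remoteUid, remoteUid);
    }

    /** Translates a client GID; unmapped IDs pass through (assumption). */
    int localGid(int remoteGid) {
        return gids.getOrDefault(remoteGid, remoteGid);
    }
}
```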


http://git-wip-us.apache.org/repos/asf/hadoop/blob/5c0073e7/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsPermissionsGuide.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsPermissionsGuide.apt.vm b/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsPermissionsGuide.apt.vm
deleted file mode 100644
index 30119a6..0000000
--- a/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsPermissionsGuide.apt.vm
+++ /dev/null
@@ -1,438 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~   http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
-  ---
-  HDFS Permissions Guide
-  ---
-  ---
-  ${maven.build.timestamp}
-
-HDFS Permissions Guide
-
-%{toc|section=1|fromDepth=0}
-
-* Overview
-
-   The Hadoop Distributed File System (HDFS) implements a permissions
-   model for files and directories that shares much of the POSIX model.
-   Each file and directory is associated with an owner and a group. The
-   file or directory has separate permissions for the user that is the
-   owner, for other users that are members of the group, and for all other
-   users. For files, the r permission is required to read the file, and
-   the w permission is required to write or append to the file. For
-   directories, the r permission is required to list the contents of the
-   directory, the w permission is required to create or delete files or
-   directories, and the x permission is required to access a child of the
-   directory.
-
-   In contrast to the POSIX model, there are no setuid or setgid bits for
-   files as there is no notion of executable files. For directories, there
-   are no setuid or setgid bits either, as a simplification. The Sticky
-   bit can be set on directories, preventing anyone except the superuser,
-   directory owner or file owner from deleting or moving the files within
-   the directory. Setting the sticky bit for a file has no effect.
-   Collectively, the permissions of a file or directory are its mode. In
-   general, Unix customs for representing and displaying modes will be
-   used, including the use of octal numbers in this description. When a
-   file or directory is created, its owner is the user identity of the
-   client process, and its group is the group of the parent directory (the
-   BSD rule).
-
-   HDFS also provides optional support for POSIX ACLs (Access Control Lists) to
-   augment file permissions with finer-grained rules for specific named users or
-   named groups.  ACLs are discussed in greater detail later in this document.
-
-   Each client process that accesses HDFS has a two-part identity composed
-   of the user name and groups list. Whenever HDFS must do a permissions
-   check for a file or directory foo accessed by a client process,
-
-     * If the user name matches the owner of foo, then the owner
-       permissions are tested;
-
-     * Else if the group of foo matches any member of the groups list,
-       then the group permissions are tested;
-
-     * Otherwise the other permissions of foo are tested.
-
-   If a permissions check fails, the client operation fails.
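The three-way check above can be sketched as follows, using octal modes as in the rest of this guide. The sketch ignores ACLs and the super-user, and its names are hypothetical.

```java
import java.util.Set;

/** Minimal sketch of the owner/group/other permission check (no ACLs). */
final class PosixCheck {
    static final int READ = 4, WRITE = 2, EXECUTE = 1;

    /**
     * @param mode      octal mode such as 0750
     * @param requested bitwise OR of READ, WRITE and EXECUTE
     */
    static boolean permits(String user, Set<String> userGroups,
                           String owner, String group, int mode, int requested) {
        int granted;
        if (user.equals(owner)) {
            granted = (mode >> 6) & 7;   // owner bits, even if group bits are wider
        } else if (userGroups.contains(group)) {
            granted = (mode >> 3) & 7;   // group bits
        } else {
            granted = mode & 7;          // other bits
        }
        return (granted & requested) == requested;
    }
}
```

Note that only one class of bits is ever consulted: an owner is judged by the owner bits alone, even when the group bits would grant more.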
-
-* User Identity
-
-   As of Hadoop 0.22, Hadoop supports two different modes of operation to
-   determine the user's identity, specified by the
-   hadoop.security.authentication property:
-
-   * <<simple>>
-
-          In this mode of operation, the identity of a client process is
-          determined by the host operating system. On Unix-like systems,
-          the user name is the equivalent of `whoami`.
-
-   * <<kerberos>>
-
-          In Kerberized operation, the identity of a client process is
-          determined by its Kerberos credentials. For example, in a
-          Kerberized environment, a user may use the kinit utility to
-          obtain a Kerberos ticket-granting-ticket (TGT) and use klist to
-          determine their current principal. When mapping a Kerberos
-          principal to an HDFS username, all components except for the
-          primary are dropped. For example, a principal
-          todd/[email protected] will act as the simple username
-          todd on HDFS.
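The principal-to-username reduction described for Kerberos mode can be sketched as keeping the text before the first '/' or '@'. This is a simplification under that assumption: real clusters translate principals through the hadoop.security.auth_to_local rules, which this hypothetical helper does not model.

```java
/** Simplified sketch: reduce a Kerberos principal to its primary component. */
final class PrincipalNames {
    static String shortName(String principal) {
        int end = principal.length();
        int slash = principal.indexOf('/');
        int at = principal.indexOf('@');
        if (slash >= 0) end = Math.min(end, slash);   // drop the instance
        if (at >= 0) end = Math.min(end, at);         // drop the realm
        if (end == 0) {
            throw new IllegalArgumentException("empty primary: " + principal);
        }
        return principal.substring(0, end);
    }
}
```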
-
-   Regardless of the mode of operation, the user identity mechanism is
-   extrinsic to HDFS itself. There is no provision within HDFS for
-   creating user identities, establishing groups, or processing user
-   credentials.
-
-* Group Mapping
-
-   Once a username has been determined as described above, the list of
-   groups is determined by a group mapping service, configured by the
-   hadoop.security.group.mapping property. The default implementation,
-   org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback,
-   will determine if the Java Native Interface (JNI) is available.  If
-   JNI is available, the implementation will use the API within hadoop
-   to resolve a list of groups for a user. If JNI is not available
-   then the shell implementation,
-   org.apache.hadoop.security.ShellBasedUnixGroupsMapping, is used.
-   This implementation shells out with the <<<bash -c groups>>>
-   command (for a Linux/Unix environment) or the <<<net group>>>
-   command (for a Windows environment) to resolve a list of groups for
-   a user.
-
-   An alternate implementation, which connects directly to an LDAP server
-   to resolve the list of groups, is available via
-   org.apache.hadoop.security.LdapGroupsMapping. However, this provider
-   should only be used if the required groups reside exclusively in LDAP,
-   and are not materialized on the Unix servers. More information on
-   configuring the group mapping service is available in the Javadocs.
-
-   For HDFS, the mapping of users to groups is performed on the NameNode.
-   Thus, the host system configuration of the NameNode determines the
-   group mappings for the users.
-
-   Note that HDFS stores the user and group of a file or directory as
-   strings; there is no conversion from user and group identity numbers as
-   is conventional in Unix.
-
-* Understanding the Implementation
-
-   Each file or directory operation passes the full path name to the name
-   node, and the permissions checks are applied along the path for each
-   operation. The client framework will implicitly associate the user
-   identity with the connection to the name node, reducing the need for
-   changes to the existing client API. It has always been the case that
-   when one operation on a file succeeds, the operation might fail when
-   repeated because the file, or some directory on the path, no longer
-   exists. For instance, when the client first begins reading a file, it
-   makes a first request to the name node to discover the location of the
-   first blocks of the file. A second request made to find additional
-   blocks may fail. On the other hand, deleting a file does not revoke
-   access by a client that already knows the blocks of the file. With the
-   addition of permissions, a client's access to a file may be withdrawn
-   between requests. Again, changing permissions does not revoke the
-   access of a client that already knows the file's blocks.
-
-* Changes to the File System API
-
-   All methods that use a path parameter will throw <<<AccessControlException>>>
-   if permission checking fails.
-
-   New methods:
-
-     * <<<public FSDataOutputStream create(Path f, FsPermission permission,
-       boolean overwrite, int bufferSize, short replication, long
-       blockSize, Progressable progress) throws IOException;>>>
-
-     * <<<public boolean mkdirs(Path f, FsPermission permission) throws
-       IOException;>>>
-
-     * <<<public void setPermission(Path p, FsPermission permission) throws
-       IOException;>>>
-
-     * <<<public void setOwner(Path p, String username, String groupname)
-       throws IOException;>>>
-
-     * <<<public FileStatus getFileStatus(Path f) throws IOException;>>>
-     
-       will additionally return the user, group and mode associated with the
-       path.
-
-   The mode of a new file or directory is restricted by the umask set as a
-   configuration parameter. When the existing <<<create(path, …)>>> method
-   (without the permission parameter) is used, the mode of the new file is
-   <<<0666 & ^umask>>>. When the new <<<create(path, permission, …)>>> method
-   (with the permission parameter P) is used, the mode of the new file is
-   <<<P & ^umask & 0666>>>. When a new directory is created with the existing
-   <<<mkdirs(path)>>>
-   method (without the permission parameter), the mode of the new
-   directory is <<<0777 & ^umask>>>. When the new <<<mkdirs(path, permission)>>>
-   method (with the permission parameter P) is used, the mode of the new
-   directory is <<<P & ^umask & 0777>>>.
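The four mode formulas above translate directly into bit arithmetic. In this sketch, Java's ~ plays the role of ^ (bitwise complement) in the text, and the class name is hypothetical:

```java
/** Sketch of the umask formulas for new files and directories. */
final class UmaskModes {
    /** create(path, ...) without a permission: 0666 & ^umask */
    static int fileMode(int umask) {
        return 0666 & ~umask;
    }

    /** create(path, permission, ...): P & ^umask & 0666 */
    static int fileMode(int permission, int umask) {
        return permission & ~umask & 0666;
    }

    /** mkdirs(path) without a permission: 0777 & ^umask */
    static int dirMode(int umask) {
        return 0777 & ~umask;
    }

    /** mkdirs(path, permission): P & ^umask & 0777 */
    static int dirMode(int permission, int umask) {
        return permission & ~umask & 0777;
    }
}
```

With a umask of 022, files created without an explicit permission get 0644 and directories get 0755; a requested file permission of 0777 is still capped at 0644 because of the trailing 0666 mask.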
-
-* Changes to the Application Shell
-
-   New operations:
-
-     * <<<chmod [-R] mode file …>>>
-
-       Only the owner of a file or the super-user is permitted to change
-       the mode of a file.
-
-     * <<<chgrp [-R] group file …>>>
-
-       The user invoking chgrp must belong to the specified group and be
-       the owner of the file, or be the super-user.
-
-     * <<<chown [-R] [owner][:[group]] file …>>>
-
-       The owner of a file may only be altered by a super-user.
-
-     * <<<ls file …>>>
-
-     * <<<lsr file …>>>
-
-       The output is reformatted to display the owner, group and mode.
-
-* The Super-User
-
-   The super-user is the user with the same identity as the name node process
-   itself. Loosely, if you started the name node, then you are the
-   super-user. The super-user can do anything in that permissions checks
-   never fail for the super-user. There is no persistent notion of who was
-   the super-user; when the name node is started the process identity
-   determines who is the super-user for now. The HDFS super-user does not
-   have to be the super-user of the name node host, nor is it necessary
-   that all clusters have the same super-user. Also, an experimenter
-   running HDFS on a personal workstation conveniently becomes that
-   installation's super-user without any configuration.
-
-   In addition, the administrator may identify a distinguished group using
-   a configuration parameter. If set, members of this group are also
-   super-users.
-
-* The Web Server
-
-   By default, the identity of the web server is a configuration
-   parameter. That is, the name node has no notion of the identity of the
-   real user, but the web server behaves as if it has the identity (user
-   and groups) of a user chosen by the administrator. Unless the chosen
-   identity matches the super-user, parts of the name space may be
-   inaccessible to the web server.
-
-* ACLs (Access Control Lists)
-
-   In addition to the traditional POSIX permissions model, HDFS also supports
-   POSIX ACLs (Access Control Lists).  ACLs are useful for implementing
-   permission requirements that differ from the natural organizational hierarchy
-   of users and groups.  An ACL provides a way to set different permissions for
-   specific named users or named groups, not only the file's owner and the
-   file's group.
-
-   By default, support for ACLs is disabled, and the NameNode disallows creation
-   of ACLs.  To enable support for ACLs, set <<<dfs.namenode.acls.enabled>>> to
-   true in the NameNode configuration.
-
-   An ACL consists of a set of ACL entries.  Each ACL entry names a specific
-   user or group and grants or denies read, write and execute permissions for
-   that specific user or group.  For example:
-
-+--
-   user::rw-
-   user:bruce:rwx                  #effective:r--
-   group::r-x                      #effective:r--
-   group:sales:rwx                 #effective:r--
-   mask::r--
-   other::r--
-+--
-
-   ACL entries consist of a type, an optional name and a permission string.
-   For display purposes, ':' is used as the delimiter between each field.  In
-   this example ACL, the file owner has read-write access, the file group has
-   read-execute access and others have read access.  So far, this is equivalent
-   to setting the file's permission bits to 654.
-
-   Additionally, there are 2 extended ACL entries for the named user bruce and
-   the named group sales, both granted full access.  The mask is a special ACL
-   entry that filters the permissions granted to all named user entries and
-   named group entries, and also the unnamed group entry.  In the example, the
-   mask has only read permissions, and we can see that the effective permissions
-   of several ACL entries have been filtered accordingly.
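The filtering in the example above is a bitwise AND of each affected entry with the mask. A minimal sketch using the same rwx notation; the class and method names are hypothetical:

```java
/** Sketch of mask filtering: effective permission = entry & mask. */
final class AclMask {
    /** Parses an "rwx"-style string into bits r=4, w=2, x=1. */
    static int parse(String rwx) {
        if (rwx.length() != 3) {
            throw new IllegalArgumentException("expected 3 characters: " + rwx);
        }
        int bits = 0;
        if (rwx.charAt(0) == 'r') bits |= 4;
        if (rwx.charAt(1) == 'w') bits |= 2;
        if (rwx.charAt(2) == 'x') bits |= 1;
        return bits;
    }

    static String format(int bits) {
        return ((bits & 4) != 0 ? "r" : "-")
             + ((bits & 2) != 0 ? "w" : "-")
             + ((bits & 1) != 0 ? "x" : "-");
    }

    static String effective(String entry, String mask) {
        return format(parse(entry) & parse(mask));
    }
}
```

For example, effective("rwx", "r--") reproduces the #effective:r-- annotation on user:bruce, and effective("r-x", "r--") the one on group::r-x.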
-
-   Every ACL must have a mask.  If the user doesn't supply a mask while setting
-   an ACL, then a mask is inserted automatically by calculating the union of
-   permissions on all entries that would be filtered by the mask.
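The automatically inserted mask described above is the union (bitwise OR) of the permissions on the entries the mask would filter: named users, named groups, and the unnamed group. A self-contained sketch with hypothetical names:

```java
import java.util.List;

/** Sketch: infer a mask as the union of all entries it would filter. */
final class AclMaskUnion {
    static int bits(String rwx) {
        int b = 0;
        if (rwx.charAt(0) == 'r') b |= 4;
        if (rwx.charAt(1) == 'w') b |= 2;
        if (rwx.charAt(2) == 'x') b |= 1;
        return b;
    }

    static String rwx(int b) {
        return ((b & 4) != 0 ? "r" : "-")
             + ((b & 2) != 0 ? "w" : "-")
             + ((b & 1) != 0 ? "x" : "-");
    }

    /** @param maskedEntries permissions of named users, named groups, unnamed group */
    static String inferMask(List<String> maskedEntries) {
        int union = 0;
        for (String entry : maskedEntries) {
            union |= bits(entry);
        }
        return rwx(union);
    }
}
```

For the entries in the first example ACL (user:bruce:rwx, group::r-x, group:sales:rwx), the inferred mask would be rwx.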
-
-   Running <<<chmod>>> on a file that has an ACL actually changes the
-   permissions of the mask.  Since the mask acts as a filter, this effectively
-   constrains the permissions of all extended ACL entries instead of changing
-   just the group entry and possibly missing other extended ACL entries.
-
-   The model also differentiates between an "access ACL", which defines the
-   rules to enforce during permission checks, and a "default ACL", which defines
-   the ACL entries that new child files or sub-directories receive automatically
-   during creation.  For example:
-
-+--
-   user::rwx
-   group::r-x
-   other::r-x
-   default:user::rwx
-   default:user:bruce:rwx          #effective:r-x
-   default:group::r-x
-   default:group:sales:rwx         #effective:r-x
-   default:mask::r-x
-   default:other::r-x
-+--
-
-   Only directories may have a default ACL.  When a new file or sub-directory is
-   created, it automatically copies the default ACL of its parent into its own
-   access ACL.  A new sub-directory also copies it to its own default ACL.  In
-   this way, the default ACL will be copied down through arbitrarily deep levels
-   of the file system tree as new sub-directories get created.
-
-   The exact permission values in the new child's access ACL are subject to
-   filtering by the mode parameter.  Considering the default umask of 022, this
-   is typically 755 for new directories and 644 for new files.  The mode
-   parameter filters the copied permission values for the unnamed user (file
-   owner), the mask and other.  Using this particular example ACL, and creating
-   a new sub-directory with 755 for the mode, this mode filtering has no effect
-   on the final result.  However, if we consider creation of a file with 644 for
-   the mode, then mode filtering causes the new file's ACL to receive read-write
-   for the unnamed user (file owner), read for the mask and read for others.
-   This mask also means that effective permissions for named user bruce and
-   named group sales are only read.
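The mode filtering described above can be sketched in a few lines. This is a hypothetical model, not HDFS code: ACL entries are kept in a map keyed by illustrative strings such as "user::" or "user:bruce:", and permissions are 3-bit integers (r=4, w=2, x=1). Only the owner, mask and other entries are filtered by the creation mode; named entries are copied unchanged and then limited by the mask when effective permissions are computed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: entry keys ("user::", "user:bruce:", "mask::", ...) are
// illustrative, and permissions are 3-bit ints (r=4, w=2, x=1).
class DefaultAclFilterSketch {

    // The example default ACL from the text (the default:... entries).
    static Map<String, Integer> exampleDefaultAcl() {
        Map<String, Integer> acl = new LinkedHashMap<>();
        acl.put("user::", 07);
        acl.put("user:bruce:", 07);
        acl.put("group::", 05);
        acl.put("group:sales:", 07);
        acl.put("mask::", 05);
        acl.put("other::", 05);
        return acl;
    }

    // Copies a parent's default ACL into a child's access ACL, filtering the
    // unnamed user (owner), mask and other entries by the creation mode.
    static Map<String, Integer> childAccessAcl(Map<String, Integer> defaultAcl, int mode) {
        Map<String, Integer> acl = new LinkedHashMap<>(defaultAcl);
        acl.put("user::", defaultAcl.get("user::") & ((mode >> 6) & 7));
        acl.put("mask::", defaultAcl.get("mask::") & ((mode >> 3) & 7));
        acl.put("other::", defaultAcl.get("other::") & (mode & 7));
        return acl;
    }

    // Effective permission of an entry filtered by the mask: entry AND mask.
    static int effective(Map<String, Integer> acl, String entry) {
        return acl.get(entry) & acl.get("mask::");
    }

    static String rwx(int p) {
        return ((p & 4) != 0 ? "r" : "-") + ((p & 2) != 0 ? "w" : "-") + ((p & 1) != 0 ? "x" : "-");
    }

    public static void main(String[] args) {
        // Octal literal 0644: the mode requested for a new file.
        Map<String, Integer> file = childAccessAcl(exampleDefaultAcl(), 0644);
        for (Map.Entry<String, Integer> e : file.entrySet()) {
            System.out.println(e.getKey() + rwx(e.getValue()));
        }
        System.out.println("effective user:bruce: " + rwx(effective(file, "user:bruce:")));
        System.out.println("effective group:sales: " + rwx(effective(file, "group:sales:")));
    }
}
```

With mode 0644 this reproduces the outcome stated in the text (owner rw-, mask r--, other r--, effective r-- for bruce and sales), while mode 0755 leaves the copied ACL unchanged.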
-
-   Note that the copy occurs at time of creation of the new file or
-   sub-directory.  Subsequent changes to the parent's default ACL do not change
-   existing children.
-
-   The default ACL must have all minimum required ACL entries, including the
-   unnamed user (file owner), unnamed group (file group) and other entries.  If
-   the user doesn't supply one of these entries while setting a default ACL,
-   then the entries are inserted automatically by copying the corresponding
-   permissions from the access ACL, or permission bits if there is no access
-   ACL.  The default ACL must also have a mask.  As described above, if the mask
-   is unspecified, then a mask is inserted automatically by calculating the
-   union of permissions on all entries that would be filtered by the mask.
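The automatic mask calculation can be illustrated as a bitwise OR over the entries the mask filters: named users, the unnamed group and named groups (the owner and other entries are not filtered by the mask). The `Entry` type below is a hypothetical stand-in, not Hadoop's `AclEntry` class.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of mask inference; perm uses r=4, w=2, x=1.
class AclMaskSketch {
    static final class Entry {
        final String type;  // "user", "group", "mask" or "other"
        final String name;  // "" for unnamed entries such as user:: or group::
        final int perm;

        Entry(String type, String name, int perm) {
            this.type = type;
            this.name = name;
            this.perm = perm;
        }
    }

    // Entries filtered by the mask: named users, the unnamed group and named groups.
    static boolean filteredByMask(Entry e) {
        return (e.type.equals("user") && !e.name.isEmpty()) || e.type.equals("group");
    }

    // The inferred mask is the union (bitwise OR) of those entries' permissions.
    static int inferMask(List<Entry> entries) {
        int mask = 0;
        for (Entry e : entries) {
            if (filteredByMask(e)) {
                mask |= e.perm;
            }
        }
        return mask;
    }

    public static void main(String[] args) {
        // Entries of the example default ACL with its explicit mask left out.
        List<Entry> entries = Arrays.asList(
                new Entry("user", "", 07),
                new Entry("user", "bruce", 07),
                new Entry("group", "", 05),
                new Entry("group", "sales", 07),
                new Entry("other", "", 05));
        System.out.println("inferred mask (octal): " + Integer.toOctalString(inferMask(entries)));
    }
}
```

Without its explicit r-x mask, the example default ACL would therefore receive an inferred mask of rwx, because bruce and sales both hold rwx.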
-
-   When considering a file that has an ACL, the algorithm for permission checks
-   changes to:
-
-     * If the user name matches the owner of the file, then the owner
-       permissions are tested;
-
-     * Else if the user name matches the name in one of the named user entries,
-       then these permissions are tested, filtered by the mask permissions;
-
-     * Else if the group of the file matches any member of the groups list,
-       and if these permissions filtered by the mask grant access, then these
-       permissions are used;
-
-     * Else if there is a named group entry matching a member of the groups list,
-       and if these permissions filtered by the mask grant access, then these
-       permissions are used;
-
-     * Else if the file group or any named group entry matches a member of the
-       groups list, but access was not granted by any of those permissions, then
-       access is denied;
-
-     * Otherwise the other permissions of the file are tested.
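The check order listed above can be sketched as a single method. This is a simplified, hypothetical model written for illustration and not the NameNode's implementation. `FileAcl` and its fields are assumptions, permissions are 3-bit integers, and "grant access" is taken to mean that every requested bit is present.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical model of the ACL permission check; perms use r=4, w=2, x=1.
class AclCheckSketch {
    static final int R = 4, W = 2, X = 1;

    static final class FileAcl {
        final String owner, group;
        final int ownerPerm, groupPerm, otherPerm, mask;
        final Map<String, Integer> namedUsers, namedGroups;

        FileAcl(String owner, String group, int ownerPerm, int groupPerm, int otherPerm,
                int mask, Map<String, Integer> namedUsers, Map<String, Integer> namedGroups) {
            this.owner = owner;
            this.group = group;
            this.ownerPerm = ownerPerm;
            this.groupPerm = groupPerm;
            this.otherPerm = otherPerm;
            this.mask = mask;
            this.namedUsers = namedUsers;
            this.namedGroups = namedGroups;
        }
    }

    // True if perm contains every requested bit.
    static boolean grants(int perm, int requested) {
        return (perm & requested) == requested;
    }

    static boolean check(FileAcl f, String user, Set<String> groups, int requested) {
        // Owner: tested directly, not filtered by the mask.
        if (user.equals(f.owner)) {
            return grants(f.ownerPerm, requested);
        }
        // Named user entry, filtered by the mask.
        Integer named = f.namedUsers.get(user);
        if (named != null) {
            return grants(named & f.mask, requested);
        }
        // File group first, then named groups; each filtered by the mask.
        boolean groupMatched = false;
        if (groups.contains(f.group)) {
            groupMatched = true;
            if (grants(f.groupPerm & f.mask, requested)) {
                return true;
            }
        }
        for (Map.Entry<String, Integer> g : f.namedGroups.entrySet()) {
            if (groups.contains(g.getKey())) {
                groupMatched = true;
                if (grants(g.getValue() & f.mask, requested)) {
                    return true;
                }
            }
        }
        // Some group matched but none granted access: deny without consulting "other".
        if (groupMatched) {
            return false;
        }
        // Otherwise fall through to the "other" permissions (not masked).
        return grants(f.otherPerm, requested);
    }

    public static void main(String[] args) {
        FileAcl f = new FileAcl("alice", "staff", 06, 04, 05, 04,
                Map.of("bruce", 07), Map.of("sales", 06));
        System.out.println("bruce write: " + check(f, "bruce", Set.of(), W));
        System.out.println("eve (staff) execute: " + check(f, "eve", Set.of("staff"), X));
        System.out.println("dave (no groups) execute: " + check(f, "dave", Set.of(), X));
    }
}
```

The `eve` case shows the fifth rule: her group matches but does not grant execute, so access is denied even though the "other" entry would have allowed it.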
-
-   Best practice is to rely on traditional permission bits to implement most
-   permission requirements, and define a smaller number of ACLs to augment the
-   permission bits with a few exceptional rules.  A file with an ACL incurs an
-   additional cost in memory in the NameNode compared to a file that has only
-   permission bits.
-
-* ACLs File System API
-
-   New methods:
-
-     * <<<public void modifyAclEntries(Path path, List<AclEntry> aclSpec) throws
-       IOException;>>>
-
-     * <<<public void removeAclEntries(Path path, List<AclEntry> aclSpec) throws
-       IOException;>>>
-
-     * <<<public void removeDefaultAcl(Path path) throws
-       IOException;>>>
-
-     * <<<public void removeAcl(Path path) throws IOException;>>>
-
-     * <<<public void setAcl(Path path, List<AclEntry> aclSpec) throws
-       IOException;>>>
-
-     * <<<public AclStatus getAclStatus(Path path) throws IOException;>>>
-
-* ACLs Shell Commands
-
-     * <<<hdfs dfs -getfacl [-R] <path> >>>
-
-       Displays the Access Control Lists (ACLs) of files and directories. If a
-       directory has a default ACL, then getfacl also displays the default ACL.
-
-     * <<<hdfs dfs -setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>] >>>
-
-       Sets Access Control Lists (ACLs) of files and directories.
-
-     * <<<hdfs dfs -ls <args> >>>
-
-       The output of <<<ls>>> will append a '+' character to the permissions
-       string of any file or directory that has an ACL.
-
-       See the {{{../hadoop-common/FileSystemShell.html}File System Shell}}
-       documentation for full coverage of these commands.
-
-* Configuration Parameters
-
-     * <<<dfs.permissions.enabled = true>>>
-
-       If true, use the permissions system as described here. If false,
-       permission checking is turned off, but all other behavior is
-       unchanged. Switching from one parameter value to the other does not
-       change the mode, owner or group of files or directories.
-       Regardless of whether permissions are on or off, chmod, chgrp, chown and
-       setfacl always check permissions. These functions are only useful in
-       the permissions context, and so there is no backwards compatibility
-       issue. Furthermore, this allows administrators to reliably set
-       owners and permissions in advance of turning on regular permissions
-       checking.
-
-     * <<<dfs.web.ugi = webuser,webgroup>>>
-
-       The user name to be used by the web server. Setting this to the
-       name of the super-user allows any web client to see everything.
-       Changing this to an otherwise unused identity allows web clients to
-       see only those things visible using "other" permissions. Additional
-       groups may be added to the comma-separated list.
-
-     * <<<dfs.permissions.superusergroup = supergroup>>>
-
-       The name of the group of super-users.
-
-     * <<<fs.permissions.umask-mode = 0022>>>
-
-       The umask used when creating files and directories. For
-       configuration files, the decimal value 18 may be used.
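The umask arithmetic can be checked directly: octal 0022 is decimal 18, and clearing those bits from a requested mode gives the typical 755/644 results mentioned in the ACL section. The sketch assumes requested modes of 0777 for directories and 0666 for files. `parseUmask` and `applyUmask` are illustrative helpers, not Hadoop methods.

```java
// Hypothetical helpers illustrating fs.permissions.umask-mode arithmetic.
class UmaskSketch {
    // Parses an octal umask string such as "0022".
    static int parseUmask(String octal) {
        return Integer.parseInt(octal, 8);
    }

    // Permission bits a new file or directory gets: the requested mode with umask bits cleared.
    static int applyUmask(int mode, int umask) {
        return mode & ~umask;
    }

    public static void main(String[] args) {
        int umask = parseUmask("0022");
        System.out.println("umask in decimal: " + umask);                                    // 18
        System.out.println("directory: " + Integer.toOctalString(applyUmask(0777, umask))); // 755
        System.out.println("file: " + Integer.toOctalString(applyUmask(0666, umask)));      // 644
    }
}
```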
-
-     * <<<dfs.cluster.administrators = ACL-for-admins>>>
-
-       The administrators for the cluster specified as an ACL. This
-       controls who can access the default servlets, etc. in the HDFS.
-
-     * <<<dfs.namenode.acls.enabled = true>>>
-
-       Set to true to enable support for HDFS ACLs (Access Control Lists).  By
-       default, ACLs are disabled.  When ACLs are disabled, the NameNode rejects
-       all attempts to set an ACL.

http://git-wip-us.apache.org/repos/asf/hadoop/blob/5c0073e7/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsQuotaAdminGuide.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsQuotaAdminGuide.apt.vm b/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsQuotaAdminGuide.apt.vm
deleted file mode 100644
index 0821946..0000000
--- a/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsQuotaAdminGuide.apt.vm
+++ /dev/null
@@ -1,116 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~   http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
-  ---
-  HDFS Quotas Guide
-  ---
-  ---
-  ${maven.build.timestamp}
-
-HDFS Quotas Guide
-
-%{toc|section=1|fromDepth=0}
-
-* Overview
-
-   The Hadoop Distributed File System (HDFS) allows the administrator to
-   set quotas for the number of names used and the amount of space used
-   for individual directories. Name quotas and space quotas operate
-   independently, but the administration and implementation of the two
-   types of quotas are closely parallel.
-
-* Name Quotas
-
-   The name quota is a hard limit on the number of file and directory
-   names in the tree rooted at that directory. File and directory
-   creations fail if the quota would be exceeded. Quotas stick with
-   renamed directories; the rename operation fails if the operation would
-   result in a quota violation. The attempt to set a quota will still
-   succeed even if the directory would be in violation of the new quota. A
-   newly created directory has no associated quota. The largest quota is
-   <<<Long.MAX_VALUE>>>. A quota of one forces a directory to remain empty.
-   (Yes, a directory counts against its own quota!)
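A toy model makes the counting rule concrete: every name in the tree, including the directory itself, counts toward the name quota. The sketch is hypothetical. `Node` is an in-memory stand-in for the namespace, and it checks only the single directory's quota, ignoring any quotas set on ancestor directories.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical name-quota model; files are simply nodes without children.
class NameQuotaSketch {
    static final class Node {
        final List<Node> children = new ArrayList<>();

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Names used by the tree rooted at dir; the directory's own name counts.
    static long namesUsed(Node dir) {
        long n = 1;
        for (Node child : dir.children) {
            n += namesUsed(child);
        }
        return n;
    }

    // Creating one more file or directory fails if it would exceed the quota.
    static boolean canCreate(Node dir, long nameQuota) {
        return namesUsed(dir) + 1 <= nameQuota;
    }

    public static void main(String[] args) {
        Node empty = new Node();
        System.out.println("quota 1, create allowed? " + canCreate(empty, 1));
        System.out.println("quota 2, create allowed? " + canCreate(empty, 2));
    }
}
```

With a quota of one, the empty directory already uses its single name, so no child can be created.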
-
-   Quotas are persistent with the fsimage. When starting, if the fsimage
-   is immediately in violation of a quota (perhaps the fsimage was
-   surreptitiously modified), a warning is printed for each such
-   violation. Setting or removing a quota creates a journal entry.
-
-* Space Quotas
-
-   The space quota is a hard limit on the number of bytes used by files in
-   the tree rooted at that directory. Block allocations fail if the quota
-   would not allow a full block to be written. Each replica of a block
-   counts against the quota. Quotas stick with renamed directories; the
-   rename operation fails if the operation would result in a quota
-   violation. A newly created directory has no associated quota. The
-   largest quota is <<<Long.MAX_VALUE>>>. A quota of zero still permits files
-   to be created, but no blocks can be added to the files. Directories don't
-   use host file system space and don't count against the space quota. The
-   host file system space used to save the file metadata is not counted
-   against the quota. Quotas are charged at the intended replication
-   factor for the file; changing the replication factor for a file will
-   credit or debit quotas.
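The space accounting above can be sketched as follows: a file is charged its length times its intended replication, and a new block is allowed only if a full block at that replication still fits under the quota. The method names and the 128 MB block size used in `main` are assumptions chosen for illustration.

```java
// Hypothetical sketch of space-quota accounting; not NameNode code.
class SpaceQuotaSketch {
    static final long MB = 1L << 20;
    static final long GB = 1L << 30;

    // Bytes charged for a file: every replica counts, at the intended replication.
    static long charge(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    // A block is allocated only if a full block, at the file's replication,
    // still fits under the space quota.
    static boolean canAllocateBlock(long consumedBytes, long spaceQuota, long blockSize, int replication) {
        return consumedBytes + charge(blockSize, replication) <= spaceQuota;
    }

    public static void main(String[] args) {
        System.out.println("1 GB at replication 3 is charged " + charge(GB, 3) / GB + " GB");
        System.out.println(canAllocateBlock(600 * MB, GB, 128 * MB, 3)); // 600 + 384 MB fits in 1 GB
        System.out.println(canAllocateBlock(700 * MB, GB, 128 * MB, 3)); // 700 + 384 MB does not
        System.out.println(canAllocateBlock(0, 0, 128 * MB, 3));         // zero quota: no blocks
    }
}
```

Changing a file's replication factor changes its `charge`, which is how the quota is credited or debited.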
-
-   Quotas are persistent with the fsimage. When starting, if the fsimage
-   is immediately in violation of a quota (perhaps the fsimage was
-   surreptitiously modified), a warning is printed for each such
-   violation. Setting or removing a quota creates a journal entry.
-
-* Administrative Commands
-
-   Quotas are managed by a set of commands available only to the
-   administrator.
-
-     * <<<dfsadmin -setQuota <N> <directory>...<directory> >>>
-
-       Set the name quota to be N for each directory. Best effort for each
-       directory, with faults reported if N is not a positive long
-       integer, the directory does not exist or it is a file, or the
-       directory would immediately exceed the new quota.
-
-     * <<<dfsadmin -clrQuota <directory>...<directory> >>>
-
-       Remove any name quota for each directory. Best effort for each
-       directory, with faults reported if the directory does not exist or
-       it is a file. It is not a fault if the directory has no quota.
-
-     * <<<dfsadmin -setSpaceQuota <N> <directory>...<directory> >>>
-
-       Set the space quota to be N bytes for each directory. This is a
-       hard limit on total size of all the files under the directory tree.
-       The space quota takes replication into account, i.e. one GB of
-       data with replication of 3 consumes 3GB of quota. N can also be
-       specified with a binary prefix for convenience, e.g. 50g for 50
-       gigabytes and 2t for 2 terabytes. Best effort for each
-       directory, with faults reported if N is neither zero nor a positive
-       integer, the directory does not exist or it is a file, or the
-       directory would immediately exceed the new quota.
-
-     * <<<dfsadmin -clrSpaceQuota <directory>...<directory> >>>
-
-       Remove any space quota for each directory. Best effort for each
-       directory, with faults reported if the directory does not exist or
-       it is a file. It is not a fault if the directory has no quota.
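The binary-prefix form accepted by `-setSpaceQuota` (for example 50g or 2t) can be sketched as a small parser. This is a standalone, hypothetical helper, not Hadoop's own parsing code. It assumes power-of-1024 prefixes k, m, g, t, p and e, and it rejects negative values, matching the rule that N must be zero or a positive integer.

```java
import java.util.Locale;

// Hypothetical parser for quota sizes such as "50g" or "2t" (binary prefixes).
class QuotaSizeSketch {
    static long parse(String s) {
        String t = s.trim().toLowerCase(Locale.ROOT);
        if (t.isEmpty()) {
            throw new IllegalArgumentException("empty size");
        }
        int shift;
        switch (t.charAt(t.length() - 1)) {
            case 'k': shift = 10; break;
            case 'm': shift = 20; break;
            case 'g': shift = 30; break;
            case 't': shift = 40; break;
            case 'p': shift = 50; break;
            case 'e': shift = 60; break;
            default:  shift = 0;  // plain number of bytes
        }
        String digits = shift == 0 ? t : t.substring(0, t.length() - 1);
        long n = Long.parseLong(digits);
        if (n < 0) {
            throw new IllegalArgumentException("quota must be zero or positive: " + s);
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(n, 1L << shift);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"50g", "2t", "1048576"}) {
            System.out.println(s + " = " + parse(s) + " bytes");
        }
    }
}
```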
-
-* Reporting Command
-
-   An extension to the count command of the HDFS shell reports quota
-   values and the current count of names and bytes in use.
-
-     * <<<fs -count -q <directory>...<directory> >>>
-
-       With the -q option, also report the name quota value set for each
-       directory, the available name quota remaining, the space quota
-       value set, and the available space quota remaining. If the
-       directory does not have a quota set, the reported values are <<<none>>>
-       and <<<inf>>>.
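The quota columns reported with `-q` can be modeled as a quota value plus a remainder: quota minus names used for the name quota, and quota minus bytes consumed (including replicas) for the space quota. The sketch below is hypothetical. In particular, using -1 to mean "no quota set" is an assumption made for this example only.

```java
// Hypothetical model of the quota columns printed by "fs -count -q".
class CountQuotaSketch {
    static final long NO_QUOTA = -1; // assumed marker for "no quota set"

    // Returns {quota, remaining}; an unset quota is reported as "none" and "inf".
    static String[] quotaColumns(long quota, long used) {
        if (quota == NO_QUOTA) {
            return new String[] {"none", "inf"};
        }
        return new String[] {Long.toString(quota), Long.toString(quota - used)};
    }

    public static void main(String[] args) {
        long dirs = 3, files = 10;
        long spaceConsumed = 3L << 30; // 1 GB of data at replication 3
        String[] names = quotaColumns(100, dirs + files);
        String[] space = quotaColumns(NO_QUOTA, spaceConsumed);
        System.out.println(String.join(" ", names[0], names[1], space[0], space[1]));
    }
}
```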

http://git-wip-us.apache.org/repos/asf/hadoop/blob/5c0073e7/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsUserGuide.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsUserGuide.apt.vm 
b/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsUserGuide.apt.vm
deleted file mode 100644
index 25a466e..0000000
--- a/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsUserGuide.apt.vm
+++ /dev/null
@@ -1,556 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~   http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
-  ---
-  HDFS Users Guide
-  ---
-  ---
-  ${maven.build.timestamp}
-
-HDFS Users Guide
-
-%{toc|section=1|fromDepth=0}
-
-* Purpose
-
-   This document is a starting point for users working with the Hadoop
-   Distributed File System (HDFS) either as a part of a Hadoop cluster or
-   as a stand-alone general purpose distributed file system. While HDFS is
-   designed to "just work" in many environments, a working knowledge of
-   HDFS helps greatly with configuration improvements and diagnostics on a
-   specific cluster.
-
-* Overview
-
-   HDFS is the primary distributed storage used by Hadoop applications. An
-   HDFS cluster primarily consists of a NameNode that manages the file
-   system metadata and DataNodes that store the actual data. The HDFS
-   Architecture Guide describes HDFS in detail. This user guide primarily
-   deals with the interaction of users and administrators with HDFS
-   clusters. The HDFS architecture diagram depicts basic interactions
-   among the NameNode, the DataNodes, and the clients. Clients contact the
-   NameNode for file metadata or file modifications and perform actual
-   file I/O directly with the DataNodes.
-
-   The following are some of the salient features that could be of
-   interest to many users.
-
-     * Hadoop, including HDFS, is well suited for distributed storage and
-       distributed processing using commodity hardware. It is fault
-       tolerant, scalable, and extremely simple to expand. MapReduce, well
-       known for its simplicity and applicability for a large set of
-       distributed applications, is an integral part of Hadoop.
-
-     * HDFS is highly configurable with a default configuration well
-       suited for many installations. Most of the time, configuration
-       needs to be tuned only for very large clusters.
-
-     * Hadoop is written in Java and is supported on all major platforms.
-
-     * Hadoop supports shell-like commands to interact with HDFS directly.
-
-     * The NameNode and DataNodes have built-in web servers that make it
-       easy to check the current status of the cluster.
-
-     * New features and improvements are regularly implemented in HDFS.
-       The following is a subset of useful features in HDFS:
-
-          * File permissions and authentication.
-
-          * Rack awareness: to take a node's physical location into
-            account while scheduling tasks and allocating storage.
-
-          * Safemode: an administrative mode for maintenance.
-
-          * <<<fsck>>>: a utility to diagnose the health of the file system and
-            to find missing files or blocks.
-
-          * <<<fetchdt>>>: a utility to fetch DelegationToken and store it in a
-            file on the local system.
-
-          * Balancer: tool to balance the cluster when the data is
-            unevenly distributed among DataNodes.
-
-          * Upgrade and rollback: after a software upgrade, it is possible
-            to roll back to HDFS' state before the upgrade in case of
-            unexpected problems.
-
-          * Secondary NameNode: performs periodic checkpoints of the
-            namespace and helps keep the size of file containing log of
-            HDFS modifications within certain limits at the NameNode.
-
-          * Checkpoint node: performs periodic checkpoints of the
-            namespace and helps minimize the size of the log stored at the
-            NameNode containing changes to the HDFS. Replaces the role
-            previously filled by the Secondary NameNode, though it is not yet
-            battle-hardened. The NameNode allows multiple Checkpoint nodes
-            simultaneously, as long as there are no Backup nodes
-            registered with the system.
-
-          * Backup node: An extension to the Checkpoint node. In addition
-            to checkpointing it also receives a stream of edits from the
-            NameNode and maintains its own in-memory copy of the
-            namespace, which is always in sync with the active NameNode
-            namespace state. Only one Backup node may be registered with
-            the NameNode at once.
-
-* Prerequisites
-
-   The following documents describe how to install and set up a Hadoop
-   cluster:
-
-     * {{{../hadoop-common/SingleCluster.html}Single Node Setup}}
-       for first-time users.
-
-     * {{{../hadoop-common/ClusterSetup.html}Cluster Setup}}
-       for large, distributed clusters.
-
-   The rest of this document assumes the user is able to set up and run
-   HDFS with at least one DataNode. For the purpose of this document, both
-   the NameNode and DataNode could be running on the same physical
-   machine.
-
-* Web Interface
-
-   NameNode and DataNode each run an internal web server in order to
-   display basic information about the current status of the cluster. With
-   the default configuration, the NameNode front page is at
-   <<<http://namenode-name:50070/>>>. It lists the DataNodes in the cluster and
-   basic statistics of the cluster. The web interface can also be used to
-   browse the file system (using the "Browse the file system" link on the
-   NameNode front page).
-
-* Shell Commands
-
-   Hadoop includes various shell-like commands that directly interact with
-   HDFS and other file systems that Hadoop supports. The command <<<bin/hdfs dfs -help>>>
-   lists the commands supported by the Hadoop shell. Furthermore,
-   the command <<<bin/hdfs dfs -help command-name>>> displays more detailed help
-   for a command. These commands support most of the normal file system
-   operations like copying files, changing file permissions, etc. They also
-   support a few HDFS specific operations like changing replication of
-   files. For more information see {{{../hadoop-common/FileSystemShell.html}
-   File System Shell Guide}}.
-
-**  DFSAdmin Command
-
-   The <<<bin/hdfs dfsadmin>>> command supports a few HDFS administration
-   related operations. The <<<bin/hdfs dfsadmin -help>>> command lists all the
-   commands currently supported. For example:
-
-     * <<<-report>>>: reports basic statistics of HDFS. Some of this
-       information is also available on the NameNode front page.
-
-     * <<<-safemode>>>: though usually not required, an administrator can
-       manually enter or leave Safemode.
-
-     * <<<-finalizeUpgrade>>>: removes previous backup of the cluster made
-       during last upgrade.
-
-     * <<<-refreshNodes>>>: Updates the namenode with the set of datanodes
-       allowed to connect to the namenode. Namenodes re-read datanode
-       hostnames in the files defined by <<<dfs.hosts>>> and <<<dfs.hosts.exclude>>>.
-       Hosts defined in <<<dfs.hosts>>> are the datanodes that are part of the
-       cluster. If there are entries in <<<dfs.hosts>>>, only the hosts in it
-       are allowed to register with the namenode. Entries in
-       <<<dfs.hosts.exclude>>> are datanodes that need to be decommissioned.
-       Datanodes complete decommissioning when all the replicas from them
-       are replicated to other datanodes. Decommissioned nodes are not
-       automatically shut down and are not chosen for writing new
-       replicas.
-
-     * <<<-printTopology>>> : Print the topology of the cluster. Display a tree
-       of racks and datanodes attached to the racks as viewed by the
-       NameNode.
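The effect of the include and exclude host lists described under <<<-refreshNodes>>> can be summarized as a small decision function. This is a hypothetical model of the stated rules, not the NameNode's code: a non-empty include list acts as an allow-list, and listed exclude entries are decommissioned.

```java
import java.util.Set;

// Hypothetical model of dfs.hosts (include) and dfs.hosts.exclude semantics.
class HostsFileSketch {
    enum State { REJECTED, IN_SERVICE, DECOMMISSIONING }

    static State stateOf(String host, Set<String> include, Set<String> exclude) {
        // A non-empty include list restricts which hosts may register.
        if (!include.isEmpty() && !include.contains(host)) {
            return State.REJECTED;
        }
        // Excluded hosts are decommissioned: their replicas are copied
        // elsewhere and they are not chosen for new replicas.
        if (exclude.contains(host)) {
            return State.DECOMMISSIONING;
        }
        return State.IN_SERVICE;
    }

    public static void main(String[] args) {
        Set<String> include = Set.of("dn1", "dn2", "dn3");
        Set<String> exclude = Set.of("dn3");
        for (String host : new String[] {"dn1", "dn3", "dn4"}) {
            System.out.println(host + ": " + stateOf(host, include, exclude));
        }
    }
}
```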
-
-   For command usage, see {{{./HDFSCommands.html#dfsadmin}dfsadmin}}.
-
-* Secondary NameNode
-
-   The NameNode stores modifications to the file system as a log appended
-   to a native file system file, edits. When a NameNode starts up, it
-   reads HDFS state from an image file, fsimage, and then applies edits
-   from the edits log file. It then writes new HDFS state to the fsimage
-   and starts normal operation with an empty edits file. Since NameNode
-   merges fsimage and edits files only during start up, the edits log file
-   could get very large over time on a busy cluster. Another side effect
-   of a larger edits file is that the next restart of the NameNode takes longer.
-
-   The secondary NameNode merges the fsimage and the edits log files
-   periodically and keeps edits log size within a limit. It is usually run
-   on a different machine than the primary NameNode since its memory
-   requirements are on the same order as the primary NameNode.
-
-   The start of the checkpoint process on the secondary NameNode is
-   controlled by two configuration parameters.
-
-     * <<<dfs.namenode.checkpoint.period>>>, set to 1 hour by default, specifies
-       the maximum delay between two consecutive checkpoints, and
-
-     * <<<dfs.namenode.checkpoint.txns>>>, set to 1 million by default, defines the
-       number of uncheckpointed transactions on the NameNode which will
-       force an urgent checkpoint, even if the checkpoint period has not
-       been reached.
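The two triggers above combine with a logical OR: a checkpoint starts when the period has elapsed or when enough uncheckpointed transactions have accumulated, whichever comes first. The sketch below is illustrative only. The period is expressed in seconds, and the method is a hypothetical model, not the Secondary NameNode's code.

```java
// Hypothetical model of the two checkpoint triggers; defaults follow the text.
class CheckpointTriggerSketch {
    static final long DEFAULT_PERIOD_SECONDS = 3600; // dfs.namenode.checkpoint.period (1 hour)
    static final long DEFAULT_TXNS = 1_000_000;      // dfs.namenode.checkpoint.txns (1 million)

    static boolean shouldCheckpoint(long secondsSinceLast, long uncheckpointedTxns,
                                    long periodSeconds, long txnThreshold) {
        // The period bounds the delay; the transaction count forces an early checkpoint.
        return secondsSinceLast >= periodSeconds || uncheckpointedTxns >= txnThreshold;
    }

    public static void main(String[] args) {
        System.out.println(shouldCheckpoint(600, 1_200_000, DEFAULT_PERIOD_SECONDS, DEFAULT_TXNS));
        System.out.println(shouldCheckpoint(3600, 10, DEFAULT_PERIOD_SECONDS, DEFAULT_TXNS));
        System.out.println(shouldCheckpoint(600, 10, DEFAULT_PERIOD_SECONDS, DEFAULT_TXNS));
    }
}
```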
-
-   The secondary NameNode stores the latest checkpoint in a directory
-   which is structured the same way as the primary NameNode's directory,
-   so that the checkpointed image is always ready to be read by the
-   primary NameNode if necessary.
-
-   For command usage,
-   see {{{./HDFSCommands.html#secondarynamenode}secondarynamenode}}.
-
-* Checkpoint Node
-
-   NameNode persists its namespace using two files: fsimage, which is the
-   latest checkpoint of the namespace, and edits, a journal (log) of
-   changes to the namespace since the checkpoint. When a NameNode starts
-   up, it merges the fsimage and edits journal to provide an up-to-date
-   view of the file system metadata. The NameNode then overwrites fsimage
-   with the new HDFS state and begins a new edits journal.
-
-   The Checkpoint node periodically creates checkpoints of the namespace.
-   It downloads fsimage and edits from the active NameNode, merges them
-   locally, and uploads the new image back to the active NameNode. The
-   Checkpoint node usually runs on a different machine than the NameNode
-   since its memory requirements are on the same order as the NameNode.
-   The Checkpoint node is started by <<<bin/hdfs namenode -checkpoint>>> on the
-   node specified in the configuration file.
-
-   The location of the Checkpoint (or Backup) node and its accompanying
-   web interface are configured via the <<<dfs.namenode.backup.address>>> and
-   <<<dfs.namenode.backup.http-address>>> configuration variables.
-
-   The start of the checkpoint process on the Checkpoint node is
-   controlled by two configuration parameters.
-
-     * <<<dfs.namenode.checkpoint.period>>>, set to 1 hour by default, specifies
-       the maximum delay between two consecutive checkpoints
-
-     * <<<dfs.namenode.checkpoint.txns>>>, set to 1 million by default, defines the
-       number of uncheckpointed transactions on the NameNode which will
-       force an urgent checkpoint, even if the checkpoint period has not
-       been reached.
-
-   The Checkpoint node stores the latest checkpoint in a directory that is
-   structured the same as the NameNode's directory. This allows the
-   checkpointed image to be always available for reading by the NameNode
-   if necessary. See Import checkpoint.
-
-   Multiple checkpoint nodes may be specified in the cluster configuration
-   file.
-
-   For command usage, see {{{./HDFSCommands.html#namenode}namenode}}.
-
-* Backup Node
-
-   The Backup node provides the same checkpointing functionality as the
-   Checkpoint node, as well as maintaining an in-memory, up-to-date copy
-   of the file system namespace that is always synchronized with the
-   active NameNode state. Along with accepting a journal stream of file
-   system edits from the NameNode and persisting this to disk, the Backup
-   node also applies those edits into its own copy of the namespace in
-   memory, thus creating a backup of the namespace.
-
-   The Backup node does not need to download fsimage and edits files from
-   the active NameNode in order to create a checkpoint, as would be
-   required with a Checkpoint node or Secondary NameNode, since it already
-   has an up-to-date copy of the namespace state in memory. The Backup
-   node checkpoint process is more efficient as it only needs to save the
-   namespace into the local fsimage file and reset edits.
-
-   As the Backup node maintains a copy of the namespace in memory, its RAM
-   requirements are the same as those of the NameNode.
-
-   The NameNode supports one Backup node at a time. No Checkpoint nodes
-   may be registered if a Backup node is in use. Using multiple Backup
-   nodes concurrently will be supported in the future.
-
-   The Backup node is configured in the same manner as the Checkpoint
-   node. It is started with <<<bin/hdfs namenode -backup>>>.
-
-   The location of the Backup (or Checkpoint) node and its accompanying
-   web interface are configured via the <<<dfs.namenode.backup.address>>> and
-   <<<dfs.namenode.backup.http-address>>> configuration variables.
-
-   Use of a Backup node provides the option of running the NameNode with
-   no persistent storage, delegating all responsibility for persisting the
-   state of the namespace to the Backup node. To do this, start the
-   NameNode with the <<<-importCheckpoint>>> option, along with specifying no
-   persistent storage directories of type edits (<<<dfs.namenode.edits.dir>>>) for
-   the NameNode configuration.
-
-   For a complete discussion of the motivation behind the creation of the
-   Backup node and Checkpoint node, see {{{https://issues.apache.org/jira/browse/HADOOP-4539}HADOOP-4539}}.
-   For command usage, see {{{./HDFSCommands.html#namenode}namenode}}.
-
-* Import Checkpoint
-
-   The latest checkpoint can be imported to the NameNode if all other
-   copies of the image and the edits files are lost. In order to do that
-   one should:
-
-     * Create an empty directory specified in the <<<dfs.namenode.name.dir>>>
-       configuration variable;
-
-     * Specify the location of the checkpoint directory in the
-       configuration variable <<<dfs.namenode.checkpoint.dir>>>;
-
-     * and start the NameNode with <<<-importCheckpoint>>> option.
-
-   The NameNode will upload the checkpoint from the
-   <<<dfs.namenode.checkpoint.dir>>> directory and then save it to the NameNode
-   directories set in <<<dfs.namenode.name.dir>>>. The NameNode will fail if a
-   legal image is contained in <<<dfs.namenode.name.dir>>>. The NameNode
-   verifies that the image in <<<dfs.namenode.checkpoint.dir>>> is consistent,
-   but does not modify it in any way.
-
-   For command usage, see {{{./HDFSCommands.html#namenode}namenode}}.
-
-* Balancer
-
-   HDFS data might not always be placed uniformly across the DataNodes.
-   One common reason is addition of new DataNodes to an existing cluster.
-   While placing new blocks (data for a file is stored as a series of
-   blocks), NameNode considers various parameters before choosing the
-   DataNodes to receive these blocks. Some of the considerations are:
-
-     * Policy to keep one of the replicas of a block on the same node as
-       the node that is writing the block.
-
-     * Need to spread different replicas of a block across the racks so
-       that cluster can survive loss of whole rack.
-
-     * One of the replicas is usually placed on the same rack as the node
-       writing to the file so that cross-rack network I/O is reduced.
-
-     * Spread HDFS data uniformly across the DataNodes in the cluster.
-
-   Due to multiple competing considerations, data might not be uniformly
-   placed across the DataNodes. HDFS provides a tool for administrators
-   that analyzes block placement and rebalances data across the DataNodes.
-   A brief administrator's guide for balancer is available at
-   {{{https://issues.apache.org/jira/browse/HADOOP-1652}HADOOP-1652}}.
-
-   For command usage, see {{{./HDFSCommands.html#balancer}balancer}}.
-
-* Rack Awareness
-
-   Typically large Hadoop clusters are arranged in racks and network
-   traffic between different nodes within the same rack is much more
-   desirable than network traffic across the racks. In addition, the NameNode
-   tries to place replicas of a block on multiple racks for improved fault
-   tolerance. Hadoop lets the cluster administrators decide which rack a
-   node belongs to through the configuration variable
-   <<<net.topology.script.file.name>>>. When this script is configured, the
-   NameNode runs it to determine the rack id of each DataNode. A default installation
-   assumes all the nodes belong to the same rack. This feature and
-   configuration is further described in PDF attached to
-   {{{https://issues.apache.org/jira/browse/HADOOP-692}HADOOP-692}}.
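The rack-awareness idea can be sketched with a host-to-rack table standing in for the output of the topology script. Hosts the table does not know fall back to a single default rack, as in a default installation. The table and helper names are hypothetical, and the rack name /default-rack follows Hadoop's usual default. Counting the distinct racks that hold a block's replicas shows whether the block would survive the loss of a whole rack.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: the map stands in for the topology script's output.
class RackSketch {
    static final String DEFAULT_RACK = "/default-rack";

    static String rackOf(String host, Map<String, String> topology) {
        return topology.getOrDefault(host, DEFAULT_RACK);
    }

    // Distinct racks holding replicas; more than one survives a rack failure.
    static int racksSpanned(List<String> replicaHosts, Map<String, String> topology) {
        Set<String> racks = new HashSet<>();
        for (String host : replicaHosts) {
            racks.add(rackOf(host, topology));
        }
        return racks.size();
    }

    public static void main(String[] args) {
        Map<String, String> topology = Map.of("dn1", "/rack1", "dn2", "/rack1", "dn3", "/rack2");
        System.out.println(racksSpanned(List.of("dn1", "dn2", "dn3"), topology));
        System.out.println(racksSpanned(List.of("dn1", "dn2", "dn3"), Map.of()));
    }
}
```

With no script configured (the empty table), every host maps to the default rack, so all replicas appear to share one rack.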
-
-* Safemode
-
-   During start up the NameNode loads the file system state from the
-   fsimage and the edits log file. It then waits for DataNodes to report
-   their blocks so that it does not prematurely start replicating the
-   blocks even though enough replicas already exist in the cluster. During
-   this time the NameNode stays in Safemode. Safemode for the NameNode is
-   essentially a read-only mode for the HDFS cluster, where it does not
-   allow any modifications to the file system or blocks. Normally the NameNode
-   leaves Safemode automatically after the DataNodes have reported that
-   most file system blocks are available. If required, HDFS could be
-   placed in Safemode explicitly using the <<<bin/hdfs dfsadmin -safemode>>>
-   command. The NameNode front page shows whether Safemode is on or off. A
-   more detailed description and configuration is maintained as JavaDoc
-   for <<<setSafeMode()>>>.
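"Most file system blocks" can be read as a reported-block fraction compared against a threshold. The sketch below models only that comparison. The threshold argument stands in for the configured value, which hdfs-default.xml exposes as <<<dfs.namenode.safemode.threshold-pct>>>, and 0.999 is used only as an example. The real NameNode applies further conditions, such as an extension period, that are not modeled here.

```java
// Hypothetical sketch of the automatic Safemode exit test (threshold part only).
class SafemodeSketch {
    static boolean thresholdReached(long reportedBlocks, long totalBlocks, double threshold) {
        if (totalBlocks == 0) {
            return true; // nothing to wait for in an empty namespace
        }
        // Required number of reported blocks, rounded up.
        return reportedBlocks >= (long) Math.ceil(totalBlocks * threshold);
    }

    public static void main(String[] args) {
        System.out.println(thresholdReached(990, 1000, 0.999));
        System.out.println(thresholdReached(1000, 1000, 0.999));
    }
}
```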
-
-* fsck
-
-   HDFS supports the fsck command to check for various inconsistencies.
-   It is designed for reporting problems with various files, for example,
-   missing blocks for a file or under-replicated blocks. Unlike a
-   traditional fsck utility for native file systems, this command does not
-   correct the errors it detects. Normally the NameNode automatically corrects
-   most of the recoverable failures. By default fsck ignores open files
-   but provides an option to select all files during reporting. The HDFS
-   fsck command is not a Hadoop shell command. It can be run as
-   <<<bin/hdfs fsck>>>. For command usage, see
-   {{{./HDFSCommands.html#fsck}fsck}}. fsck can be run on
-   the whole file system or on a subset of files.
-
-* fetchdt
-
-   HDFS supports the fetchdt command to fetch a Delegation Token and store
-   it in a file on the local system. This token can be later used to
-   access a secure server (NameNode for example) from a non-secure client.
-   The utility uses either RPC or HTTPS (over Kerberos) to get the token, and
-   thus requires Kerberos tickets to be present before the run (run kinit
-   to get the tickets). The HDFS fetchdt command is not a Hadoop shell
-   command. It can be run as <<<bin/hdfs fetchdt DTfile>>>. After you get
-   the token you can run an HDFS command without having Kerberos tickets,
-   by pointing the <<<HADOOP_TOKEN_FILE_LOCATION>>> environment variable to the
-   delegation token file. For command usage, see
-   {{{./HDFSCommands.html#fetchdt}fetchdt}} command.
-
-* Recovery Mode
-
-   Typically, you will configure multiple metadata storage locations.
-   Then, if one storage location is corrupt, you can read the metadata
-   from one of the other storage locations.
-
-   However, what can you do if the only storage locations available are
-   corrupt? In this case, there is a special NameNode startup mode called
-   Recovery mode that may allow you to recover most of your data.
-
-   You can start the NameNode in recovery mode like so: <<<namenode -recover>>>
-
-   When in recovery mode, the NameNode will interactively prompt you at
-   the command line about possible courses of action you can take to
-   recover your data.
-
-   If you don't want to be prompted, you can give the <<<-force>>> option. This
-   option will force recovery mode to always select the first choice.
-   Normally, this will be the most reasonable choice.
-
-   Because Recovery mode can cause you to lose data, you should always
-   back up your edit log and fsimage before using it.
-
-* Upgrade and Rollback
-
-   When Hadoop is upgraded on an existing cluster, as with any software
-   upgrade, it is possible there are new bugs or incompatible changes that
-   affect existing applications and were not discovered earlier. In any
-   non-trivial HDFS installation, it is not an option to lose any data,
-   let alone to restart HDFS from scratch. HDFS allows administrators to
-   go back to an earlier version of Hadoop and roll back the cluster to the
-   state it was in before the upgrade. HDFS upgrade is described in more
-   detail in the {{{http://wiki.apache.org/hadoop/Hadoop_Upgrade}Hadoop Upgrade}}
-   Wiki page. HDFS can have one such backup at a time. Before upgrading,
-   administrators need to remove the existing backup using the
-   <<<bin/hadoop dfsadmin -finalizeUpgrade>>> command. The following briefly
-   describes the typical upgrade procedure:
-
-     * Before upgrading Hadoop software, finalize if there is an existing
-       backup. <<<dfsadmin -upgradeProgress status>>> can tell if the cluster
-       needs to be finalized.
-
-     * Stop the cluster and distribute the new version of Hadoop.
-
-     * Run the new version with the <<<-upgrade>>> option (<<<bin/start-dfs.sh -upgrade>>>).
-
-     * Most of the time, the cluster works just fine. Once the new HDFS is
-       considered to be working well (maybe after a few days of operation),
-       finalize the upgrade. Note that until the cluster is finalized,
-       deleting the files that existed before the upgrade does not free up
-       real disk space on the DataNodes.
-
-     * If there is a need to move back to the old version,
-
-          * stop the cluster and distribute the earlier version of Hadoop.
-
-          * start the cluster with the rollback option (<<<bin/start-dfs.sh -rollback>>>).
-
-    When upgrading to a new version of HDFS, it is necessary to rename or
-    delete any paths that are reserved in the new version of HDFS. If the
-    NameNode encounters a reserved path during upgrade, it will print an
-    error like the following:
-
-    <<< /.reserved is a reserved path and .snapshot is a
-    reserved path component in this version of HDFS. Please rollback and delete
-    or rename this path, or upgrade with the -renameReserved [key-value pairs]
-    option to automatically rename these paths during upgrade.>>>
-
-    Specifying <<<-upgrade -renameReserved [optional key-value pairs]>>> causes
-    the NameNode to automatically rename any reserved paths found during
-    startup. For example, to rename all paths named <<<.snapshot>>> to
-    <<<.my-snapshot>>> and <<<.reserved>>> to <<<.my-reserved>>>, a user would
-    specify <<<-upgrade -renameReserved
-    .snapshot=.my-snapshot,.reserved=.my-reserved>>>.
-
-    If no key-value pairs are specified with <<<-renameReserved>>>, the
-    NameNode will then suffix reserved paths with
-    <<<.<LAYOUT-VERSION>.UPGRADE_RENAMED>>>, e.g.
-    <<<.snapshot.-51.UPGRADE_RENAMED>>>.
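-
-   The renaming rule above can be sketched in C as follows. This is a
-   minimal, hypothetical helper (<<<rename_reserved>>> is not part of HDFS):
-   it looks up a reserved path component in a <<<-renameReserved>>> argument
-   string and otherwise appends the <<<.<LAYOUT-VERSION>.UPGRADE_RENAMED>>>
-   suffix. It assumes that a reserved component missing from the key-value
-   pairs also receives the default suffix.
-
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of HDFS): computes the new name for a
 * reserved path component such as ".snapshot".
 *   spec           -renameReserved argument, e.g.
 *                  ".snapshot=.my-snapshot,.reserved=.my-reserved",
 *                  or NULL / "" when no key-value pairs were given
 *   layout_version NameNode layout version, e.g. -51 */
void rename_reserved(const char *name, const char *spec, int layout_version,
                     char *out, size_t out_len)
{
    assert(name != NULL && out != NULL && out_len > 0);
    size_t name_len = strlen(name);
    const char *p = spec;
    while (p != NULL && *p != '\0') {
        const char *eq = strchr(p, '=');
        if (eq == NULL)
            break;                          /* malformed pair: stop scanning */
        const char *end = strchr(eq + 1, ',');
        if (end == NULL)
            end = eq + 1 + strlen(eq + 1);  /* last pair runs to end of string */
        if ((size_t)(eq - p) == name_len && strncmp(p, name, name_len) == 0) {
            snprintf(out, out_len, "%.*s", (int)(end - eq - 1), eq + 1);
            return;
        }
        p = (*end == ',') ? end + 1 : end;
    }
    /* No mapping given: append the default ".<LAYOUT-VERSION>.UPGRADE_RENAMED". */
    snprintf(out, out_len, "%s.%d.UPGRADE_RENAMED", name, layout_version);
}
```
-
-   With the pairs from the example above, <<<.snapshot>>> becomes
-   <<<.my-snapshot>>>; with no pairs and layout version -51 it becomes
-   <<<.snapshot.-51.UPGRADE_RENAMED>>>.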
-
-    There are some caveats to this renaming process. It is recommended,
-    if possible, to run <<<hdfs dfsadmin -saveNamespace>>> before upgrading.
-    This is because data inconsistency can result if an edit log operation
-    refers to the destination of an automatically renamed file.
-
-* DataNode Hot Swap Drive
-
-   The DataNode supports hot swappable drives. The user can add or replace HDFS data
-   volumes without shutting down the DataNode. The following briefly describes
-   the typical procedure for hot swapping a drive:
-
-     * If there are new storage directories, the user should format them and mount them
-       appropriately.
-
-     * The user updates the DataNode configuration <<<dfs.datanode.data.dir>>>
-       to reflect the data volume directories that will be actively in use.
-
-     * The user runs <<<dfsadmin -reconfig datanode HOST:PORT start>>> to start
-       the reconfiguration process. The user can use
-       <<<dfsadmin -reconfig datanode HOST:PORT status>>> to query the running
-       status of the reconfiguration task.
-
-     * Once the reconfiguration task has completed, the user can safely <<<umount>>>
-       the removed data volume directories and physically remove the disks.
-
-* File Permissions and Security
-
-   The file permissions are designed to be similar to file permissions on
-   other familiar platforms like Linux. The user that starts the NameNode is
-   treated as the superuser for HDFS. HDFS also supports network
-   authentication protocols like Kerberos for user authentication, as well as
-   encryption of data transfers. The details are discussed in the
-   Permissions Guide.
-
-* Scalability
-
-   Hadoop currently runs on clusters with thousands of nodes. The
-   {{{http://wiki.apache.org/hadoop/PoweredBy}PoweredBy}} Wiki page lists
-   some of the organizations that deploy Hadoop on large clusters.
-   HDFS has one NameNode for each cluster. Currently the total memory
-   available on the NameNode is the primary scalability limitation.
-   On very large clusters, increasing the average size of files stored in
-   HDFS helps with increasing cluster size without increasing memory
-   requirements on the NameNode. The default configuration may not suit
-   very large clusters. The {{{http://wiki.apache.org/hadoop/FAQ}FAQ}}
-   Wiki page lists suggested configuration improvements for large Hadoop clusters.
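-
-   As a rough illustration of why larger files help, the following C sketch
-   counts the namespace objects (file inodes plus blocks) that the NameNode
-   tracks for the same amount of data. The function name is hypothetical; it
-   ignores directories and the per-object memory cost, and assumes the total
-   data size is a multiple of the file size.
-
```c
#include <assert.h>
#include <stdint.h>

/* Rough count of namespace objects (file inodes + blocks) the NameNode keeps
 * in memory when total_bytes of data are stored as files of file_bytes each.
 * Assumes total_bytes is a multiple of file_bytes; ignores directories. */
uint64_t namespace_objects(uint64_t total_bytes, uint64_t file_bytes,
                           uint64_t block_bytes)
{
    assert(file_bytes > 0 && block_bytes > 0);
    uint64_t files = total_bytes / file_bytes;
    /* A file occupies ceil(file_bytes / block_bytes) blocks. */
    uint64_t blocks_per_file = (file_bytes + block_bytes - 1) / block_bytes;
    return files + files * blocks_per_file;
}
```
-
-   Assuming 2^40 bytes of data and 128 MB (2^27 byte) blocks, 1 MB files
-   give 2,097,152 objects, while 1 GB files give 9,216.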
-
-* Related Documentation
-
-   This user guide is a good starting point for working with HDFS. While
-   the user guide continues to improve, there is a wealth of
-   documentation about Hadoop and HDFS. The following list is a starting
-   point for further exploration:
-
-     * {{{http://hadoop.apache.org}Hadoop Site}}: The home page for
-       the Apache Hadoop site.
-
-     * {{{http://wiki.apache.org/hadoop/FrontPage}Hadoop Wiki}}:
-       The home page (FrontPage) for the Hadoop Wiki. Unlike
-       the released documentation, which is part of the Hadoop source tree,
-       the Hadoop Wiki is regularly edited by the Hadoop community.
-
-     * {{{http://wiki.apache.org/hadoop/FAQ}FAQ}}: The FAQ Wiki page.
-
-     * {{{../../api/index.html}Hadoop JavaDoc API}}.
-
-     * Hadoop User Mailing List: user[at]hadoop.apache.org.
-
-     * Explore {{{./hdfs-default.xml}hdfs-default.xml}}. It includes a
-       brief description of most of the configuration variables available.
-
-     * {{{./HDFSCommands.html}HDFS Commands Guide}}: HDFS commands usage.

http://git-wip-us.apache.org/repos/asf/hadoop/blob/5c0073e7/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Hftp.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Hftp.apt.vm b/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Hftp.apt.vm
deleted file mode 100644
index bab36bf..0000000
--- a/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Hftp.apt.vm
+++ /dev/null
@@ -1,58 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~   http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
-  ---
-  HFTP Guide
-  ---
-  ---
-  ${maven.build.timestamp}
-
-HFTP Guide
-
-%{toc|section=1|fromDepth=0}
-
-* Introduction
-
-   HFTP is a Hadoop filesystem implementation that lets you read data from
-   a remote Hadoop HDFS cluster. The reads are done via HTTP, and data is
-   sourced from DataNodes. HFTP is a read-only filesystem, and will throw
-   exceptions if you try to use it to write data or modify the filesystem
-   state.
-
-   HFTP is primarily useful if you have multiple HDFS clusters with
-   different versions and you need to move data from one to another. HFTP
-   is wire-compatible even between different versions of HDFS. For
-   example, you can do things like:
-   <<<hadoop distcp -i hftp://sourceFS:50070/src hdfs://destFS:8020/dest>>>.
-   Note that HFTP is read-only so the destination must be an HDFS filesystem.
-   (Also, in this example, the distcp should be run using the configuration of
-   the new filesystem.)
-
-   An extension, HSFTP, uses HTTPS by default. This means that data will
-   be encrypted in transit.
-
-* Implementation
-
-   The code for HFTP lives in the Java class
-   <<<org.apache.hadoop.hdfs.HftpFileSystem>>>. Likewise, HSFTP is implemented
-   in <<<org.apache.hadoop.hdfs.HsftpFileSystem>>>.
-
-* Configuration Options
-
-*-----------------------:-----------------------------------+
-| <<Name>>              | <<Description>>                   |
-*-----------------------:-----------------------------------+
-| <<<dfs.hftp.https.port>>> | The HTTPS port on the remote cluster. If not set,
-|                       |   HFTP will fall back on <<<dfs.https.port>>>.
-*-----------------------:-----------------------------------+
-| <<<hdfs.service.host_ip:port>>> | Specifies the service name (for the security
-|                       |  subsystem) associated with the HFTP filesystem running at ip:port.
-*-----------------------:-----------------------------------+

http://git-wip-us.apache.org/repos/asf/hadoop/blob/5c0073e7/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/LibHdfs.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/LibHdfs.apt.vm b/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/LibHdfs.apt.vm
deleted file mode 100644
index 23ff678..0000000
--- a/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/LibHdfs.apt.vm
+++ /dev/null
@@ -1,101 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~   http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
-  ---
-  C API libhdfs
-  ---
-  ---
-  ${maven.build.timestamp}
-
-C API libhdfs
-
-%{toc|section=1|fromDepth=0} 
-
-* Overview
-
-   libhdfs is a JNI-based C API for Hadoop's Distributed File System
-   (HDFS). It provides C APIs to a subset of the HDFS APIs to manipulate
-   HDFS files and the filesystem. libhdfs is part of the Hadoop
-   distribution and comes pre-compiled in
-   <<<${HADOOP_HDFS_HOME}/lib/native/libhdfs.so>>>. libhdfs is compatible with
-   Windows and can be built on Windows by running <<<mvn compile>>> within the
-   <<<hadoop-hdfs-project/hadoop-hdfs>>> directory of the source tree.
-
-* The APIs
-
-   The libhdfs APIs are a subset of the
-   {{{../../api/org/apache/hadoop/fs/FileSystem.html}Hadoop FileSystem APIs}}.
-
-   The header file for libhdfs describes each API in detail and is
-   available in <<<${HADOOP_HDFS_HOME}/include/hdfs.h>>>.
-
-* A Sample Program
-
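-----
-    \#include <fcntl.h>
-    \#include <stdio.h>
-    \#include <stdlib.h>
-    \#include <string.h>
-    \#include "hdfs.h"
-
-    int main(int argc, char **argv) {
-
-        /* Connect to the default file system named in the configuration. */
-        hdfsFS fs = hdfsConnect("default", 0);
-        if (!fs) {
-            fprintf(stderr, "Failed to connect to HDFS!\n");
-            exit(-1);
-        }
-        const char* writePath = "/tmp/testfile.txt";
-        /* 0 for buffer size, replication and block size selects the configured defaults. */
-        hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY|O_CREAT, 0, 0, 0);
-        if (!writeFile) {
-            fprintf(stderr, "Failed to open %s for writing!\n", writePath);
-            exit(-1);
-        }
-        const char* buffer = "Hello, World!";
-        tSize num_written_bytes = hdfsWrite(fs, writeFile, buffer, (tSize)(strlen(buffer)+1));
-        if (num_written_bytes < 0) {
-            fprintf(stderr, "Failed to write %s\n", writePath);
-            exit(-1);
-        }
-        if (hdfsFlush(fs, writeFile)) {
-            fprintf(stderr, "Failed to 'flush' %s\n", writePath);
-            exit(-1);
-        }
-        hdfsCloseFile(fs, writeFile);
-        hdfsDisconnect(fs);
-        return 0;
-    }
-----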
-
-* How To Link With The Library
-
-   See the CMake file for <<<test_libhdfs_ops.c>>> in the libhdfs source
-   directory (<<<hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt>>>) or
-   something like:
-   <<<gcc above_sample.c -I${HADOOP_HDFS_HOME}/include -L${HADOOP_HDFS_HOME}/lib/native -lhdfs -o above_sample>>>
-
-* Common Problems
-
-   The most common problem is that the <<<CLASSPATH>>> is not set properly when
-   calling a program that uses libhdfs. Make sure you set it to all the
-   Hadoop jars needed to run Hadoop itself as well as the right configuration
-   directory containing <<<hdfs-site.xml>>>. It is not valid to use wildcard
-   syntax for specifying multiple jars. It may be useful to run
-   <<<hadoop classpath --glob>>> or <<<hadoop classpath --jar <path>>>> to
-   generate the correct classpath for your deployment. See
-   {{{../hadoop-common/CommandsManual.html#classpath}Hadoop Commands Reference}}
-   for more information on this command.
-
-* Thread Safe
-
-   libhdfs is thread safe.
-
-     * Concurrency and Hadoop FS "handles"
-
-       The Hadoop FS implementation includes an FS handle cache which
-       caches based on the URI of the NameNode along with the connecting
-       user. So, all calls to <<<hdfsConnect>>> will return the same
-       handle, but calls to <<<hdfsConnectAsUser>>> with different users will
-       return different handles. But, since HDFS client handles are
-       completely thread safe, this has no bearing on concurrency.
-
-     * Concurrency and libhdfs/JNI
-
-       The libhdfs calls to JNI should always be creating thread local
-       storage, so (in theory), libhdfs should be as thread safe as the
-       underlying calls to the Hadoop FS.

http://git-wip-us.apache.org/repos/asf/hadoop/blob/5c0073e7/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/SLGUserGuide.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/SLGUserGuide.apt.vm b/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/SLGUserGuide.apt.vm
deleted file mode 100644
index 07666e3..0000000
--- a/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/SLGUserGuide.apt.vm
+++ /dev/null
@@ -1,195 +0,0 @@
-~~ Licensed under the Apache License, Version 2.0 (the "License");
-~~ you may not use this file except in compliance with the License.
-~~ You may obtain a copy of the License at
-~~
-~~   http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License. See accompanying LICENSE file.
-
-  ---
-  Synthetic Load Generator Guide
-  ---
-  ---
-  ${maven.build.timestamp}
-
-Synthetic Load Generator Guide
-
-%{toc|section=1|fromDepth=0}
-
-* Overview
-
-   The synthetic load generator (SLG) is a tool for testing NameNode
-   behavior under different client loads. The user can generate different
-   mixes of read, write, and list requests by specifying the probabilities
-   of read and write. The user controls the intensity of the load by
-   adjusting parameters for the number of worker threads and the delay
-   between operations. While load generators are running, the user can
-   profile and monitor the running of the NameNode. When a load generator
-   exits, it prints some NameNode statistics like the average execution
-   time of each kind of operation and the NameNode throughput.
-
-* Synopsis
-
-   The synopsis of the command is:
-
-----
-    java LoadGenerator [options]
-----
-
-   Options include:
-
-     * <<<-readProbability>>> <read probability>
-
-       The probability of the read operation; default is 0.3333.
-
-     * <<<-writeProbability>>> <write probability>
-
-       The probability of the write operation; default is 0.3333.
-
-     * <<<-root>>> <test space root>
-
-       The root of the test space; default is /testLoadSpace.
-
-     * <<<-maxDelayBetweenOps>>> <maxDelayBetweenOpsInMillis>
-
-       The maximum delay between two consecutive operations in a thread;
-       default is 0 indicating no delay.
-
-     * <<<-numOfThreads>>> <numOfThreads>
-
-       The number of threads to spawn; default is 200.
-
-     * <<<-elapsedTime>>> <elapsedTimeInSecs>
-
-       The number of seconds that the program will run; a value of zero
-       indicates that the program runs forever. The default value is 0.
-
-     * <<<-startTime>>> <startTimeInMillis>
-
-       The time that all worker threads start to run. By default it is 10
-       seconds after the main program starts running. This creates a
-       barrier if more than one load generator is running.
-
-     * <<<-seed>>> <seed>
-
-       The random generator seed for repeating requests to NameNode when
-       running with a single thread; default is the current time.
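-
-   The options and their defaults can be summarized in the following C
-   sketch. It is not the actual <<<LoadGenerator>>>, which is a Java program;
-   the struct and <<<slg_parse>>> are hypothetical, and rejecting read and
-   write probabilities that sum to more than 1 is an assumption based on the
-   listing probability being the remainder.
-
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C mirror of the LoadGenerator options (the real tool is Java). */
struct slg_options {
    double read_prob;        /* -readProbability, default 0.3333 */
    double write_prob;       /* -writeProbability, default 0.3333 */
    const char *root;        /* -root, default /testLoadSpace */
    long max_delay_ms;       /* -maxDelayBetweenOps, 0 = no delay */
    int num_threads;         /* -numOfThreads, default 200 */
    long elapsed_secs;       /* -elapsedTime, 0 = run forever */
    long long start_time_ms; /* -startTime, default: 10 s after program start */
    long long seed;          /* -seed, default: current time */
};

/* Fills *o from argv ("-name value" pairs); now_ms is the program start time
 * in milliseconds. Returns 0 on success, -1 on an unknown option, a missing
 * value, or probabilities that leave no valid listing probability. */
int slg_parse(int argc, char **argv, long long now_ms, struct slg_options *o)
{
    assert(o != NULL);
    o->read_prob = 0.3333;
    o->write_prob = 0.3333;
    o->root = "/testLoadSpace";
    o->max_delay_ms = 0;
    o->num_threads = 200;
    o->elapsed_secs = 0;
    o->start_time_ms = now_ms + 10000;
    o->seed = now_ms;
    for (int i = 1; i < argc; i += 2) {
        if (i + 1 >= argc)
            return -1;                       /* every option takes a value */
        const char *k = argv[i], *v = argv[i + 1];
        if (strcmp(k, "-readProbability") == 0)         o->read_prob = strtod(v, NULL);
        else if (strcmp(k, "-writeProbability") == 0)   o->write_prob = strtod(v, NULL);
        else if (strcmp(k, "-root") == 0)               o->root = v;
        else if (strcmp(k, "-maxDelayBetweenOps") == 0) o->max_delay_ms = strtol(v, NULL, 10);
        else if (strcmp(k, "-numOfThreads") == 0)       o->num_threads = (int)strtol(v, NULL, 10);
        else if (strcmp(k, "-elapsedTime") == 0)        o->elapsed_secs = strtol(v, NULL, 10);
        else if (strcmp(k, "-startTime") == 0)          o->start_time_ms = strtoll(v, NULL, 10);
        else if (strcmp(k, "-seed") == 0)               o->seed = strtoll(v, NULL, 10);
        else
            return -1;
    }
    /* Listing takes 1 - read - write, so that remainder must not be negative. */
    if (o->read_prob < 0.0 || o->write_prob < 0.0 || o->read_prob + o->write_prob > 1.0)
        return -1;
    return 0;
}
```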
-
-   After command line argument parsing, the load generator traverses the
-   test space and builds a table of all directories and another table of
-   all files in the test space. It then waits until the start time to
-   spawn the number of worker threads as specified by the user. Each
-   thread sends a stream of requests to NameNode. At each iteration, it
-   first decides if it is going to read a file, create a file, or list a
-   directory following the read and write probabilities specified by the
-   user. The listing probability is equal to 1 - read probability - write
-   probability. When reading, it randomly picks a file in the test space
-   and reads the entire file. When writing, it randomly picks a directory
-   in the test space and creates a file there.
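-
-   A minimal sketch of this per-iteration choice, assuming a uniform random
-   draw in [0, 1); <<<choose_op>>> is a hypothetical name:
-
```c
#include <assert.h>

enum slg_op { OP_READ, OP_WRITE, OP_LIST };

/* Maps a uniform draw r in [0, 1) to an operation: reads take the first
 * read_p of the interval, writes the next write_p, and listing the
 * remaining 1 - read_p - write_p. */
enum slg_op choose_op(double r, double read_p, double write_p)
{
    assert(read_p >= 0.0 && write_p >= 0.0 && read_p + write_p <= 1.0);
    if (r < read_p)
        return OP_READ;
    if (r < read_p + write_p)
        return OP_WRITE;
    return OP_LIST;
}
```
-
-   A caller might pass <<<rand() / (RAND_MAX + 1.0)>>> as the draw. With the
-   default probabilities, roughly a third of the operations are reads, a
-   third are writes, and a third are listings.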
-
-   To prevent two threads in the same load generator, or in two different
-   load generators, from creating the same file, the file name consists of the
-   current machine's host name and the thread id. The length of the file
-   follows Gaussian distribution with an average size of 2 blocks and the
-   standard deviation of 1. The new file is filled with byte 'a'. To avoid
-   the test space growing indefinitely, the file is deleted immediately
-   after the file creation completes. While listing, it randomly picks a
-   directory in the test space and lists its content.
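-
-   The file naming, sizing, and content rules can be sketched as follows.
-   The <<<host_threadId>>> name format, the clamping of negative sizes to
-   zero, and all function names are assumptions. To stay dependency-free, the
-   sketch approximates the Gaussian with the sum of 12 uniform draws (the
-   Irwin-Hall approximation) and uses a simple linear congruential generator
-   as the uniform source.
-
```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical name format "<dir>/<host>_<threadId>": host name plus thread
 * id keeps names unique across threads and load generators. */
void slg_file_name(char *out, size_t n, const char *dir, const char *host, long thread_id)
{
    assert(out != NULL && n > 0);
    snprintf(out, n, "%s/%s_%ld", dir, host, thread_id);
}

/* Simple 32-bit linear congruential generator returning a draw in [0, 1). */
double lcg_uniform(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state / 4294967296.0;
}

/* File length in blocks, roughly Gaussian with mean 2 and standard deviation 1.
 * Irwin-Hall approximation: the sum of 12 uniform [0, 1) draws minus 6 has
 * mean 0 and variance 1. Negative lengths are clamped to 0 (an assumption). */
double slg_file_blocks(uint32_t *state)
{
    double z = -6.0;
    for (int i = 0; i < 12; i++)
        z += lcg_uniform(state);
    double blocks = 2.0 + 1.0 * z;
    return blocks < 0.0 ? 0.0 : blocks;
}

/* The new file's contents: every byte is 'a'. */
void slg_fill(char *buf, size_t len)
{
    memset(buf, 'a', len);
}
```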
-
-   After an operation completes, the thread pauses for a random amount of
-   time in the range of [0, maxDelayBetweenOps] if the specified maximum
-   delay is not zero. All threads are stopped when the specified elapsed
-   time has passed. Before exiting, the program prints the average
-   execution time for each kind of NameNode operation, and the number of
-   requests served by the NameNode per second.
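-
-   The random pause and the final report can be sketched as below. The struct
-   and function names are hypothetical, and the modulo-based delay ignores
-   modulo bias for brevity.
-
```c
#include <assert.h>

/* Hypothetical per-operation counters accumulated by worker threads. */
struct op_stats {
    long count;      /* completed operations of this kind */
    double total_ms; /* total time spent executing them */
};

/* Random pause in [0, max_delay_ms] from a raw random value r; 0 means no delay. */
long pick_delay_ms(long max_delay_ms, unsigned long r)
{
    if (max_delay_ms <= 0)
        return 0;
    return (long)(r % ((unsigned long)max_delay_ms + 1UL));
}

/* Average execution time of one kind of operation, in milliseconds. */
double avg_exec_ms(const struct op_stats *s)
{
    return s->count > 0 ? s->total_ms / (double)s->count : 0.0;
}

/* Throughput: requests of all kinds served per second of run time. */
double requests_per_sec(const struct op_stats *stats, int kinds, double elapsed_secs)
{
    assert(elapsed_secs > 0.0);
    long total = 0;
    for (int i = 0; i < kinds; i++)
        total += stats[i].count;
    return (double)total / elapsed_secs;
}
```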
-
-* Test Space Population
-
-   The user needs to populate a test space before running a load
-   generator. The structure generator generates a random test space
-   structure and the data generator creates the files and directories of
-   the test space in Hadoop distributed file system.
-
-** Structure Generator
-
-   This tool generates a random namespace structure with the following
-   constraints:
-
-    [[1]] The number of subdirectories that a directory can have is a random
-       number in [minWidth, maxWidth].
-
-    [[2]] The maximum depth of each subdirectory is a random number in
-       [2*maxDepth/3, maxDepth].
-
-    [[3]] Files are randomly placed in leaf directories. The size of each
-       file follows Gaussian distribution with an average size of 1 block
-       and a standard deviation of 1.
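-
-   The two random ranges can be sketched as follows. The helper names are
-   hypothetical, <<<2*maxDepth/3>>> is assumed to use integer division, and
-   the raw random value would come from a seeded generator.
-
```c
#include <assert.h>

/* Uniform integer in [lo, hi] from a raw random value r (e.g. from rand()).
 * Modulo bias is ignored for brevity. */
int rand_in_range(int lo, int hi, unsigned int r)
{
    assert(lo <= hi);
    return lo + (int)(r % (unsigned int)(hi - lo + 1));
}

/* Number of subdirectories of one directory: random in [minWidth, maxWidth]. */
int pick_width(int min_width, int max_width, unsigned int r)
{
    return rand_in_range(min_width, max_width, r);
}

/* Maximum depth of one subdirectory: random in [2*maxDepth/3, maxDepth],
 * with the lower bound computed by integer division (an assumption). */
int pick_depth(int max_depth, unsigned int r)
{
    return rand_in_range(2 * max_depth / 3, max_depth, r);
}
```
-
-   With the default maxDepth of 5, the depth falls in [3, 5].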
-
-   The generated namespace structure is described by two files in the
-   output directory. Each line of the first file contains the full name of
-   a leaf directory. Each line of the second file contains the full name
-   of a file and its size, separated by a blank.
-
-   The synopsis of the command is:
-
-----
-    java StructureGenerator [options]
-----
-
-   Options include:
-
-     * <<<-maxDepth>>> <maxDepth>
-
-       Maximum depth of the directory tree; default is 5.
-
-     * <<<-minWidth>>> <minWidth>
-
-       Minimum number of subdirectories per directory; default is 1.
-
-     * <<<-maxWidth>>> <maxWidth>
-
-       Maximum number of subdirectories per directory; default is 5.
-
-     * <<<-numOfFiles>>> <#OfFiles>
-
-       The total number of files in the test space; default is 10.
-
-     * <<<-avgFileSize>>> <avgFileSizeInBlocks>
-
-       Average file size in blocks; default is 1.
-
-     * <<<-outDir>>> <outDir>
-
-       Output directory; default is the current directory.
-
-     * <<<-seed>>> <seed>
-
-       Random number generator seed; default is the current time.
-
-** Data Generator
-
-   This tool reads the directory structure and file structure from the
-   input directory and creates the namespace in Hadoop distributed file
-   system. All files are filled with byte 'a'.
-
-   The synopsis of the command is:
-
-----
-    java DataGenerator [options]
-----
-
-   Options include:
-
-     * <<<-inDir>>> <inDir>
-
-       Input directory name where directory/file structures are stored;
-       default is the current directory.
-
-     * <<<-root>>> <test space root>
-
-       The name of the root directory which the new namespace is going to
-       be placed under; default is "/testLoadSpace".
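-
-   The Data Generator's input handling can be sketched as below: each line of
-   the file list written by the Structure Generator is split at its last
-   blank, and the new file's contents are filled with 'a'. The function names
-   are hypothetical, and the size field is returned as a plain number without
-   assuming a unit.
-
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Splits one line of the file list ("<full file name> <size>") at its last
 * blank. Returns 0 on success, -1 on a malformed line or a too-small buffer. */
int parse_file_line(const char *line, char *name, size_t name_len, double *size)
{
    assert(line != NULL && name != NULL && name_len > 0 && size != NULL);
    const char *sp = strrchr(line, ' ');
    if (sp == NULL || sp == line)
        return -1;                          /* no blank, or empty file name */
    size_t n = (size_t)(sp - line);
    if (n >= name_len)
        return -1;
    memcpy(name, line, n);
    name[n] = '\0';
    char *end;
    *size = strtod(sp + 1, &end);
    if (end == sp + 1)
        return -1;                          /* no number after the blank */
    return 0;
}

/* Every generated file is filled with the byte 'a'. */
void fill_with_a(char *buf, size_t len)
{
    memset(buf, 'a', len);
}
```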
