[3/6] hadoop git commit: HADOOP-11495. Backport "convert site documentation from apt to markdown" to branch-2 (Masatake Iwasaki via Colin P. McCabe)

cmccabe Tue, 24 Feb 2015 16:04:24 -0800

http://git-wip-us.apache.org/repos/asf/hadoop/blob/343cffb0/hadoop-common-project/hadoop-common/src/site/markdown/Compatibility.md
----------------------------------------------------------------------
diff --git 
a/hadoop-common-project/hadoop-common/src/site/markdown/Compatibility.md 
b/hadoop-common-project/hadoop-common/src/site/markdown/Compatibility.md
new file mode 100644
index 0000000..c058021
--- /dev/null
+++ b/hadoop-common-project/hadoop-common/src/site/markdown/Compatibility.md
@@ -0,0 +1,313 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Apache Hadoop Compatibility
+===========================
+
+* [Apache Hadoop Compatibility](#Apache_Hadoop_Compatibility)
+    * [Purpose](#Purpose)
+    * [Compatibility types](#Compatibility_types)
+        * [Java API](#Java_API)
+            * [Use Cases](#Use_Cases)
+            * [Policy](#Policy)
+        * [Semantic compatibility](#Semantic_compatibility)
+            * [Policy](#Policy)
+        * [Wire compatibility](#Wire_compatibility)
+            * [Use Cases](#Use_Cases)
+            * [Policy](#Policy)
+        * [Java Binary compatibility for end-user applications i.e. Apache 
Hadoop 
ABI](#Java_Binary_compatibility_for_end-user_applications_i.e._Apache_Hadoop_ABI)
+            * [Use cases](#Use_cases)
+            * [Policy](#Policy)
+        * [REST APIs](#REST_APIs)
+            * [Policy](#Policy)
+        * [Metrics/JMX](#MetricsJMX)
+            * [Policy](#Policy)
+        * [File formats & Metadata](#File_formats__Metadata)
+            * [User-level file formats](#User-level_file_formats)
+                * [Policy](#Policy)
+            * [System-internal file formats](#System-internal_file_formats)
+                * [MapReduce](#MapReduce)
+                * [Policy](#Policy)
+                * [HDFS Metadata](#HDFS_Metadata)
+                * [Policy](#Policy)
+        * [Command Line Interface (CLI)](#Command_Line_Interface_CLI)
+            * [Policy](#Policy)
+        * [Web UI](#Web_UI)
+            * [Policy](#Policy)
+        * [Hadoop Configuration Files](#Hadoop_Configuration_Files)
+            * [Policy](#Policy)
+        * [Directory Structure](#Directory_Structure)
+            * [Policy](#Policy)
+        * [Java Classpath](#Java_Classpath)
+            * [Policy](#Policy)
+        * [Environment variables](#Environment_variables)
+            * [Policy](#Policy)
+        * [Build artifacts](#Build_artifacts)
+            * [Policy](#Policy)
+        * [Hardware/Software Requirements](#HardwareSoftware_Requirements)
+            * [Policies](#Policies)
+    * [References](#References)
+
+Purpose
+-------
+
+This document captures the compatibility goals of the Apache Hadoop project. 
The different types of compatibility between Hadoop releases that affects 
Hadoop developers, downstream projects, and end-users are enumerated. For each 
type of compatibility we:
+
+* describe the impact on downstream projects or end-users
+* where applicable, call out the policy adopted by the Hadoop developers when 
incompatible changes are permitted.
+
+Compatibility types
+-------------------
+
+### Java API
+
+Hadoop interfaces and classes are annotated to describe the intended audience 
and stability in order to maintain compatibility with previous releases. See 
[Hadoop Interface Classification](./InterfaceClassification.html) for details.
+
+* InterfaceAudience: captures the intended audience, possible values are 
Public (for end users and external projects), LimitedPrivate (for other Hadoop 
components, and closely related projects like YARN, MapReduce, HBase etc.), and 
Private (for intra component use).
+* InterfaceStability: describes what types of interface changes are permitted. 
Possible values are Stable, Evolving, Unstable, and Deprecated.
+
+#### Use Cases
+
+* Public-Stable API compatibility is required to ensure end-user programs and 
downstream projects continue to work without modification.
+* LimitedPrivate-Stable API compatibility is required to allow upgrade of 
individual components across minor releases.
+* Private-Stable API compatibility is required for rolling upgrades.
+
+#### Policy
+
+* Public-Stable APIs must be deprecated for at least one major release prior 
to their removal in a major release.
+* LimitedPrivate-Stable APIs can change across major releases, but not within 
a major release.
+* Private-Stable APIs can change across major releases, but not within a major 
release.
+* Classes not annotated are implicitly "Private". Class members not annotated 
inherit the annotations of the enclosing class.
+* Note: APIs generated from the proto files need to be compatible for 
rolling-upgrades. See the section on wire-compatibility for more details. The 
compatibility policies for APIs and wire-communication need to go hand-in-hand 
to address this.
+
+### Semantic compatibility
+
+Apache Hadoop strives to ensure that the behavior of APIs remains consistent 
over versions, though changes for correctness may result in changes in 
behavior. Tests and javadocs specify the API's behavior. The community is in 
the process of specifying some APIs more rigorously, and enhancing test suites 
to verify compliance with the specification, effectively creating a formal 
specification for the subset of behaviors that can be easily tested.
+
+#### Policy
+
+The behavior of API may be changed to fix incorrect behavior, such a change to 
be accompanied by updating existing buggy tests or adding tests in cases there 
were none prior to the change.
+
+### Wire compatibility
+
+Wire compatibility concerns data being transmitted over the wire between 
Hadoop processes. Hadoop uses Protocol Buffers for most RPC communication. 
Preserving compatibility requires prohibiting modification as described below. 
Non-RPC communication should be considered as well, for example using HTTP to 
transfer an HDFS image as part of snapshotting or transferring MapTask output. 
The potential communications can be categorized as follows:
+
+* Client-Server: communication between Hadoop clients and servers (e.g., the 
HDFS client to NameNode protocol, or the YARN client to ResourceManager 
protocol).
+* Client-Server (Admin): It is worth distinguishing a subset of the 
Client-Server protocols used solely by administrative commands (e.g., the 
HAAdmin protocol) as these protocols only impact administrators who can 
tolerate changes that end users (which use general Client-Server protocols) can 
not.
+* Server-Server: communication between servers (e.g., the protocol between the 
DataNode and NameNode, or NodeManager and ResourceManager)
+
+#### Use Cases
+
+* Client-Server compatibility is required to allow users to continue using the 
old clients even after upgrading the server (cluster) to a later version (or 
vice versa). For example, a Hadoop 2.1.0 client talking to a Hadoop 2.3.0 
cluster.
+* Client-Server compatibility is also required to allow users to upgrade the 
client before upgrading the server (cluster). For example, a Hadoop 2.4.0 
client talking to a Hadoop 2.3.0 cluster. This allows deployment of client-side 
bug fixes ahead of full cluster upgrades. Note that new cluster features 
invoked by new client APIs or shell commands will not be usable. YARN 
applications that attempt to use new APIs (including new fields in data 
structures) that have not yet deployed to the cluster can expect link 
exceptions.
+* Client-Server compatibility is also required to allow upgrading individual 
components without upgrading others. For example, upgrade HDFS from version 
2.1.0 to 2.2.0 without upgrading MapReduce.
+* Server-Server compatibility is required to allow mixed versions within an 
active cluster so the cluster may be upgraded without downtime in a rolling 
fashion.
+
+#### Policy
+
+* Both Client-Server and Server-Server compatibility is preserved within a 
major release. (Different policies for different categories are yet to be 
considered.)
+* Compatibility can be broken only at a major release, though breaking 
compatibility even at major releases has grave consequences and should be 
discussed in the Hadoop community.
+* Hadoop protocols are defined in .proto (ProtocolBuffers) files. 
Client-Server protocols and Server-protocol .proto files are marked as stable. 
When a .proto file is marked as stable it means that changes should be made in 
a compatible fashion as described below:
+    * The following changes are compatible and are allowed at any time:
+        * Add an optional field, with the expectation that the code deals with 
the field missing due to communication with an older version of the code.
+        * Add a new rpc/method to the service
+        * Add a new optional request to a Message
+        * Rename a field
+        * Rename a .proto file
+        * Change .proto annotations that effect code generation (e.g. name of 
java package)
+    * The following changes are incompatible but can be considered only at a 
major release
+        * Change the rpc/method name
+        * Change the rpc/method parameter type or return type
+        * Remove an rpc/method
+        * Change the service name
+        * Change the name of a Message
+        * Modify a field type in an incompatible way (as defined recursively)
+        * Change an optional field to required
+        * Add or delete a required field
+        * Delete an optional field as long as the optional field has 
reasonable defaults to allow deletions
+    * The following changes are incompatible and hence never allowed
+        * Change a field id
+        * Reuse an old field that was previously deleted.
+        * Field numbers are cheap and changing and reusing is not a good idea.
+
+### Java Binary compatibility for end-user applications i.e. Apache Hadoop ABI
+
+As Apache Hadoop revisions are upgraded end-users reasonably expect that their 
applications should continue to work without any modifications. This is 
fulfilled as a result of support API compatibility, Semantic compatibility and 
Wire compatibility.
+
+However, Apache Hadoop is a very complex, distributed system and services a 
very wide variety of use-cases. In particular, Apache Hadoop MapReduce is a 
very, very wide API; in the sense that end-users may make wide-ranging 
assumptions such as layout of the local disk when their map/reduce tasks are 
executing, environment variables for their tasks etc. In such cases, it becomes 
very hard to fully specify, and support, absolute compatibility.
+
+#### Use cases
+
+* Existing MapReduce applications, including jars of existing packaged 
end-user applications and projects such as Apache Pig, Apache Hive, Cascading 
etc. should work unmodified when pointed to an upgraded Apache Hadoop cluster 
within a major release.
+* Existing YARN applications, including jars of existing packaged end-user 
applications and projects such as Apache Tez etc. should work unmodified when 
pointed to an upgraded Apache Hadoop cluster within a major release.
+* Existing applications which transfer data in/out of HDFS, including jars of 
existing packaged end-user applications and frameworks such as Apache Flume, 
should work unmodified when pointed to an upgraded Apache Hadoop cluster within 
a major release.
+
+#### Policy
+
+* Existing MapReduce, YARN & HDFS applications and frameworks should work 
unmodified within a major release i.e. Apache Hadoop ABI is supported.
+* A very minor fraction of applications maybe affected by changes to disk 
layouts etc., the developer community will strive to minimize these changes and 
will not make them within a minor version. In more egregious cases, we will 
consider strongly reverting these breaking changes and invalidating offending 
releases if necessary.
+* In particular for MapReduce applications, the developer community will try 
our best to support provide binary compatibility across major releases e.g. 
applications using org.apache.hadoop.mapred.
+* APIs are supported compatibly across hadoop-1.x and hadoop-2.x. See 
[Compatibility for MapReduce applications between hadoop-1.x and 
hadoop-2.x](../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html)
 for more details.
+
+### REST APIs
+
+REST API compatibility corresponds to both the request (URLs) and responses to 
each request (content, which may contain other URLs). Hadoop REST APIs are 
specifically meant for stable use by clients across releases, even major 
releases. The following are the exposed REST APIs:
+
+* [WebHDFS](../hadoop-hdfs/WebHDFS.html) - Stable
+* 
[ResourceManager](../../hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html)
+* [NodeManager](../../hadoop-yarn/hadoop-yarn-site/NodeManagerRest.html)
+* [MR Application 
Master](../../hadoop-yarn/hadoop-yarn-site/MapredAppMasterRest.html)
+* [History Server](../../hadoop-yarn/hadoop-yarn-site/HistoryServerRest.html)
+
+#### Policy
+
+The APIs annotated stable in the text above preserve compatibility across at 
least one major release, and maybe deprecated by a newer version of the REST 
API in a major release.
+
+### Metrics/JMX
+
+While the Metrics API compatibility is governed by Java API compatibility, the 
actual metrics exposed by Hadoop need to be compatible for users to be able to 
automate using them (scripts etc.). Adding additional metrics is compatible. 
Modifying (eg changing the unit or measurement) or removing existing metrics 
breaks compatibility. Similarly, changes to JMX MBean object names also break 
compatibility.
+
+#### Policy
+
+Metrics should preserve compatibility within the major release.
+
+### File formats & Metadata
+
+User and system level data (including metadata) is stored in files of 
different formats. Changes to the metadata or the file formats used to store 
data/metadata can lead to incompatibilities between versions.
+
+#### User-level file formats
+
+Changes to formats that end-users use to store their data can prevent them for 
accessing the data in later releases, and hence it is highly important to keep 
those file-formats compatible. One can always add a "new" format improving upon 
an existing format. Examples of these formats include har, war, 
SequenceFileFormat etc.
+
+##### Policy
+
+* Non-forward-compatible user-file format changes are restricted to major 
releases. When user-file formats change, new releases are expected to read 
existing formats, but may write data in formats incompatible with prior 
releases. Also, the community shall prefer to create a new format that programs 
must opt in to instead of making incompatible changes to existing formats.
+
+#### System-internal file formats
+
+Hadoop internal data is also stored in files and again changing these formats 
can lead to incompatibilities. While such changes are not as devastating as the 
user-level file formats, a policy on when the compatibility can be broken is 
important.
+
+##### MapReduce
+
+MapReduce uses formats like I-File to store MapReduce-specific data.
+
+##### Policy
+
+MapReduce-internal formats like IFile maintain compatibility within a major 
release. Changes to these formats can cause in-flight jobs to fail and hence we 
should ensure newer clients can fetch shuffle-data from old servers in a 
compatible manner.
+
+##### HDFS Metadata
+
+HDFS persists metadata (the image and edit logs) in a particular format. 
Incompatible changes to either the format or the metadata prevent subsequent 
releases from reading older metadata. Such incompatible changes might require 
an HDFS "upgrade" to convert the metadata to make it accessible. Some changes 
can require more than one such "upgrades".
+
+Depending on the degree of incompatibility in the changes, the following 
potential scenarios can arise:
+
+* Automatic: The image upgrades automatically, no need for an explicit 
"upgrade".
+* Direct: The image is upgradable, but might require one explicit release 
"upgrade".
+* Indirect: The image is upgradable, but might require upgrading to 
intermediate release(s) first.
+* Not upgradeable: The image is not upgradeable.
+
+##### Policy
+
+* A release upgrade must allow a cluster to roll-back to the older version and 
its older disk format. The rollback needs to restore the original data, but not 
required to restore the updated data.
+* HDFS metadata changes must be upgradeable via any of the upgrade paths - 
automatic, direct or indirect.
+* More detailed policies based on the kind of upgrade are yet to be considered.
+
+### Command Line Interface (CLI)
+
+The Hadoop command line programs may be use either directly via the system 
shell or via shell scripts. Changing the path of a command, removing or 
renaming command line options, the order of arguments, or the command return 
code and output break compatibility and may adversely affect users.
+
+#### Policy
+
+CLI commands are to be deprecated (warning when used) for one major release 
before they are removed or incompatibly modified in a subsequent major release.
+
+### Web UI
+
+Web UI, particularly the content and layout of web pages, changes could 
potentially interfere with attempts to screen scrape the web pages for 
information.
+
+#### Policy
+
+Web pages are not meant to be scraped and hence incompatible changes to them 
are allowed at any time. Users are expected to use REST APIs to get any 
information.
+
+### Hadoop Configuration Files
+
+Users use (1) Hadoop-defined properties to configure and provide hints to 
Hadoop and (2) custom properties to pass information to jobs. Hence, 
compatibility of config properties is two-fold:
+
+* Modifying key-names, units of values, and default values of Hadoop-defined 
properties.
+* Custom configuration property keys should not conflict with the namespace of 
Hadoop-defined properties. Typically, users should avoid using prefixes used by 
Hadoop: hadoop, io, ipc, fs, net, file, ftp, s3, kfs, ha, file, dfs, mapred, 
mapreduce, yarn.
+
+#### Policy
+
+* Hadoop-defined properties are to be deprecated at least for one major 
release before being removed. Modifying units for existing properties is not 
allowed.
+* The default values of Hadoop-defined properties can be changed across 
minor/major releases, but will remain the same across point releases within a 
minor release.
+* Currently, there is NO explicit policy regarding when new prefixes can be 
added/removed, and the list of prefixes to be avoided for custom configuration 
properties. However, as noted above, users should avoid using prefixes used by 
Hadoop: hadoop, io, ipc, fs, net, file, ftp, s3, kfs, ha, file, dfs, mapred, 
mapreduce, yarn.
+
+### Directory Structure
+
+Source code, artifacts (source and tests), user logs, configuration files, 
output and job history are all stored on disk either local file system or HDFS. 
Changing the directory structure of these user-accessible files break 
compatibility, even in cases where the original path is preserved via symbolic 
links (if, for example, the path is accessed by a servlet that is configured to 
not follow symbolic links).
+
+#### Policy
+
+* The layout of source code and build artifacts can change anytime, 
particularly so across major versions. Within a major version, the developers 
will attempt (no guarantees) to preserve the directory structure; however, 
individual files can be added/moved/deleted. The best way to ensure patches 
stay in sync with the code is to get them committed to the Apache source tree.
+* The directory structure of configuration files, user logs, and job history 
will be preserved across minor and point releases within a major release.
+
+### Java Classpath
+
+User applications built against Hadoop might add all Hadoop jars (including 
Hadoop's library dependencies) to the application's classpath. Adding new 
dependencies or updating the version of existing dependencies may interfere 
with those in applications' classpaths.
+
+#### Policy
+
+Currently, there is NO policy on when Hadoop's dependencies can change.
+
+### Environment variables
+
+Users and related projects often utilize the exported environment variables 
(eg HADOOP\_CONF\_DIR), therefore removing or renaming environment variables is 
an incompatible change.
+
+#### Policy
+
+Currently, there is NO policy on when the environment variables can change. 
Developers try to limit changes to major releases.
+
+### Build artifacts
+
+Hadoop uses maven for project management and changing the artifacts can affect 
existing user workflows.
+
+#### Policy
+
+* Test artifacts: The test jars generated are strictly for internal use and 
are not expected to be used outside of Hadoop, similar to APIs annotated 
@Private, @Unstable.
+* Built artifacts: The hadoop-client artifact (maven groupId:artifactId) stays 
compatible within a major release, while the other artifacts can change in 
incompatible ways.
+
+### Hardware/Software Requirements
+
+To keep up with the latest advances in hardware, operating systems, JVMs, and 
other software, new Hadoop releases or some of their features might require 
higher versions of the same. For a specific environment, upgrading Hadoop might 
require upgrading other dependent software components.
+
+#### Policies
+
+* Hardware
+    * Architecture: The community has no plans to restrict Hadoop to specific 
architectures, but can have family-specific optimizations.
+    * Minimum resources: While there are no guarantees on the minimum 
resources required by Hadoop daemons, the community attempts to not increase 
requirements within a minor release.
+* Operating Systems: The community will attempt to maintain the same OS 
requirements (OS kernel versions) within a minor release. Currently GNU/Linux 
and Microsoft Windows are the OSes officially supported by the community while 
Apache Hadoop is known to work reasonably well on other OSes such as Apple 
MacOSX, Solaris etc.
+* The JVM requirements will not change across point releases within the same 
minor release except if the JVM version under question becomes unsupported. 
Minor/major releases might require later versions of JVM for some/all of the 
supported operating systems.
+* Other software: The community tries to maintain the minimum versions of 
additional software required by Hadoop. For example, ssh, kerberos etc.
+
+References
+----------
+
+Here are some relevant JIRAs and pages related to the topic:
+
+* The evolution of this document - 
[HADOOP-9517](https://issues.apache.org/jira/browse/HADOOP-9517)
+* Binary compatibility for MapReduce end-user applications between hadoop-1.x 
and hadoop-2.x - [MapReduce Compatibility between hadoop-1.x and 
hadoop-2.x](../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html)
+* Annotations for interfaces as per interface classification schedule - 
[HADOOP-7391](https://issues.apache.org/jira/browse/HADOOP-7391) [Hadoop 
Interface Classification](./InterfaceClassification.html)
+* Compatibility for Hadoop 1.x releases - 
[HADOOP-5071](https://issues.apache.org/jira/browse/HADOOP-5071)
+* The [Hadoop Roadmap](http://wiki.apache.org/hadoop/Roadmap) page that 
captures other release policies
+
+


http://git-wip-us.apache.org/repos/asf/hadoop/blob/343cffb0/hadoop-common-project/hadoop-common/src/site/markdown/DeprecatedProperties.md
----------------------------------------------------------------------
diff --git 
a/hadoop-common-project/hadoop-common/src/site/markdown/DeprecatedProperties.md 
b/hadoop-common-project/hadoop-common/src/site/markdown/DeprecatedProperties.md
new file mode 100644
index 0000000..dae9928
--- /dev/null
+++ 
b/hadoop-common-project/hadoop-common/src/site/markdown/DeprecatedProperties.md
@@ -0,0 +1,288 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Deprecated Properties
+=====================
+
+The following table lists the configuration property names that are deprecated 
in this version of Hadoop, and their replacements.
+
+| **Deprecated property name** | **New property name** |
+|:---- |:---- |
+| create.empty.dir.if.nonexist | mapreduce.jobcontrol.createdir.ifnotexist |
+| dfs.access.time.precision | dfs.namenode.accesstime.precision |
+| dfs.backup.address | dfs.namenode.backup.address |
+| dfs.backup.http.address | dfs.namenode.backup.http-address |
+| dfs.balance.bandwidthPerSec | dfs.datanode.balance.bandwidthPerSec |
+| dfs.block.size | dfs.blocksize |
+| dfs.data.dir | dfs.datanode.data.dir |
+| dfs.datanode.max.xcievers | dfs.datanode.max.transfer.threads |
+| dfs.df.interval | fs.df.interval |
+| dfs.federation.nameservice.id | dfs.nameservice.id |
+| dfs.federation.nameservices | dfs.nameservices |
+| dfs.http.address | dfs.namenode.http-address |
+| dfs.https.address | dfs.namenode.https-address |
+| dfs.https.client.keystore.resource | dfs.client.https.keystore.resource |
+| dfs.https.need.client.auth | dfs.client.https.need-auth |
+| dfs.max.objects | dfs.namenode.max.objects |
+| dfs.max-repl-streams | dfs.namenode.replication.max-streams |
+| dfs.name.dir | dfs.namenode.name.dir |
+| dfs.name.dir.restore | dfs.namenode.name.dir.restore |
+| dfs.name.edits.dir | dfs.namenode.edits.dir |
+| dfs.permissions | dfs.permissions.enabled |
+| dfs.permissions.supergroup | dfs.permissions.superusergroup |
+| dfs.read.prefetch.size | dfs.client.read.prefetch.size |
+| dfs.replication.considerLoad | dfs.namenode.replication.considerLoad |
+| dfs.replication.interval | dfs.namenode.replication.interval |
+| dfs.replication.min | dfs.namenode.replication.min |
+| dfs.replication.pending.timeout.sec | 
dfs.namenode.replication.pending.timeout-sec |
+| dfs.safemode.extension | dfs.namenode.safemode.extension |
+| dfs.safemode.threshold.pct | dfs.namenode.safemode.threshold-pct |
+| dfs.secondary.http.address | dfs.namenode.secondary.http-address |
+| dfs.socket.timeout | dfs.client.socket-timeout |
+| dfs.umaskmode | fs.permissions.umask-mode |
+| dfs.write.packet.size | dfs.client-write-packet-size |
+| fs.checkpoint.dir | dfs.namenode.checkpoint.dir |
+| fs.checkpoint.edits.dir | dfs.namenode.checkpoint.edits.dir |
+| fs.checkpoint.period | dfs.namenode.checkpoint.period |
+| fs.default.name | fs.defaultFS |
+| hadoop.configured.node.mapping | net.topology.configured.node.mapping |
+| hadoop.job.history.location | mapreduce.jobtracker.jobhistory.location |
+| hadoop.native.lib | io.native.lib.available |
+| hadoop.net.static.resolutions | mapreduce.tasktracker.net.static.resolutions 
|
+| hadoop.pipes.command-file.keep | mapreduce.pipes.commandfile.preserve |
+| hadoop.pipes.executable.interpretor | mapreduce.pipes.executable.interpretor 
|
+| hadoop.pipes.executable | mapreduce.pipes.executable |
+| hadoop.pipes.java.mapper | mapreduce.pipes.isjavamapper |
+| hadoop.pipes.java.recordreader | mapreduce.pipes.isjavarecordreader |
+| hadoop.pipes.java.recordwriter | mapreduce.pipes.isjavarecordwriter |
+| hadoop.pipes.java.reducer | mapreduce.pipes.isjavareducer |
+| hadoop.pipes.partitioner | mapreduce.pipes.partitioner |
+| heartbeat.recheck.interval | dfs.namenode.heartbeat.recheck-interval |
+| io.bytes.per.checksum | dfs.bytes-per-checksum |
+| io.sort.factor | mapreduce.task.io.sort.factor |
+| io.sort.mb | mapreduce.task.io.sort.mb |
+| io.sort.spill.percent | mapreduce.map.sort.spill.percent |
+| jobclient.completion.poll.interval | 
mapreduce.client.completion.pollinterval |
+| jobclient.output.filter | mapreduce.client.output.filter |
+| jobclient.progress.monitor.poll.interval | 
mapreduce.client.progressmonitor.pollinterval |
+| job.end.notification.url | mapreduce.job.end-notification.url |
+| job.end.retry.attempts | mapreduce.job.end-notification.retry.attempts |
+| job.end.retry.interval | mapreduce.job.end-notification.retry.interval |
+| job.local.dir | mapreduce.job.local.dir |
+| keep.failed.task.files | mapreduce.task.files.preserve.failedtasks |
+| keep.task.files.pattern | mapreduce.task.files.preserve.filepattern |
+| key.value.separator.in.input.line | 
mapreduce.input.keyvaluelinerecordreader.key.value.separator |
+| local.cache.size | mapreduce.tasktracker.cache.local.size |
+| map.input.file | mapreduce.map.input.file |
+| map.input.length | mapreduce.map.input.length |
+| map.input.start | mapreduce.map.input.start |
+| map.output.key.field.separator | mapreduce.map.output.key.field.separator |
+| map.output.key.value.fields.spec | 
mapreduce.fieldsel.map.output.key.value.fields.spec |
+| mapred.acls.enabled | mapreduce.cluster.acls.enabled |
+| mapred.binary.partitioner.left.offset | 
mapreduce.partition.binarypartitioner.left.offset |
+| mapred.binary.partitioner.right.offset | 
mapreduce.partition.binarypartitioner.right.offset |
+| mapred.cache.archives | mapreduce.job.cache.archives |
+| mapred.cache.archives.timestamps | mapreduce.job.cache.archives.timestamps |
+| mapred.cache.files | mapreduce.job.cache.files |
+| mapred.cache.files.timestamps | mapreduce.job.cache.files.timestamps |
+| mapred.cache.localArchives | mapreduce.job.cache.local.archives |
+| mapred.cache.localFiles | mapreduce.job.cache.local.files |
+| mapred.child.tmp | mapreduce.task.tmp.dir |
+| mapred.cluster.average.blacklist.threshold | 
mapreduce.jobtracker.blacklist.average.threshold |
+| mapred.cluster.map.memory.mb | mapreduce.cluster.mapmemory.mb |
+| mapred.cluster.max.map.memory.mb | mapreduce.jobtracker.maxmapmemory.mb |
+| mapred.cluster.max.reduce.memory.mb | 
mapreduce.jobtracker.maxreducememory.mb |
+| mapred.cluster.reduce.memory.mb | mapreduce.cluster.reducememory.mb |
+| mapred.committer.job.setup.cleanup.needed | 
mapreduce.job.committer.setup.cleanup.needed |
+| mapred.compress.map.output | mapreduce.map.output.compress |
+| mapred.data.field.separator | mapreduce.fieldsel.data.field.separator |
+| mapred.debug.out.lines | mapreduce.task.debugout.lines |
+| mapred.healthChecker.interval | mapreduce.tasktracker.healthchecker.interval 
|
+| mapred.healthChecker.script.args | 
mapreduce.tasktracker.healthchecker.script.args |
+| mapred.healthChecker.script.path | 
mapreduce.tasktracker.healthchecker.script.path |
+| mapred.healthChecker.script.timeout | 
mapreduce.tasktracker.healthchecker.script.timeout |
+| mapred.heartbeats.in.second | mapreduce.jobtracker.heartbeats.in.second |
+| mapred.hosts.exclude | mapreduce.jobtracker.hosts.exclude.filename |
+| mapred.hosts | mapreduce.jobtracker.hosts.filename |
+| mapred.inmem.merge.threshold | mapreduce.reduce.merge.inmem.threshold |
+| mapred.input.dir.formats | mapreduce.input.multipleinputs.dir.formats |
+| mapred.input.dir.mappers | mapreduce.input.multipleinputs.dir.mappers |
+| mapred.input.dir | mapreduce.input.fileinputformat.inputdir |
+| mapred.input.pathFilter.class | mapreduce.input.pathFilter.class |
+| mapred.jar | mapreduce.job.jar |
+| mapred.job.classpath.archives | mapreduce.job.classpath.archives |
+| mapred.job.classpath.files | mapreduce.job.classpath.files |
+| mapred.job.id | mapreduce.job.id |
+| mapred.jobinit.threads | mapreduce.jobtracker.jobinit.threads |
+| mapred.job.map.memory.mb | mapreduce.map.memory.mb |
+| mapred.job.name | mapreduce.job.name |
+| mapred.job.priority | mapreduce.job.priority |
+| mapred.job.queue.name | mapreduce.job.queuename |
+| mapred.job.reduce.input.buffer.percent | 
mapreduce.reduce.input.buffer.percent |
+| mapred.job.reduce.markreset.buffer.percent | 
mapreduce.reduce.markreset.buffer.percent |
+| mapred.job.reduce.memory.mb | mapreduce.reduce.memory.mb |
+| mapred.job.reduce.total.mem.bytes | mapreduce.reduce.memory.totalbytes |
+| mapred.job.reuse.jvm.num.tasks | mapreduce.job.jvm.numtasks |
+| mapred.job.shuffle.input.buffer.percent | 
mapreduce.reduce.shuffle.input.buffer.percent |
+| mapred.job.shuffle.merge.percent | mapreduce.reduce.shuffle.merge.percent |
+| mapred.job.tracker.handler.count | mapreduce.jobtracker.handler.count |
+| mapred.job.tracker.history.completed.location | 
mapreduce.jobtracker.jobhistory.completed.location |
+| mapred.job.tracker.http.address | mapreduce.jobtracker.http.address |
+| mapred.jobtracker.instrumentation | mapreduce.jobtracker.instrumentation |
+| mapred.jobtracker.job.history.block.size | 
mapreduce.jobtracker.jobhistory.block.size |
+| mapred.job.tracker.jobhistory.lru.cache.size | 
mapreduce.jobtracker.jobhistory.lru.cache.size |
+| mapred.job.tracker | mapreduce.jobtracker.address |
+| mapred.jobtracker.maxtasks.per.job | mapreduce.jobtracker.maxtasks.perjob |
+| mapred.job.tracker.persist.jobstatus.active | 
mapreduce.jobtracker.persist.jobstatus.active |
+| mapred.job.tracker.persist.jobstatus.dir | 
mapreduce.jobtracker.persist.jobstatus.dir |
+| mapred.job.tracker.persist.jobstatus.hours | 
mapreduce.jobtracker.persist.jobstatus.hours |
+| mapred.jobtracker.restart.recover | mapreduce.jobtracker.restart.recover |
+| mapred.job.tracker.retiredjobs.cache.size | 
mapreduce.jobtracker.retiredjobs.cache.size |
+| mapred.job.tracker.retire.jobs | mapreduce.jobtracker.retirejobs |
+| mapred.jobtracker.taskalloc.capacitypad | 
mapreduce.jobtracker.taskscheduler.taskalloc.capacitypad |
+| mapred.jobtracker.taskScheduler | mapreduce.jobtracker.taskscheduler |
+| mapred.jobtracker.taskScheduler.maxRunningTasksPerJob | 
mapreduce.jobtracker.taskscheduler.maxrunningtasks.perjob |
+| mapred.join.expr | mapreduce.join.expr |
+| mapred.join.keycomparator | mapreduce.join.keycomparator |
+| mapred.lazy.output.format | mapreduce.output.lazyoutputformat.outputformat |
+| mapred.line.input.format.linespermap | 
mapreduce.input.lineinputformat.linespermap |
+| mapred.linerecordreader.maxlength | 
mapreduce.input.linerecordreader.line.maxlength |
+| mapred.local.dir | mapreduce.cluster.local.dir |
+| mapred.local.dir.minspacekill | mapreduce.tasktracker.local.dir.minspacekill 
|
+| mapred.local.dir.minspacestart | 
mapreduce.tasktracker.local.dir.minspacestart |
+| mapred.map.child.env | mapreduce.map.env |
+| mapred.map.child.java.opts | mapreduce.map.java.opts |
+| mapred.map.child.log.level | mapreduce.map.log.level |
+| mapred.map.max.attempts | mapreduce.map.maxattempts |
+| mapred.map.output.compression.codec | mapreduce.map.output.compress.codec |
+| mapred.mapoutput.key.class | mapreduce.map.output.key.class |
+| mapred.mapoutput.value.class | mapreduce.map.output.value.class |
+| mapred.mapper.regex.group | mapreduce.mapper.regexmapper..group |
+| mapred.mapper.regex | mapreduce.mapper.regex |
+| mapred.map.task.debug.script | mapreduce.map.debug.script |
+| mapred.map.tasks | mapreduce.job.maps |
+| mapred.map.tasks.speculative.execution | mapreduce.map.speculative |
+| mapred.max.map.failures.percent | mapreduce.map.failures.maxpercent |
+| mapred.max.reduce.failures.percent | mapreduce.reduce.failures.maxpercent |
+| mapred.max.split.size | mapreduce.input.fileinputformat.split.maxsize |
+| mapred.max.tracker.blacklists | 
mapreduce.jobtracker.tasktracker.maxblacklists |
+| mapred.max.tracker.failures | mapreduce.job.maxtaskfailures.per.tracker |
+| mapred.merge.recordsBeforeProgress | mapreduce.task.merge.progress.records |
+| mapred.min.split.size | mapreduce.input.fileinputformat.split.minsize |
+| mapred.min.split.size.per.node | 
mapreduce.input.fileinputformat.split.minsize.per.node |
+| mapred.min.split.size.per.rack | 
mapreduce.input.fileinputformat.split.minsize.per.rack |
+| mapred.output.compression.codec | 
mapreduce.output.fileoutputformat.compress.codec |
+| mapred.output.compression.type | 
mapreduce.output.fileoutputformat.compress.type |
+| mapred.output.compress | mapreduce.output.fileoutputformat.compress |
+| mapred.output.dir | mapreduce.output.fileoutputformat.outputdir |
+| mapred.output.key.class | mapreduce.job.output.key.class |
+| mapred.output.key.comparator.class | 
mapreduce.job.output.key.comparator.class |
+| mapred.output.value.class | mapreduce.job.output.value.class |
+| mapred.output.value.groupfn.class | 
mapreduce.job.output.group.comparator.class |
+| mapred.permissions.supergroup | mapreduce.cluster.permissions.supergroup |
+| mapred.pipes.user.inputformat | mapreduce.pipes.inputformat |
+| mapred.reduce.child.env | mapreduce.reduce.env |
+| mapred.reduce.child.java.opts | mapreduce.reduce.java.opts |
+| mapred.reduce.child.log.level | mapreduce.reduce.log.level |
+| mapred.reduce.max.attempts | mapreduce.reduce.maxattempts |
+| mapred.reduce.parallel.copies | mapreduce.reduce.shuffle.parallelcopies |
+| mapred.reduce.slowstart.completed.maps | 
mapreduce.job.reduce.slowstart.completedmaps |
+| mapred.reduce.task.debug.script | mapreduce.reduce.debug.script |
+| mapred.reduce.tasks | mapreduce.job.reduces |
+| mapred.reduce.tasks.speculative.execution | mapreduce.reduce.speculative |
+| mapred.seqbinary.output.key.class | 
mapreduce.output.seqbinaryoutputformat.key.class |
+| mapred.seqbinary.output.value.class | 
mapreduce.output.seqbinaryoutputformat.value.class |
+| mapred.shuffle.connect.timeout | mapreduce.reduce.shuffle.connect.timeout |
+| mapred.shuffle.read.timeout | mapreduce.reduce.shuffle.read.timeout |
+| mapred.skip.attempts.to.start.skipping | mapreduce.task.skip.start.attempts |
+| mapred.skip.map.auto.incr.proc.count | 
mapreduce.map.skip.proc-count.auto-incr |
+| mapred.skip.map.max.skip.records | mapreduce.map.skip.maxrecords |
+| mapred.skip.on | mapreduce.job.skiprecords |
+| mapred.skip.out.dir | mapreduce.job.skip.outdir |
+| mapred.skip.reduce.auto.incr.proc.count | 
mapreduce.reduce.skip.proc-count.auto-incr |
+| mapred.skip.reduce.max.skip.groups | mapreduce.reduce.skip.maxgroups |
+| mapred.speculative.execution.slowNodeThreshold | 
mapreduce.job.speculative.slownodethreshold |
+| mapred.speculative.execution.slowTaskThreshold | 
mapreduce.job.speculative.slowtaskthreshold |
+| mapred.speculative.execution.speculativeCap | 
mapreduce.job.speculative.speculativecap |
+| mapred.submit.replication | mapreduce.client.submit.file.replication |
+| mapred.system.dir | mapreduce.jobtracker.system.dir |
+| mapred.task.cache.levels | mapreduce.jobtracker.taskcache.levels |
+| mapred.task.id | mapreduce.task.attempt.id |
+| mapred.task.is.map | mapreduce.task.ismap |
+| mapred.task.partition | mapreduce.task.partition |
+| mapred.task.profile | mapreduce.task.profile |
+| mapred.task.profile.maps | mapreduce.task.profile.maps |
+| mapred.task.profile.params | mapreduce.task.profile.params |
+| mapred.task.profile.reduces | mapreduce.task.profile.reduces |
+| mapred.task.timeout | mapreduce.task.timeout |
+| mapred.tasktracker.dns.interface | mapreduce.tasktracker.dns.interface |
+| mapred.tasktracker.dns.nameserver | mapreduce.tasktracker.dns.nameserver |
+| mapred.tasktracker.events.batchsize | mapreduce.tasktracker.events.batchsize 
|
+| mapred.tasktracker.expiry.interval | 
mapreduce.jobtracker.expire.trackers.interval |
+| mapred.task.tracker.http.address | mapreduce.tasktracker.http.address |
+| mapred.tasktracker.indexcache.mb | mapreduce.tasktracker.indexcache.mb |
+| mapred.tasktracker.instrumentation | mapreduce.tasktracker.instrumentation |
+| mapred.tasktracker.map.tasks.maximum | 
mapreduce.tasktracker.map.tasks.maximum |
+| mapred.tasktracker.memory\_calculator\_plugin | 
mapreduce.tasktracker.resourcecalculatorplugin |
+| mapred.tasktracker.memorycalculatorplugin | 
mapreduce.tasktracker.resourcecalculatorplugin |
+| mapred.tasktracker.reduce.tasks.maximum | 
mapreduce.tasktracker.reduce.tasks.maximum |
+| mapred.task.tracker.report.address | mapreduce.tasktracker.report.address |
+| mapred.task.tracker.task-controller | mapreduce.tasktracker.taskcontroller |
+| mapred.tasktracker.taskmemorymanager.monitoring-interval | 
mapreduce.tasktracker.taskmemorymanager.monitoringinterval |
+| mapred.tasktracker.tasks.sleeptime-before-sigkill | 
mapreduce.tasktracker.tasks.sleeptimebeforesigkill |
+| mapred.temp.dir | mapreduce.cluster.temp.dir |
+| mapred.text.key.comparator.options | 
mapreduce.partition.keycomparator.options |
+| mapred.text.key.partitioner.options | 
mapreduce.partition.keypartitioner.options |
+| mapred.textoutputformat.separator | 
mapreduce.output.textoutputformat.separator |
+| mapred.tip.id | mapreduce.task.id |
+| mapreduce.combine.class | mapreduce.job.combine.class |
+| mapreduce.inputformat.class | mapreduce.job.inputformat.class |
+| mapreduce.job.counters.limit | mapreduce.job.counters.max |
+| mapreduce.jobtracker.permissions.supergroup | 
mapreduce.cluster.permissions.supergroup |
+| mapreduce.map.class | mapreduce.job.map.class |
+| mapreduce.outputformat.class | mapreduce.job.outputformat.class |
+| mapreduce.partitioner.class | mapreduce.job.partitioner.class |
+| mapreduce.reduce.class | mapreduce.job.reduce.class |
+| mapred.used.genericoptionsparser | 
mapreduce.client.genericoptionsparser.used |
+| mapred.userlog.limit.kb | mapreduce.task.userlog.limit.kb |
+| mapred.userlog.retain.hours | mapreduce.job.userlog.retain.hours |
+| mapred.working.dir | mapreduce.job.working.dir |
+| mapred.work.output.dir | mapreduce.task.output.dir |
+| min.num.spills.for.combine | mapreduce.map.combine.minspills |
+| reduce.output.key.value.fields.spec | 
mapreduce.fieldsel.reduce.output.key.value.fields.spec |
+| security.job.submission.protocol.acl | security.job.client.protocol.acl |
+| security.task.umbilical.protocol.acl | security.job.task.protocol.acl |
+| sequencefile.filter.class | mapreduce.input.sequencefileinputfilter.class |
+| sequencefile.filter.frequency | 
mapreduce.input.sequencefileinputfilter.frequency |
+| sequencefile.filter.regex | mapreduce.input.sequencefileinputfilter.regex |
+| session.id | dfs.metrics.session-id |
+| slave.host.name | dfs.datanode.hostname |
+| slave.host.name | mapreduce.tasktracker.host.name |
+| tasktracker.contention.tracking | mapreduce.tasktracker.contention.tracking |
+| tasktracker.http.threads | mapreduce.tasktracker.http.threads |
+| topology.node.switch.mapping.impl | net.topology.node.switch.mapping.impl |
+| topology.script.file.name | net.topology.script.file.name |
+| topology.script.number.args | net.topology.script.number.args |
+| user.name | mapreduce.job.user.name |
+| webinterface.private.actions | mapreduce.jobtracker.webinterface.trusted |
+| yarn.app.mapreduce.yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts 
| yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts |
+
+The following table lists additional changes to some configuration properties:
+
+| **Deprecated property name** | **New property name** |
+|:---- |:---- |
+| mapred.create.symlink | NONE - symlinking is always on |
+| mapreduce.job.cache.symlink.create | NONE - symlinking is always on |
+
+

http://git-wip-us.apache.org/repos/asf/hadoop/blob/343cffb0/hadoop-common-project/hadoop-common/src/site/markdown/FileSystemShell.md
----------------------------------------------------------------------
diff --git 
a/hadoop-common-project/hadoop-common/src/site/markdown/FileSystemShell.md 
b/hadoop-common-project/hadoop-common/src/site/markdown/FileSystemShell.md
new file mode 100644
index 0000000..37f644d
--- /dev/null
+++ b/hadoop-common-project/hadoop-common/src/site/markdown/FileSystemShell.md
@@ -0,0 +1,710 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+* [Overview](#Overview)
+    * [appendToFile](#appendToFile)
+    * [cat](#cat)
+    * [checksum](#checksum)
+    * [chgrp](#chgrp)
+    * [chmod](#chmod)
+    * [chown](#chown)
+    * [copyFromLocal](#copyFromLocal)
+    * [copyToLocal](#copyToLocal)
+    * [count](#count)
+    * [cp](#cp)
+    * [createSnapshot](#createSnapshot)
+    * [deleteSnapshot](#deleteSnapshot)
+    * [df](#df)
+    * [du](#du)
+    * [dus](#dus)
+    * [expunge](#expunge)
+    * [find](#find)
+    * [get](#get)
+    * [getfacl](#getfacl)
+    * [getfattr](#getfattr)
+    * [getmerge](#getmerge)
+    * [help](#help)
+    * [ls](#ls)
+    * [lsr](#lsr)
+    * [mkdir](#mkdir)
+    * [moveFromLocal](#moveFromLocal)
+    * [moveToLocal](#moveToLocal)
+    * [mv](#mv)
+    * [put](#put)
+    * [renameSnapshot](#renameSnapshot)
+    * [rm](#rm)
+    * [rmdir](#rmdir)
+    * [rmr](#rmr)
+    * [setfacl](#setfacl)
+    * [setfattr](#setfattr)
+    * [setrep](#setrep)
+    * [stat](#stat)
+    * [tail](#tail)
+    * [test](#test)
+    * [text](#text)
+    * [touchz](#touchz)
+    * [truncate](#truncate)
+    * [usage](#usage)
+
+Overview
+========
+
+The File System (FS) shell includes various shell-like commands that directly 
interact with the Hadoop Distributed File System (HDFS) as well as other file 
systems that Hadoop supports, such as Local FS, HFTP FS, S3 FS, and others. The 
FS shell is invoked by:
+
+    bin/hadoop fs <args>
+
+All FS shell commands take path URIs as arguments. The URI format is 
`scheme://authority/path`. For HDFS the scheme is `hdfs`, and for the Local FS 
the scheme is `file`. The scheme and authority are optional. If not specified, 
the default scheme specified in the configuration is used. An HDFS file or 
directory such as /parent/child can be specified as 
`hdfs://namenodehost/parent/child` or simply as `/parent/child` (given that 
your configuration is set to point to `hdfs://namenodehost`).
+
+Most of the commands in FS shell behave like corresponding Unix commands. 
Differences are described with each of the commands. Error information is sent 
to stderr and the output is sent to stdout.
+
+If HDFS is being used, `hdfs dfs` is a synonym.
+
+See the [Commands Manual](./CommandsManual.html) for generic shell options.
+
+appendToFile
+------------
+
+Usage: `hadoop fs -appendToFile <localsrc> ... <dst> `
+
+Append single src, or multiple srcs from local file system to the destination 
file system. Also reads input from stdin and appends to destination file system.
+
+* `hadoop fs -appendToFile localfile /user/hadoop/hadoopfile`
+* `hadoop fs -appendToFile localfile1 localfile2 /user/hadoop/hadoopfile`
+* `hadoop fs -appendToFile localfile hdfs://nn.example.com/hadoop/hadoopfile`
+* `hadoop fs -appendToFile - hdfs://nn.example.com/hadoop/hadoopfile` Reads 
the input from stdin.
+
+Exit Code:
+
+Returns 0 on success and 1 on error.
+
+cat
+---
+
+Usage: `hadoop fs -cat URI [URI ...]`
+
+Copies source paths to stdout.
+
+Example:
+
+* `hadoop fs -cat hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2`
+* `hadoop fs -cat file:///file3 /user/hadoop/file4`
+
+Exit Code:
+
+Returns 0 on success and -1 on error.
+
+checksum
+--------
+
+Usage: `hadoop fs -checksum URI`
+
+Returns the checksum information of a file.
+
+Example:
+
+* `hadoop fs -checksum hdfs://nn1.example.com/file1`
+* `hadoop fs -checksum file:///etc/hosts`
+
+chgrp
+-----
+
+Usage: `hadoop fs -chgrp [-R] GROUP URI [URI ...]`
+
+Change group association of files. The user must be the owner of files, or 
else a super-user. Additional information is in the [Permissions 
Guide](../hadoop-hdfs/HdfsPermissionsGuide.html).
+
+Options
+
+* The -R option will make the change recursively through the directory 
structure.
+
+chmod
+-----
+
+Usage: `hadoop fs -chmod [-R] <MODE[,MODE]... | OCTALMODE> URI [URI ...]`
+
+Change the permissions of files. With -R, make the change recursively through 
the directory structure. The user must be the owner of the file, or else a 
super-user. Additional information is in the [Permissions 
Guide](../hadoop-hdfs/HdfsPermissionsGuide.html).
+
+Options
+
+* The -R option will make the change recursively through the directory 
structure.
+
+chown
+-----
+
+Usage: `hadoop fs -chown [-R] [OWNER][:[GROUP]] URI [URI ]`
+
+Change the owner of files. The user must be a super-user. Additional 
information is in the [Permissions 
Guide](../hadoop-hdfs/HdfsPermissionsGuide.html).
+
+Options
+
+* The -R option will make the change recursively through the directory 
structure.
+
+copyFromLocal
+-------------
+
+Usage: `hadoop fs -copyFromLocal <localsrc> URI`
+
+Similar to put command, except that the source is restricted to a local file 
reference.
+
+Options:
+
+* The -f option will overwrite the destination if it already exists.
+
+copyToLocal
+-----------
+
+Usage: `hadoop fs -copyToLocal [-ignorecrc] [-crc] URI <localdst> `
+
+Similar to get command, except that the destination is restricted to a local 
file reference.
+
+count
+-----
+
+Usage: `hadoop fs -count [-q] [-h] [-v] <paths> `
+
+Count the number of directories, files and bytes under the paths that match 
the specified file pattern. The output columns with -count are: DIR\_COUNT, 
FILE\_COUNT, CONTENT\_SIZE, PATHNAME
+
+The output columns with -count -q are: QUOTA, REMAINING\_QUATA, SPACE\_QUOTA, 
REMAINING\_SPACE\_QUOTA, DIR\_COUNT, FILE\_COUNT, CONTENT\_SIZE, PATHNAME
+
+The -h option shows sizes in human readable format.
+
+The -v option displays a header line.
+
+Example:
+
+* `hadoop fs -count hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2`
+* `hadoop fs -count -q hdfs://nn1.example.com/file1`
+* `hadoop fs -count -q -h hdfs://nn1.example.com/file1`
+* `hdfs dfs -count -q -h -v hdfs://nn1.example.com/file1`
+
+Exit Code:
+
+Returns 0 on success and -1 on error.
+
+cp
+----
+
+Usage: `hadoop fs -cp [-f] [-p | -p[topax]] URI [URI ...] <dest> `
+
+Copy files from source to destination. This command allows multiple sources as 
well in which case the destination must be a directory.
+
+'raw.\*' namespace extended attributes are preserved if (1) the source and 
destination filesystems support them (HDFS only), and (2) all source and 
destination pathnames are in the /.reserved/raw hierarchy. Determination of 
whether raw.\* namespace xattrs are preserved is independent of the -p 
(preserve) flag.
+
+Options:
+
+* The -f option will overwrite the destination if it already exists.
+* The -p option will preserve file attributes [topx] (timestamps, ownership, 
permission, ACL, XAttr). If -p is specified with no *arg*, then preserves 
timestamps, ownership, permission. If -pa is specified, then preserves 
permission also because ACL is a super-set of permission. Determination of 
whether raw namespace extended attributes are preserved is independent of the 
-p flag.
+
+Example:
+
+* `hadoop fs -cp /user/hadoop/file1 /user/hadoop/file2`
+* `hadoop fs -cp /user/hadoop/file1 /user/hadoop/file2 /user/hadoop/dir`
+
+Exit Code:
+
+Returns 0 on success and -1 on error.
+
+createSnapshot
+--------------
+
+See [HDFS Snapshots Guide](../hadoop-hdfs/HdfsSnapshots.html).
+
+deleteSnapshot
+--------------
+
+See [HDFS Snapshots Guide](../hadoop-hdfs/HdfsSnapshots.html).
+
+df
+----
+
+Usage: `hadoop fs -df [-h] URI [URI ...]`
+
+Displays free space.
+
+Options:
+
+* The -h option will format file sizes in a "human-readable" fashion (e.g 
64.0m instead of 67108864)
+
+Example:
+
+* `hadoop dfs -df /user/hadoop/dir1`
+
+du
+----
+
+Usage: `hadoop fs -du [-s] [-h] URI [URI ...]`
+
+Displays sizes of files and directories contained in the given directory or 
the length of a file in case its just a file.
+
+Options:
+
+* The -s option will result in an aggregate summary of file lengths being 
displayed, rather than the individual files.
+* The -h option will format file sizes in a "human-readable" fashion (e.g 
64.0m instead of 67108864)
+
+Example:
+
+* `hadoop fs -du /user/hadoop/dir1 /user/hadoop/file1 
hdfs://nn.example.com/user/hadoop/dir1`
+
+Exit Code: Returns 0 on success and -1 on error.
+
+dus
+---
+
+Usage: `hadoop fs -dus <args> `
+
+Displays a summary of file lengths.
+
+**Note:** This command is deprecated. Instead use `hadoop fs -du -s`.
+
+expunge
+-------
+
+Usage: `hadoop fs -expunge`
+
+Empty the Trash. Refer to the [HDFS Architecture 
Guide](../hadoop-hdfs/HdfsDesign.html) for more information on the Trash 
feature.
+
+find
+----
+
+Usage: `hadoop fs -find <path> ... <expression> ... `
+
+Finds all files that match the specified expression and applies selected 
actions to them. If no *path* is specified then defaults to the current working 
directory. If no expression is specified then defaults to -print.
+
+The following primary expressions are recognised:
+
+*   -name pattern<br />-iname pattern
+
+    Evaluates as true if the basename of the file matches the pattern using 
standard file system globbing. If -iname is used then the match is case 
insensitive.
+
+*   -print<br />-print0Always
+
+    evaluates to true. Causes the current pathname to be written to standard 
output. If the -print0 expression is used then an ASCII NULL character is 
appended.
+
+The following operators are recognised:
+
+* expression -a expression<br />expression -and expression<br />expression 
expression
+
+    Logical AND operator for joining two expressions. Returns true if both 
child expressions return true. Implied by the juxtaposition of two expressions 
and so does not need to be explicitly specified. The second expression will not 
be applied if the first fails.
+
+Example:
+
+`hadoop fs -find / -name test -print`
+
+Exit Code:
+
+Returns 0 on success and -1 on error.
+
+get
+---
+
+Usage: `hadoop fs -get [-ignorecrc] [-crc] <src> <localdst> `
+
+Copy files to the local file system. Files that fail the CRC check may be 
copied with the -ignorecrc option. Files and CRCs may be copied using the -crc 
option.
+
+Example:
+
+* `hadoop fs -get /user/hadoop/file localfile`
+* `hadoop fs -get hdfs://nn.example.com/user/hadoop/file localfile`
+
+Exit Code:
+
+Returns 0 on success and -1 on error.
+
+getfacl
+-------
+
+Usage: `hadoop fs -getfacl [-R] <path> `
+
+Displays the Access Control Lists (ACLs) of files and directories. If a 
directory has a default ACL, then getfacl also displays the default ACL.
+
+Options:
+
+* -R: List the ACLs of all files and directories recursively.
+* *path*: File or directory to list.
+
+Examples:
+
+* `hadoop fs -getfacl /file`
+* `hadoop fs -getfacl -R /dir`
+
+Exit Code:
+
+Returns 0 on success and non-zero on error.
+
+getfattr
+--------
+
+Usage: `hadoop fs -getfattr [-R] -n name | -d [-e en] <path> `
+
+Displays the extended attribute names and values (if any) for a file or 
directory.
+
+Options:
+
+* -R: Recursively list the attributes for all files and directories.
+* -n name: Dump the named extended attribute value.
+* -d: Dump all extended attribute values associated with pathname.
+* -e *encoding*: Encode values after retrieving them. Valid encodings are 
"text", "hex", and "base64". Values encoded as text strings are enclosed in 
double quotes ("), and values encoded as hexadecimal and base64 are prefixed 
with 0x and 0s, respectively.
+* *path*: The file or directory.
+
+Examples:
+
+* `hadoop fs -getfattr -d /file`
+* `hadoop fs -getfattr -R -n user.myAttr /dir`
+
+Exit Code:
+
+Returns 0 on success and non-zero on error.
+
+getmerge
+--------
+
+Usage: `hadoop fs -getmerge <src> <localdst> [addnl]`
+
+Takes a source directory and a destination file as input and concatenates 
files in src into the destination local file. Optionally addnl can be set to 
enable adding a newline character at the end of each file.
+
+help
+----
+
+Usage: `hadoop fs -help`
+
+Return usage output.
+
+ls
+----
+
+Usage: `hadoop fs -ls [-d] [-h] [-R] [-t] [-S] [-r] [-u] <args> `
+
+Options:
+
+* -d: Directories are listed as plain files.
+* -h: Format file sizes in a human-readable fashion (eg 64.0m instead of 
67108864).
+* -R: Recursively list subdirectories encountered.
+* -t: Sort output by modification time (most recent first).
+* -S: Sort output by file size.
+* -r: Reverse the sort order.
+* -u: Use access time rather than modification time for display and sorting.  
+
+For a file ls returns stat on the file with the following format:
+
+    permissions number_of_replicas userid groupid filesize modification_date 
modification_time filename
+
+For a directory it returns list of its direct children as in Unix. A directory 
is listed as:
+
+    permissions userid groupid modification_date modification_time dirname
+
+Files within a directory are order by filename by default.
+
+Example:
+
+* `hadoop fs -ls /user/hadoop/file1`
+
+Exit Code:
+
+Returns 0 on success and -1 on error.
+
+lsr
+---
+
+Usage: `hadoop fs -lsr <args> `
+
+Recursive version of ls.
+
+**Note:** This command is deprecated. Instead use `hadoop fs -ls -R`
+
+mkdir
+-----
+
+Usage: `hadoop fs -mkdir [-p] <paths> `
+
+Takes path uri's as argument and creates directories.
+
+Options:
+
+* The -p option behavior is much like Unix mkdir -p, creating parent 
directories along the path.
+
+Example:
+
+* `hadoop fs -mkdir /user/hadoop/dir1 /user/hadoop/dir2`
+* `hadoop fs -mkdir hdfs://nn1.example.com/user/hadoop/dir 
hdfs://nn2.example.com/user/hadoop/dir`
+
+Exit Code:
+
+Returns 0 on success and -1 on error.
+
+moveFromLocal
+-------------
+
+Usage: `hadoop fs -moveFromLocal <localsrc> <dst> `
+
+Similar to put command, except that the source localsrc is deleted after it's 
copied.
+
+moveToLocal
+-----------
+
+Usage: `hadoop fs -moveToLocal [-crc] <src> <dst> `
+
+Displays a "Not implemented yet" message.
+
+mv
+----
+
+Usage: `hadoop fs -mv URI [URI ...] <dest> `
+
+Moves files from source to destination. This command allows multiple sources 
as well in which case the destination needs to be a directory. Moving files 
across file systems is not permitted.
+
+Example:
+
+* `hadoop fs -mv /user/hadoop/file1 /user/hadoop/file2`
+* `hadoop fs -mv hdfs://nn.example.com/file1 hdfs://nn.example.com/file2 
hdfs://nn.example.com/file3 hdfs://nn.example.com/dir1`
+
+Exit Code:
+
+Returns 0 on success and -1 on error.
+
+put
+---
+
+Usage: `hadoop fs -put <localsrc> ... <dst> `
+
+Copy single src, or multiple srcs from local file system to the destination 
file system. Also reads input from stdin and writes to destination file system.
+
+* `hadoop fs -put localfile /user/hadoop/hadoopfile`
+* `hadoop fs -put localfile1 localfile2 /user/hadoop/hadoopdir`
+* `hadoop fs -put localfile hdfs://nn.example.com/hadoop/hadoopfile`
+* `hadoop fs -put - hdfs://nn.example.com/hadoop/hadoopfile` Reads the input 
from stdin.
+
+Exit Code:
+
+Returns 0 on success and -1 on error.
+
+renameSnapshot
+--------------
+
+See [HDFS Snapshots Guide](../hadoop-hdfs/HdfsSnapshots.html).
+
+rm
+----
+
+Usage: `hadoop fs -rm [-f] [-r |-R] [-skipTrash] URI [URI ...]`
+
+Delete files specified as args.
+
+Options:
+
+* The -f option will not display a diagnostic message or modify the exit 
status to reflect an error if the file does not exist.
+* The -R option deletes the directory and any content under it recursively.
+* The -r option is equivalent to -R.
+* The -skipTrash option will bypass trash, if enabled, and delete the 
specified file(s) immediately. This can be useful when it is necessary to 
delete files from an over-quota directory.
+
+Example:
+
+* `hadoop fs -rm hdfs://nn.example.com/file /user/hadoop/emptydir`
+
+Exit Code:
+
+Returns 0 on success and -1 on error.
+
+rmdir
+-----
+
+Usage: `hadoop fs -rmdir [--ignore-fail-on-non-empty] URI [URI ...]`
+
+Delete a directory.
+
+Options:
+
+* `--ignore-fail-on-non-empty`: When using wildcards, do not fail if a 
directory still contains files.
+
+Example:
+
+* `hadoop fs -rmdir /user/hadoop/emptydir`
+
+rmr
+---
+
+Usage: `hadoop fs -rmr [-skipTrash] URI [URI ...]`
+
+Recursive version of delete.
+
+**Note:** This command is deprecated. Instead use `hadoop fs -rm -r`
+
+setfacl
+-------
+
+Usage: `hadoop fs -setfacl [-R] [-b |-k -m |-x <acl_spec> <path>] |[--set 
<acl_spec> <path>] `
+
+Sets Access Control Lists (ACLs) of files and directories.
+
+Options:
+
+* -b: Remove all but the base ACL entries. The entries for user, group and 
others are retained for compatibility with permission bits.
+* -k: Remove the default ACL.
+* -R: Apply operations to all files and directories recursively.
+* -m: Modify ACL. New entries are added to the ACL, and existing entries are 
retained.
+* -x: Remove specified ACL entries. Other ACL entries are retained.
+* ``--set``: Fully replace the ACL, discarding all existing entries. The 
*acl\_spec* must include entries for user, group, and others for compatibility 
with permission bits.
+* *acl\_spec*: Comma separated list of ACL entries.
+* *path*: File or directory to modify.
+
+Examples:
+
+* `hadoop fs -setfacl -m user:hadoop:rw- /file`
+* `hadoop fs -setfacl -x user:hadoop /file`
+* `hadoop fs -setfacl -b /file`
+* `hadoop fs -setfacl -k /dir`
+* `hadoop fs -setfacl --set user::rw-,user:hadoop:rw-,group::r--,other::r-- 
/file`
+* `hadoop fs -setfacl -R -m user:hadoop:r-x /dir`
+* `hadoop fs -setfacl -m default:user:hadoop:r-x /dir`
+
+Exit Code:
+
+Returns 0 on success and non-zero on error.
+
+setfattr
+--------
+
+Usage: `hadoop fs -setfattr -n name [-v value] | -x name <path> `
+
+Sets an extended attribute name and value for a file or directory.
+
+Options:
+
+* -b: Remove all but the base ACL entries. The entries for user, group and 
others are retained for compatibility with permission bits.
+* -n name: The extended attribute name.
+* -v value: The extended attribute value. There are three different encoding 
methods for the value. If the argument is enclosed in double quotes, then the 
value is the string inside the quotes. If the argument is prefixed with 0x or 
0X, then it is taken as a hexadecimal number. If the argument begins with 0s or 
0S, then it is taken as a base64 encoding.
+* -x name: Remove the extended attribute.
+* *path*: The file or directory.
+
+Examples:
+
+* `hadoop fs -setfattr -n user.myAttr -v myValue /file`
+* `hadoop fs -setfattr -n user.noValue /file`
+* `hadoop fs -setfattr -x user.myAttr /file`
+
+Exit Code:
+
+Returns 0 on success and non-zero on error.
+
+setrep
+------
+
+Usage: `hadoop fs -setrep [-R] [-w] <numReplicas> <path> `
+
+Changes the replication factor of a file. If *path* is a directory then the 
command recursively changes the replication factor of all files under the 
directory tree rooted at *path*.
+
+Options:
+
+* The -w flag requests that the command wait for the replication to complete. 
This can potentially take a very long time.
+* The -R flag is accepted for backwards compatibility. It has no effect.
+
+Example:
+
+* `hadoop fs -setrep -w 3 /user/hadoop/dir1`
+
+Exit Code:
+
+Returns 0 on success and -1 on error.
+
+stat
+----
+
+Usage: `hadoop fs -stat [format] <path> ...`
+
+Print statistics about the file/directory at \<path\> in the specified format. 
Format accepts filesize in blocks (%b), type (%F), group name of owner (%g), 
name (%n), block size (%o), replication (%r), user name of owner(%u), and 
modification date (%y, %Y). %y shows UTC date as "yyyy-MM-dd HH:mm:ss" and %Y 
shows milliseconds since January 1, 1970 UTC. If the format is not specified, 
%y is used by default.
+
+Example:
+
+* `hadoop fs -stat "%F %u:%g %b %y %n" /file`
+
+Exit Code: Returns 0 on success and -1 on error.
+
+tail
+----
+
+Usage: `hadoop fs -tail [-f] URI`
+
+Displays last kilobyte of the file to stdout.
+
+Options:
+
+* The -f option will output appended data as the file grows, as in Unix.
+
+Example:
+
+* `hadoop fs -tail pathname`
+
+Exit Code: Returns 0 on success and -1 on error.
+
+test
+----
+
+Usage: `hadoop fs -test -[defsz] URI`
+
+Options:
+
+* -d: f the path is a directory, return 0.
+* -e: if the path exists, return 0.
+* -f: if the path is a file, return 0.
+* -s: if the path is not empty, return 0.
+* -z: if the file is zero length, return 0.
+
+Example:
+
+* `hadoop fs -test -e filename`
+
+text
+----
+
+Usage: `hadoop fs -text <src> `
+
+Takes a source file and outputs the file in text format. The allowed formats 
are zip and TextRecordInputStream.
+
+touchz
+------
+
+Usage: `hadoop fs -touchz URI [URI ...]`
+
+Create a file of zero length.
+
+Example:
+
+* `hadoop fs -touchz pathname`
+
+Exit Code: Returns 0 on success and -1 on error.
+
+truncate
+--------
+
+Usage: `hadoop fs -truncate [-w] <length> <paths>`
+
+Truncate all files that match the specified file pattern to the
+specified length.
+
+Options:
+
+* The `-w` flag requests that the command waits for block recovery
+  to complete, if necessary. Without -w flag the file may remain
+  unclosed for some time while the recovery is in progress.
+  During this time file cannot be reopened for append.
+
+Example:
+
+* `hadoop fs -truncate 55 /user/hadoop/file1 /user/hadoop/file2`
+* `hadoop fs -truncate -w 127 hdfs://nn1.example.com/user/hadoop/file1`
+
+usage
+-----
+
+Usage: `hadoop fs -usage command`
+
+Return the help for an individual command.

http://git-wip-us.apache.org/repos/asf/hadoop/blob/343cffb0/hadoop-common-project/hadoop-common/src/site/markdown/HttpAuthentication.md
----------------------------------------------------------------------
diff --git 
a/hadoop-common-project/hadoop-common/src/site/markdown/HttpAuthentication.md 
b/hadoop-common-project/hadoop-common/src/site/markdown/HttpAuthentication.md
new file mode 100644
index 0000000..e0a2693
--- /dev/null
+++ 
b/hadoop-common-project/hadoop-common/src/site/markdown/HttpAuthentication.md
@@ -0,0 +1,58 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Authentication for Hadoop HTTP web-consoles
+===========================================
+
+* [Authentication for Hadoop HTTP 
web-consoles](#Authentication_for_Hadoop_HTTP_web-consoles)
+    * [Introduction](#Introduction)
+    * [Configuration](#Configuration)
+
+Introduction
+------------
+
+This document describes how to configure Hadoop HTTP web-consoles to require 
user authentication.
+
+By default Hadoop HTTP web-consoles (JobTracker, NameNode, TaskTrackers and 
DataNodes) allow access without any form of authentication.
+
+Similarly to Hadoop RPC, Hadoop HTTP web-consoles can be configured to require 
Kerberos authentication using HTTP SPNEGO protocol (supported by browsers like 
Firefox and Internet Explorer).
+
+In addition, Hadoop HTTP web-consoles support the equivalent of Hadoop's 
Pseudo/Simple authentication. If this option is enabled, user must specify 
their user name in the first browser interaction using the user.name query 
string parameter. For example: 
`http://localhost:50030/jobtracker.jsp?user.name=babu`.
+
+If a custom authentication mechanism is required for the HTTP web-consoles, it 
is possible to implement a plugin to support the alternate authentication 
mechanism (refer to Hadoop hadoop-auth for details on writing an 
`AuthenticatorHandler`).
+
+The next section describes how to configure Hadoop HTTP web-consoles to 
require user authentication.
+
+Configuration
+-------------
+
+The following properties should be in the `core-site.xml` of all the nodes in 
the cluster.
+
+`hadoop.http.filter.initializers`: add to this property the 
`org.apache.hadoop.security.AuthenticationFilterInitializer` initializer class.
+
+`hadoop.http.authentication.type`: Defines authentication used for the HTTP 
web-consoles. The supported values are: `simple` | `kerberos` | 
`#AUTHENTICATION_HANDLER_CLASSNAME#`. The dfeault value is `simple`.
+
+`hadoop.http.authentication.token.validity`: Indicates how long (in seconds) 
an authentication token is valid before it has to be renewed. The default value 
is `36000`.
+
+`hadoop.http.authentication.signature.secret.file`: The signature secret file 
for signing the authentication tokens. The same secret should be used for all 
nodes in the cluster, JobTracker, NameNode, DataNode and TastTracker. The 
default value is `$user.home/hadoop-http-auth-signature-secret`. IMPORTANT: 
This file should be readable only by the Unix user running the daemons.
+
+`hadoop.http.authentication.cookie.domain`: The domain to use for the HTTP 
cookie that stores the authentication token. In order to authentiation to work 
correctly across all nodes in the cluster the domain must be correctly set. 
There is no default value, the HTTP cookie will not have a domain working only 
with the hostname issuing the HTTP cookie.
+
+IMPORTANT: when using IP addresses, browsers ignore cookies with domain 
settings. For this setting to work properly all nodes in the cluster must be 
configured to generate URLs with `hostname.domain` names on it.
+
+`hadoop.http.authentication.simple.anonymous.allowed`: Indicates if anonymous 
requests are allowed when using 'simple' authentication. The default value is 
`true`
+
+`hadoop.http.authentication.kerberos.principal`: Indicates the Kerberos 
principal to be used for HTTP endpoint when using 'kerberos' authentication. 
The principal short name must be `HTTP` per Kerberos HTTP SPNEGO specification. 
The default value is `HTTP/_HOST@$LOCALHOST`, where `_HOST` -if present- is 
replaced with bind address of the HTTP server.
+
+`hadoop.http.authentication.kerberos.keytab`: Location of the keytab file with 
the credentials for the Kerberos principal used for the HTTP endpoint. The 
default value is `$user.home/hadoop.keytab`.i

http://git-wip-us.apache.org/repos/asf/hadoop/blob/343cffb0/hadoop-common-project/hadoop-common/src/site/markdown/InterfaceClassification.md
----------------------------------------------------------------------
diff --git 
a/hadoop-common-project/hadoop-common/src/site/markdown/InterfaceClassification.md
 
b/hadoop-common-project/hadoop-common/src/site/markdown/InterfaceClassification.md
new file mode 100644
index 0000000..0392610
--- /dev/null
+++ 
b/hadoop-common-project/hadoop-common/src/site/markdown/InterfaceClassification.md
@@ -0,0 +1,105 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Hadoop Interface Taxonomy: Audience and Stability Classification
+================================================================
+
+* [Hadoop Interface Taxonomy: Audience and Stability 
Classification](#Hadoop_Interface_Taxonomy:_Audience_and_Stability_Classification)
+    * [Motivation](#Motivation)
+    * [Interface Classification](#Interface_Classification)
+        * [Audience](#Audience)
+        * [Stability](#Stability)
+    * [How are the Classifications 
Recorded?](#How_are_the_Classifications_Recorded)
+    * [FAQ](#FAQ)
+
+Motivation
+----------
+
+The interface taxonomy classification provided here is for guidance to 
developers and users of interfaces. The classification guides a developer to 
declare the targeted audience or users of an interface and also its stability.
+
+* Benefits to the user of an interface: Knows which interfaces to use or not 
use and their stability.
+* Benefits to the developer: to prevent accidental changes of interfaces and 
hence accidental impact on users or other components or system. This is 
particularly useful in large systems with many developers who may not all have 
a shared state/history of the project.
+
+Interface Classification
+------------------------
+
+Hadoop adopts the following interface classification, this classification was 
derived from the [OpenSolaris 
taxonomy](http://www.opensolaris.org/os/community/arc/policies/interface-taxonomy/#Advice)
 and, to some extent, from taxonomy used inside Yahoo. Interfaces have two main 
attributes: Audience and Stability
+
+### Audience
+
+Audience denotes the potential consumers of the interface. While many 
interfaces are internal/private to the implementation, other are 
public/external interfaces are meant for wider consumption by applications 
and/or clients. For example, in posix, libc is an external or public interface, 
while large parts of the kernel are internal or private interfaces. Also, some 
interfaces are targeted towards other specific subsystems.
+
+Identifying the audience of an interface helps define the impact of breaking 
it. For instance, it might be okay to break the compatibility of an interface 
whose audience is a small number of specific subsystems. On the other hand, it 
is probably not okay to break a protocol interfaces that millions of Internet 
users depend on.
+
+Hadoop uses the following kinds of audience in order of increasing/wider 
visibility:
+
+* Private:
+    * The interface is for internal use within the project (such as HDFS or 
MapReduce) and should not be used by applications or by other projects. It is 
subject to change at anytime without notice. Most interfaces of a project are 
Private (also referred to as project-private).
+* Limited-Private:
+    * The interface is used by a specified set of projects or systems 
(typically closely related projects). Other projects or systems should not use 
the interface. Changes to the interface will be communicated/ negotiated with 
the specified projects. For example, in the Hadoop project, some interfaces are 
LimitedPrivate{HDFS, MapReduce} in that they are private to the HDFS and 
MapReduce projects.
+* Public
+    * The interface is for general use by any application.
+
+Hadoop doesn't have a Company-Private classification, which is meant for APIs 
which are intended to be used by other projects within the company, since it 
doesn't apply to opensource projects. Also, certain APIs are annotated as 
@VisibleForTesting (from com.google.common .annotations.VisibleForTesting) - 
these are meant to be used strictly for unit tests and should be treated as 
"Private" APIs.
+
+### Stability
+
+Stability denotes how stable an interface is, as in when incompatible changes 
to the interface are allowed. Hadoop APIs have the following levels of 
stability.
+
+* Stable
+    * Can evolve while retaining compatibility for minor release boundaries; 
in other words, incompatible changes to APIs marked Stable are allowed only at 
major releases (i.e. at m.0).
+* Evolving
+    * Evolving, but incompatible changes are allowed at minor release (i.e. m 
.x)
+* Unstable
+    * Incompatible changes to Unstable APIs are allowed any time. This usually 
makes sense for only private interfaces.
+    * However one may call this out for a supposedly public interface to 
highlight that it should not be used as an interface; for public interfaces, 
labeling it as Not-an-interface is probably more appropriate than "Unstable".
+        * Examples of publicly visible interfaces that are unstable (i.e. 
not-an-interface): GUI, CLIs whose output format will change
+* Deprecated
+    * APIs that could potentially removed in the future and should not be used.
+
+How are the Classifications Recorded?
+-------------------------------------
+
+How will the classification be recorded for Hadoop APIs?
+
+* Each interface or class will have the audience and stability recorded using 
annotations in org.apache.hadoop.classification package.
+* The javadoc generated by the maven target javadoc:javadoc lists only the 
public API.
+* One can derive the audience of java classes and java interfaces by the 
audience of the package in which they are contained. Hence it is useful to 
declare the audience of each java package as public or private (along with the 
private audience variations).
+
+FAQ
+---
+
+* Why arenât the java scopes (private, package private and public) good 
enough?
+    * Javaâs scoping is not very complete. One is often forced to make a 
class public in order for other internal components to use it. It does not have 
friends or sub-package-private like C++.
+* But I can easily access a private implementation interface if it is Java 
public. Where is the protection and control?
+    * The purpose of this is not providing absolute access control. Its 
purpose is to communicate to users and developers. One can access private 
implementation functions in libc; however if they change the internal 
implementation details, your application will break and you will have little 
sympathy from the folks who are supplying libc. If you use a non-public 
interface you understand the risks.
+* Why bother declaring the stability of a private interface? Arenât private 
interfaces always unstable?
+    * Private interfaces are not always unstable. In the cases where they are 
stable they capture internal properties of the system and can communicate these 
properties to its internal users and to developers of the interface.
+        * e.g. In HDFS, NN-DN protocol is private but stable and can help 
implement rolling upgrades. It communicates that this interface should not be 
changed in incompatible ways even though it is private.
+        * e.g. In HDFS, FSImage stability can help provide more flexible roll 
backs.
+* What is the harm in applications using a private interface that is stable? 
How is it different than a public stable interface?
+    * While a private interface marked as stable is targeted to change only at 
major releases, it may break at other times if the providers of that interface 
are willing to changes the internal users of that interface. Further, a public 
stable interface is less likely to break even at major releases (even though it 
is allowed to break compatibility) because the impact of the change is larger. 
If you use a private interface (regardless of its stability) you run the risk 
of incompatibility.
+* Why bother with Limited-private? Isnât it giving special treatment to some 
projects? That is not fair.
+    * First, most interfaces should be public or private; actually let us 
state it even stronger: make it private unless you really want to expose it to 
public for general use.
+    * Limited-private is for interfaces that are not intended for general use. 
They are exposed to related projects that need special hooks. Such a 
classification has a cost to both the supplier and consumer of the limited 
interface. Both will have to work together if ever there is a need to break the 
interface in the future; for example the supplier and the consumers will have 
to work together to get coordinated releases of their respective projects. This 
should not be taken lightly â if you can get away with private then do so; if 
the interface is really for general use for all applications then do so. But 
remember that making an interface public has huge responsibility. Sometimes 
Limited-private is just right.
+    * A good example of a limited-private interface is BlockLocations, This is 
fairly low-level interface that we are willing to expose to MR and perhaps 
HBase. We are likely to change it down the road and at that time we will have 
get a coordinated effort with the MR team to release matching releases. While 
MR and HDFS are always released in sync today, they may change down the road.
+    * If you have a limited-private interface with many projects listed then 
you are fooling yourself. It is practically public.
+    * It might be worth declaring a special audience classification called 
Hadoop-Private for the Hadoop family.
+* Lets treat all private interfaces as Hadoop-private. What is the harm in 
projects in the Hadoop family have access to private classes?
+    * Do we want MR accessing class files that are implementation details 
inside HDFS. There used to be many such layer violations in the code that we 
have been cleaning up over the last few years. We donât want such layer 
violations to creep back in by no separating between the major components like 
HDFS and MR.
+* Aren't all public interfaces stable?
+    * One may mark a public interface as evolving in its early days. Here one 
is promising to make an effort to make compatible changes but may need to break 
it at minor releases.
+    * One example of a public interface that is unstable is where one is 
providing an implementation of a standards-body based interface that is still 
under development. For example, many companies, in an attampt to be first to 
market, have provided implementations of a new NFS protocol even when the 
protocol was not fully completed by IETF. The implementor cannot evolve the 
interface in a fashion that causes least distruption because the stability is 
controlled by the standards body. Hence it is appropriate to label the 
interface as unstable.
+
+

[3/6] hadoop git commit: HADOOP-11495. Backport "convert site documentation from apt to markdown" to branch-2 (Masatake Iwasaki via Colin P. McCabe)

Reply via email to