HADOOP-14742. Document multi-URI replication Inode for ViewFS. Contributed by 
Gera Shegalov

Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/ddb67ca7
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/ddb67ca7
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/ddb67ca7

Branch: refs/heads/HDFS-12996
Commit: ddb67ca707de896cd0ba5cda3c0d1a2d9edca968
Parents: cceb68f
Author: Chris Douglas <cdoug...@apache.org>
Authored: Mon Mar 12 13:42:38 2018 -0700
Committer: Chris Douglas <cdoug...@apache.org>
Committed: Mon Mar 12 13:43:27 2018 -0700

 .../hadoop-hdfs/src/site/markdown/ViewFs.md     | 139 +++++++++++++++++++
 1 file changed, 139 insertions(+)

diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/ViewFs.md 
index 1008583..f851ef6 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/ViewFs.md
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/ViewFs.md
@@ -180,6 +180,145 @@ Recall that one cannot rename files or directories across 
namenodes or clusters
 This will NOT work in the new world if `/user` and `/data` are actually stored 
on different namenodes within a cluster.
+Multi-Filesystem I/0 with Nfly Mount Points
+HDFS and other distributed filesystems provide data resilience via some sort of
+redundancy such as block replication or more sophisticated distributed 
+However, modern setups may be comprised of multiple Hadoop clusters, enterprise
+filers, hosted on and off premise. Nfly mount points make it possible for a
+single logical file to be synchronously replicated by multiple filesystems.
+It's designed for a relatively small files up to a gigabyte. In general it's a
+function of a single core/single network link performance since the logic
+resides in a single client JVM using ViewFs such as FsShell or a
+MapReduce task.
+### Basic Configuration
+Consider the following example to understand the basic configuration of Nfly.
+Suppose we want to keep the directory `ads` replicated on three filesystems
+represented by URIs: `uri1`, `uri2` and `uri3`.
+  <property>
+    <name>fs.viewfs.mounttable.global.linkNfly../ads</name>
+    <value>uri1,uri2,uri3</value>
+  </property>
+Note 2 consecutive `..` in the property name. They arise because of empty
+settings for advanced tweaking of the mount point which we will show in
+subsequent sections. The property value is a comma-separated list of URIs.
+URIs may point to different clusters in different regions
+`hdfs://datacenter-east/ads`, `s3a://models-us-west/ads`, 
+or in the simplest case to different directories under the same filesystem,
+e.g., `file:/tmp/ads1`, `file:/tmp/ads2`, `file:/tmp/ads3`
+All *modifications* performed under the global path `viewfs://global/ads` are
+propagated to all destination URIs if the underlying system is available.
+For instance if we create a file via hadoop shell
+hadoop fs -touchz viewfs://global/ads/z1
+We will find it via local filesystem in the latter configuration
+ls -al /tmp/ads*/z1
+-rw-r--r--  1 user  wheel  0 Mar 11 12:17 /tmp/ads1/z1
+-rw-r--r--  1 user  wheel  0 Mar 11 12:17 /tmp/ads2/z1
+-rw-r--r--  1 user  wheel  0 Mar 11 12:17 /tmp/ads3/z1
+A read from the global path is processed by the first filesystem that does not
+result in an exception. The order in which filesystems are accessed depends on
+whether they are available at this moment or and whether a topological order
+### Advanced Configuration
+Mount points `linkNfly` can be further configured using parameters passed as a
+comma-separated list of key=value pairs. Following parameters are currently
+`minReplication=int` determines the minimum number of destinations that have to
+process a write modification without exceptions, if below nfly write is failed.
+It is an configuration error to have minReplication higher than the number of
+target URIs. The default is 2.
+If minReplication is lower than the number of target URIs we may have some
+target URIs without latest writes. It can be compensated by employing more
+expensive read operations controlled by the following settings
+`readMostRecent=boolean` if set to `true` causes Nfly client to check the path
+under all target URIs instead of just the first one based on the topology 
+Among all available at the moment the one with the most recent modification 
+is processed.
+`repairOnRead=boolean` if set to `true` causes Nfly to copy most recent replica
+to stale targets such that subsequent reads can be done cheaply again from the
+closest replica.
+### Network Topology
+Nfly seeks to satisfy reads from the "closest" target URI.
+To this end, Nfly extends the notion of
+<a href="hadoop-project-dist/hadoop-common/RackAwareness.html">Rack 
+to the authorities of target URIs.
+Nfly applies NetworkTopology to resolve authorities of the URIs. Most commonly
+a script based mapping is used in a heterogeneous setup. We could have a script
+providing the following topology mapping
+| URI                           | Topology                 |
+|-------------------------------|------------------------- |
+| `hdfs://datacenter-east/ads`  | /us-east/onpremise-hdfs  |
+| `s3a://models-us-west/ads`    | /us-west/aws             |
+| `hdfs://datacenter-west/ads`  | /us-west/onpremise-hdfs  |
+If a target URI does not have the authority part as in `file:/` Nfly injects
+client's local node name.
+### Example Nfly Configuration
+  <property>
+  </property>
+### How Nfly File Creation works
+FileSystem fs = FileSystem.get("viewfs://global/", ...);
+FSDataOutputStream out = fs.create("viewfs://global/ads/f1");
+The code above would result in the following execution.
+1. create an invisible file `_nfly_tmp_f1` under each target URI i.e.,
`hdfs://datacenter-west/ads/_nfly_tmp_f1`, etc.
+This is done by calling `create` on underlying filesystems and returns a
+`FSDataOutputStream` object `out` that wraps all four output streams.
+2. Thus each subsequent write on `out` can be forwarded to each wrapped stream.
+3. On `out.close` all streams are closed, and the files are renamed from 
`_nfly_tmp_f1` to `f1`.
+All files receive the same *modification time* corresponding to the client
+system time as of beginning of this step.
+4. If at least `minReplication` destinations have gone through steps 1-3 
+failures Nfly considers the transaction logically committed; Otherwise it tries
+to clean up the temporary files in a best-effort attempt.
+Note that because 4 is a best-effort step and the client JVM could crash and 
+resume its work, it's a good idea to provision some sort of cron job to purge 
+`_nfly_tmp` files.
 ### FAQ
 1.  **As I move from non-federated world to the federated world, I will have 
to keep track of namenodes for different volumes; how do I do that?**

To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-commits-h...@hadoop.apache.org

Reply via email to