[
https://issues.apache.org/jira/browse/HADOOP-12077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628644#comment-14628644
]
Gera Shegalov commented on HADOOP-12077:
----------------------------------------
Since the FindBugs -1 report does not actually flag anything wrong, I consider
patch 004 reviewable.
> Provide a multi-URI replication Inode for ViewFs
> -----------------------------------------------
>
> Key: HADOOP-12077
> URL: https://issues.apache.org/jira/browse/HADOOP-12077
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs
> Reporter: Gera Shegalov
> Assignee: Gera Shegalov
> Attachments: HADOOP-12077.001.patch, HADOOP-12077.002.patch,
> HADOOP-12077.003.patch, HADOOP-12077.004.patch
>
>
> This JIRA is to provide simple "replication" capabilities for applications
> that maintain logically equivalent paths in multiple locations for caching or
> failover (e.g., S3 and HDFS). We noticed a simple, common HDFS usage pattern
> in our applications. They host their data on some logical cluster C, with
> corresponding HDFS clusters in multiple datacenters. When the application
> runs in DC1, it prefers to read from C in DC1, and it prefers to fail over to
> C in DC2 if the application is migrated to DC2 or when C in DC1 is
> unavailable. New application data versions are created periodically and
> relatively infrequently.
> In order to address many common scenarios in a general fashion, and to avoid
> unnecessary code duplication, we implement this functionality in ViewFs (our
> default FileSystem spanning all clusters in all datacenters) in a project
> code-named Nfly (N as in N datacenters). Currently, each ViewFs Inode points
> to a single URI via ChRootedFileSystem. Consequently, we introduce a new type
> of link that points to a list of URIs, each of which is wrapped in a
> ChRootedFileSystem. A typical usage:
> /nfly/C/user->/DC1/C/user,/DC2/C/user,... This collection of
> ChRootedFileSystem instances is fronted by the Nfly filesystem object that is
> actually used for the mount point/Inode. The Nfly filesystem backs a single
> logical path /nfly/C/user/<user>/path with multiple physical paths.
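> For illustration only, a minimal sketch of how such a link might be declared
> through the client Configuration, assuming the new link type is exposed under
> the standard ViewFs mount-table prefix; the key name linkNfly, the mount
> table name C, and the namenode authorities below are assumptions, not
> necessarily the patch's actual syntax:
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class NflyMountSketch {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = new Configuration();
>     conf.set("fs.defaultFS", "viewfs://C/");
>     // One logical mount point backed by a comma-separated list of URIs
>     // (hypothetical key name):
>     conf.set("fs.viewfs.mounttable.C.linkNfly./nfly/C/user",
>         "hdfs://dc1-nn/C/user,hdfs://dc2-nn/C/user");
>     FileSystem fs = FileSystem.get(conf);
>     // Accesses under /nfly/C/user/... would fan out to both destinations.
>     System.out.println(fs.exists(new Path("/nfly/C/user")));
>   }
> }
> {code}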
> The Nfly filesystem supports setting minReplication. As long as the number of
> URIs on which an update has succeeded is greater than or equal to
> minReplication, exceptions are only logged, not thrown. Each update operation
> is currently executed serially (client-bandwidth-driven parallelism will be
> added later).
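> A rough sketch, in hypothetical code, of the rule above: an update is applied
> serially to every destination FileSystem, failures are recorded, and an
> exception is rethrown only if fewer than minReplication destinations succeed
> (class and field names are illustrative, not the patch's):
> {code:java}
> import java.io.IOException;
> import java.util.List;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> class NflyUpdateSketch {
>   private final List<FileSystem> destinations; // one per chrooted URI
>   private final int minReplication;
>
>   NflyUpdateSketch(List<FileSystem> destinations, int minReplication) {
>     this.destinations = destinations;
>     this.minReplication = minReplication;
>   }
>
>   /** Serially deletes the path on every destination; throws only when fewer
>    *  than minReplication destinations succeed. */
>   boolean delete(Path f, boolean recursive) throws IOException {
>     int successes = 0;
>     IOException lastFailure = null;
>     for (FileSystem fs : destinations) {
>       try {
>         fs.delete(f, recursive);
>         successes++;
>       } catch (IOException ioe) {
>         lastFailure = ioe; // only logged in the real implementation
>       }
>     }
>     if (successes < minReplication) {
>       throw lastFailure != null
>           ? lastFailure
>           : new IOException("fewer than " + minReplication
>               + " destinations succeeded");
>     }
>     return true;
>   }
> }
> {code}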
> A file create/write (a rough sketch in code follows this list):
> # Creates a temporary invisible _nfly_tmp_file in the intended chrooted
> filesystem.
> # Returns an FSDataOutputStream that wraps the output streams returned by
> step 1.
> # All writes are forwarded to each output stream.
> # On close of the stream created in step 2, all n streams are closed, and the
> files are renamed from _nfly_tmp_file to file. All files receive the same
> mtime corresponding to the client system time as of the beginning of this
> step.
> # If at least minReplication destinations have gone through steps 1-4 without
> failures, the transaction is considered logically committed; otherwise, a
> best-effort attempt is made to clean up the temporary files.
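> The close/rename step could look roughly like the following sketch; it
> extends a plain OutputStream rather than FSDataOutputStream for brevity, and
> all names are hypothetical rather than taken from the patch:
> {code:java}
> import java.io.IOException;
> import java.io.OutputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> class NflyOutputStreamSketch extends OutputStream {
>   private final FileSystem[] fileSystems; // one per destination
>   private final OutputStream[] streams;   // opened on the _nfly_tmp_ path
>   private final Path[] tmpPaths;
>   private final Path finalPath;
>   private final int minReplication;
>
>   NflyOutputStreamSketch(FileSystem[] fileSystems, OutputStream[] streams,
>       Path[] tmpPaths, Path finalPath, int minReplication) {
>     this.fileSystems = fileSystems;
>     this.streams = streams;
>     this.tmpPaths = tmpPaths;
>     this.finalPath = finalPath;
>     this.minReplication = minReplication;
>   }
>
>   @Override
>   public void write(int b) throws IOException {
>     for (OutputStream out : streams) { // step 3: forward to every stream
>       out.write(b);
>     }
>   }
>
>   @Override
>   public void close() throws IOException {
>     final long commitTime = System.currentTimeMillis(); // common mtime
>     int committed = 0;
>     for (int i = 0; i < streams.length; i++) {
>       try {
>         streams[i].close();                             // step 4: close
>         fileSystems[i].rename(tmpPaths[i], finalPath);  // tmp -> final name
>         fileSystems[i].setTimes(finalPath, commitTime, -1);
>         committed++;
>       } catch (IOException ioe) {
>         try { // best-effort cleanup of the temporary file; only logged
>           fileSystems[i].delete(tmpPaths[i], false);
>         } catch (IOException ignored) {
>         }
>       }
>     }
>     if (committed < minReplication) {                   // step 5
>       throw new IOException("nfly write not committed: only " + committed
>           + " of " + streams.length + " destinations succeeded");
>     }
>   }
> }
> {code}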
> As for reads, we support a notion of locality similar to HDFS /DC/rack/node.
> We sort Inode URIs using NetworkTopology by their authorities, which are
> typically host names in simple HDFS URIs. If the authority is missing, as is
> the case with the local file:/// URI, the local host name from
> InetAddress.getLocalHost() is assumed. This makes sure that the local file
> system is always the closest one to the reader in this approach. For our
> Hadoop 2 hdfs URIs that are based on nameservice ids instead of hostnames, it
> is very easy to adjust the topology script since our nameservice ids already
> contain the datacenter. As for rack and node, we can simply output any string
> such as /DC/rack-nsid/node-nsid, since we only care about datacenter locality
> for such filesystem clients.
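> For example, a topology mapping along these lines could place a
> nameservice-id authority such as dc1-ns under /dc1/rack-dc1-ns/node-dc1-ns;
> the id format and the path layout here are assumptions:
> {code:java}
> import java.net.InetAddress;
> import java.net.URI;
> import java.net.UnknownHostException;
>
> class NflyTopologySketch {
>   static String networkLocation(URI uri) throws UnknownHostException {
>     String authority = uri.getAuthority();
>     if (authority == null) {
>       // e.g. file:/// -- assume the local host so the local FS sorts closest
>       authority = InetAddress.getLocalHost().getHostName();
>     }
>     // Assume the datacenter is the prefix before the first '-' (e.g. dc1-ns).
>     String dc = authority.split("-", 2)[0];
>     return "/" + dc + "/rack-" + authority + "/node-" + authority;
>   }
>
>   public static void main(String[] args) throws UnknownHostException {
>     System.out.println(networkLocation(URI.create("hdfs://dc1-ns/")));
>     // -> /dc1/rack-dc1-ns/node-dc1-ns
>   }
> }
> {code}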
> There are 2 policies/additions to the read call path that make it more
> expensive but improve the user experience (sketched in code after this list):
> - readMostRecent - when this policy is enabled, Nfly first checks the mtime
> of the path under all URIs and sorts them from most recent to least recent.
> Nfly then sorts the set of most recent URIs topologically in the same manner
> as described above.
> - repairOnRead - when readMostRecent is enabled, Nfly already has to RPC all
> underlying destinations. With repairOnRead, the Nfly filesystem additionally
> attempts to refresh destinations where the path is missing or stale, using
> the nearest available most recent destination.
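> Put together, the read path with both policies enabled could look roughly
> like this sketch; the nearest-first ordering of destinations is assumed to
> have been established already, and all names are illustrative:
> {code:java}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FSDataInputStream;
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.FileUtil;
> import org.apache.hadoop.fs.Path;
>
> class NflyReadSketch {
>   static FSDataInputStream openMostRecent(FileSystem[] fileSystems, Path f,
>       boolean repairOnRead, Configuration conf) throws IOException {
>     // 1. RPC every destination for the mtime of the path (missing -> -1).
>     long[] mtimes = new long[fileSystems.length];
>     long newest = -1;
>     for (int i = 0; i < fileSystems.length; i++) {
>       try {
>         FileStatus st = fileSystems[i].getFileStatus(f);
>         mtimes[i] = st.getModificationTime();
>       } catch (IOException missingOrUnreachable) {
>         mtimes[i] = -1;
>       }
>       newest = Math.max(newest, mtimes[i]);
>     }
>     // 2. The nearest destination holding the newest copy serves the read.
>     int source = -1;
>     for (int i = 0; i < fileSystems.length && source < 0; i++) {
>       if (newest >= 0 && mtimes[i] == newest) {
>         source = i;
>       }
>     }
>     if (source < 0) {
>       throw new IOException(f + " not found on any destination");
>     }
>     // 3. repairOnRead: refresh destinations that miss the path or hold a
>     //    stale version, copying from the chosen most recent source.
>     if (repairOnRead) {
>       for (int i = 0; i < fileSystems.length; i++) {
>         if (i != source && mtimes[i] < newest) {
>           try {
>             FileUtil.copy(fileSystems[source], f, fileSystems[i], f,
>                 false /* deleteSource */, conf);
>           } catch (IOException repairFailure) {
>             // best-effort; only logged in a real implementation
>           }
>         }
>       }
>     }
>     return fileSystems[source].open(f);
>   }
> }
> {code}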