[jira] [Commented] (HADOOP-12077) Provide a multi-URI replication Inode for ViewFs

Hadoop QA (JIRA) Sun, 12 Jul 2015 03:05:40 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-12077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623728#comment-14623728 ]


Hadoop QA commented on HADOOP-12077:
------------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  17m  1s | Findbugs (version ) appears to be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 3 new or modified test files. |
| {color:green}+1{color} | javac |   7m 40s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |   9m 37s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 21s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 50s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  2s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   4m 21s | The patch appears to introduce 3 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |  22m 27s | Tests passed in hadoop-common. |
| {color:green}+1{color} | hdfs tests | 158m 42s | Tests passed in hadoop-hdfs. |
| | | 224m  8s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-common |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12744920/HADOOP-12077.004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / d7319de |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-HADOOP-Build/7252/artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HADOOP-Build/7252/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HADOOP-Build/7252/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/7252/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/7252/console |


This message was automatically generated.

> Provide a multi-URI replication Inode for ViewFs
> ------------------------------------------------
>
>                 Key: HADOOP-12077
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12077
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Gera Shegalov
>            Assignee: Gera Shegalov
>         Attachments: HADOOP-12077.001.patch, HADOOP-12077.002.patch, 
> HADOOP-12077.003.patch, HADOOP-12077.004.patch
>
>
> This JIRA is to provide simple "replication" capabilities for applications
> that maintain logically equivalent paths in multiple locations for caching or
> failover (e.g., S3 and HDFS). We noticed a simple, common HDFS usage pattern
> in our applications. They host their data on some logical cluster C. There
> are corresponding HDFS clusters in multiple datacenters. When the application
> runs in DC1, it prefers to read from C in DC1, and it prefers to fail over to
> C in DC2 if the application is migrated to DC2 or when C in DC1 is
> unavailable. New versions of the application data are created periodically
> and relatively infrequently.
> In order to address many common scenarios in a general fashion, and to avoid
> unnecessary code duplication, we implement this functionality in ViewFs (our
> default FileSystem spanning all clusters in all datacenters) in a project
> code-named Nfly (N as in N datacenters). Currently each ViewFs Inode points
> to a single URI via ChRootedFileSystem. To lift this restriction, we
> introduce a new type of link that points to a list of URIs, each of which is
> wrapped in a ChRootedFileSystem. A typical usage:
> /nfly/C/user->/DC1/C/user,/DC2/C/user,... This collection of
> ChRootedFileSystem instances is fronted by the Nfly filesystem object that is
> actually used for the mount point/Inode. The Nfly filesystem backs a single
> logical path /nfly/C/user/<user>/path by multiple physical paths.
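For illustration only, here is a minimal Java sketch of the link semantics above. It assumes the mount point and the targets are plain path strings. The sketch parses the "mount->target1,target2,..." notation from the example. It then maps a logical path to one physical path per target by replacing the mount-point prefix, which is the effect chrooting each target has on path names. NflyMountSketch, parse and resolve are hypothetical names. This is neither the patch's code nor the actual ViewFs mount-table configuration syntax.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: one logical Nfly path backed by several chrooted physical paths. */
final class NflyMountSketch {
  final String mountPoint;     // logical mount point, e.g. "/nfly/C/user"
  final List<String> targets;  // physical roots, e.g. "/DC1/C/user", "/DC2/C/user"

  NflyMountSketch(String mountPoint, List<String> targets) {
    this.mountPoint = mountPoint;
    this.targets = Collections.unmodifiableList(new ArrayList<>(targets));
  }

  /** Parses the illustrative "mount->target1,target2,..." notation from the description. */
  static NflyMountSketch parse(String link) {
    int arrow = link.indexOf("->");
    if (arrow < 0) {
      throw new IllegalArgumentException("missing '->' in " + link);
    }
    List<String> targets = new ArrayList<>();
    for (String t : link.substring(arrow + 2).split(",")) {
      if (!t.trim().isEmpty()) {
        targets.add(t.trim());
      }
    }
    if (targets.isEmpty()) {
      throw new IllegalArgumentException("no targets in " + link);
    }
    return new NflyMountSketch(link.substring(0, arrow).trim(), targets);
  }

  /** Maps a logical path to one physical path per target, chroot-style (prefix swap). */
  List<String> resolve(String logicalPath) {
    if (!logicalPath.equals(mountPoint) && !logicalPath.startsWith(mountPoint + "/")) {
      throw new IllegalArgumentException(logicalPath + " is not under " + mountPoint);
    }
    String suffix = logicalPath.substring(mountPoint.length()); // "" or "/..."
    List<String> physical = new ArrayList<>();
    for (String target : targets) {
      physical.add(target + suffix);
    }
    return physical;
  }

  public static void main(String[] args) {
    NflyMountSketch m = parse("/nfly/C/user->/DC1/C/user,/DC2/C/user");
    // prints [/DC1/C/user/alice/data, /DC2/C/user/alice/data]
    System.out.println(m.resolve("/nfly/C/user/alice/data"));
  }
}
```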
> The Nfly filesystem supports setting minReplication. As long as the number
> of URIs on which an update has succeeded is greater than or equal to
> minReplication, exceptions are only logged, not thrown. Each update
> operation is currently executed serially (client-bandwidth-driven
> parallelism will be added later).
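Here is a minimal sketch of the minReplication rule. It assumes each destination update can be modeled as a call that either returns normally or throws IOException. The update runs serially on every destination, and failures are logged through java.util.logging. An exception is raised only when fewer than minReplication destinations succeeded. MinReplicationSketch, Update and applyAll are hypothetical names, not the patch's API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch of the minReplication rule for one serially executed update. */
final class MinReplicationSketch {
  private static final Logger LOG = Logger.getLogger(MinReplicationSketch.class.getName());

  /** One update (mkdir, delete, rename, ...) against one destination. */
  interface Update<D> {
    void apply(D destination) throws IOException;
  }

  /**
   * Applies the update to every destination, one after another. Failures are logged;
   * an IOException is thrown only if fewer than minReplication destinations succeeded.
   * Returns the number of destinations on which the update succeeded.
   */
  static <D> int applyAll(List<D> destinations, int minReplication, Update<D> update)
      throws IOException {
    int successes = 0;
    List<IOException> failures = new ArrayList<>();
    for (D d : destinations) {
      try {
        update.apply(d);
        successes++;
      } catch (IOException e) {
        failures.add(e);
        LOG.log(Level.WARNING, "update failed on " + d, e);
      }
    }
    if (successes < minReplication) {
      IOException ioe = new IOException("update succeeded on " + successes + " of "
          + destinations.size() + " destinations; minReplication=" + minReplication);
      for (IOException f : failures) {
        ioe.addSuppressed(f);  // keep per-destination causes for debugging
      }
      throw ioe;
    }
    return successes;
  }

  public static void main(String[] args) throws IOException {
    List<String> dests = Arrays.asList("DC1", "DC2", "DC3");
    int ok = applyAll(dests, 2, d -> {
      if (d.equals("DC2")) {
        throw new IOException(d + " unavailable");
      }
    });
    System.out.println(ok + " destinations updated"); // DC2's failure is only logged
  }
}
```

With minReplication set to the number of destinations, every failure surfaces. With minReplication of 1, the update behaves like a best-effort broadcast.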
> A file create/write:
> # Creates a temporary invisible _nfly_tmp_file in each intended chrooted
> filesystem.
> # Returns an FSDataOutputStream that wraps the output streams returned by
> step 1.
> # All writes are forwarded to each output stream.
> # On close of the stream created by step 2, all n streams are closed, and the
> files are renamed from _nfly_tmp_file to file. All files receive the same
> mtime, corresponding to the client system time at the beginning of this
> step.
> # If at least minReplication destinations have gone through steps 1-4
> without failures, the transaction is considered logically committed;
> otherwise, a best-effort cleanup of the temporary files is attempted.
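The create/write steps can be sketched in Java as follows, under three assumptions:
- Local directories (java.nio.file) stand in for the chrooted filesystems.
- A plain OutputStream stands in for FSDataOutputStream.
- The temporary name uses an assumed "_nfly_tmp_" prefix, matching the _nfly_tmp_file/file pairing in the description.

NflyWriteSketch is a hypothetical class. It sketches the described protocol and is not the patch's implementation.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the Nfly create/write steps over local directories. */
final class NflyWriteSketch extends OutputStream {
  static final String TMP_PREFIX = "_nfly_tmp_";             // assumed temp-file naming
  private final List<Path> finals = new ArrayList<>();
  private final List<Path> temps = new ArrayList<>();
  private final List<OutputStream> outs = new ArrayList<>(); // null = failed destination
  private final int minReplication;
  private boolean closed;

  /** Step 1: open a temporary file under every root; a failure disables only that root. */
  NflyWriteSketch(List<Path> roots, String relPath, int minReplication) {
    this.minReplication = minReplication;
    for (Path root : roots) {
      Path fin = root.resolve(relPath);
      Path tmp = fin.resolveSibling(TMP_PREFIX + fin.getFileName());
      finals.add(fin);
      temps.add(tmp);
      try {
        Files.createDirectories(tmp.getParent());
        outs.add(Files.newOutputStream(tmp));
      } catch (IOException e) {
        outs.add(null);
      }
    }
  }

  @Override
  public void write(int b) throws IOException {
    write(new byte[] {(byte) b}, 0, 1);
  }

  /** Step 3: forward every write to each live stream; a failing stream is dropped. */
  @Override
  public void write(byte[] buf, int off, int len) throws IOException {
    for (int i = 0; i < outs.size(); i++) {
      OutputStream o = outs.get(i);
      if (o == null) {
        continue;
      }
      try {
        o.write(buf, off, len);
      } catch (IOException e) {
        try { o.close(); } catch (IOException ignored) { }
        outs.set(i, null);
      }
    }
  }

  /** Steps 4-5: close and rename with one shared mtime, then commit or clean up. */
  @Override
  public void close() throws IOException {
    if (closed) {
      return;
    }
    closed = true;
    FileTime mtime = FileTime.fromMillis(System.currentTimeMillis()); // start of step 4
    int committed = 0;
    for (int i = 0; i < outs.size(); i++) {
      OutputStream o = outs.get(i);
      if (o == null) {
        continue;
      }
      try {
        o.close();
        Files.move(temps.get(i), finals.get(i), StandardCopyOption.REPLACE_EXISTING);
        Files.setLastModifiedTime(finals.get(i), mtime);
        committed++;
      } catch (IOException e) {
        // this destination did not complete steps 1-4
      }
    }
    if (committed < minReplication) {
      for (Path tmp : temps) {  // best effort; already-renamed files are left in place
        try { Files.deleteIfExists(tmp); } catch (IOException ignored) { }
      }
      throw new IOException("committed on " + committed + " destinations, need " + minReplication);
    }
  }
}
```

Typical use: construct it with two roots and minReplication 2, write the data, then call close(). close() throws if fewer than two destinations committed. As in step 5, only temporary files are cleaned up on failure, so a destination that already renamed its file keeps it.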
> As for reads, we support a notion of locality similar to HDFS's
> /DC/rack/node. We sort Inode URIs using NetworkTopology by their
> authorities. These are typically host names in simple HDFS URIs. If the
> authority is missing, as is the case with the local file:///, the local host
> name returned by InetAddress.getLocalHost() is assumed. This ensures that
> the local file system is always the closest one to the reader in this
> approach. For our Hadoop 2 HDFS URIs, which are based on nameservice IDs
> instead of host names, it is very easy to adjust the topology script since
> our nameservice IDs already contain the datacenter. As for rack and node,
> we can simply output any string such as /DC/rack-nsid/node-nsid, since we
> only care about datacenter locality for such filesystem clients.
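Here is a rough sketch of the locality ordering. It assumes a caller-supplied function plays the role of the topology script by mapping a host or nameservice ID to a "/DC/rack/node" path. Distance is counted as hops up to the deepest common ancestor and back down, in the spirit of NetworkTopology's hop distance. This is a simplified stand-in, not Hadoop's NetworkTopology class. The nameservice IDs c-dc1 and c-dc2 and the class NflyLocalitySketch are hypothetical.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of locality-based ordering of Nfly target URIs. */
final class NflyLocalitySketch {

  /** Authority used for the topology lookup; none (e.g. file:///) means the local host. */
  static String hostOf(URI uri, String localHost) {
    String authority = uri.getAuthority();
    return authority == null ? localHost : authority;
  }

  /** Hops from each location up to their deepest common ancestor, summed. */
  static int distance(String a, String b) {
    String[] pa = a.substring(1).split("/");
    String[] pb = b.substring(1).split("/");
    int common = 0;
    while (common < pa.length && common < pb.length && pa[common].equals(pb[common])) {
      common++;
    }
    return (pa.length - common) + (pb.length - common);
  }

  /** Orders URIs nearest first; topology stands in for the topology script. */
  static List<URI> sortByLocality(List<URI> uris, String localHost,
      Function<String, String> topology) {
    String here = topology.apply(localHost);
    List<URI> sorted = new ArrayList<>(uris);
    sorted.sort(Comparator.comparingInt(u -> distance(here, topology.apply(hostOf(u, localHost)))));
    return sorted;
  }

  public static void main(String[] args) throws UnknownHostException {
    String localHost = InetAddress.getLocalHost().getHostName();
    // Hypothetical topology: nameservice IDs that embed the datacenter, as described above.
    Map<String, String> script = new HashMap<>();
    script.put(localHost, "/DC1/rack-7/" + localHost);
    script.put("c-dc1", "/DC1/rack-c-dc1/node-c-dc1");
    script.put("c-dc2", "/DC2/rack-c-dc2/node-c-dc2");
    List<URI> targets = Arrays.asList(URI.create("hdfs://c-dc2/C/user"),
        URI.create("hdfs://c-dc1/C/user"), URI.create("file:///C/user"));
    // file:/// first (distance 0), then c-dc1 (same DC), then c-dc2
    System.out.println(sortByLocality(targets, localHost,
        h -> script.getOrDefault(h, "/default-rack/" + h)));
  }
}
```

Because a missing authority resolves to the reader's own location, the local file:/// target always gets distance 0, matching the description.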
> There are two policies/additions to the read call path that make it more
> expensive but improve the user experience:
> - readMostRecent - when this policy is enabled, Nfly first checks the mtime
> of the path under all URIs and sorts them from most recent to least recent.
> Nfly then sorts the set of most recent URIs topologically in the same manner
> as described above.
> - repairOnRead - when readMostRecent is enabled, Nfly already has to RPC all
> underlying destinations. With repairOnRead, the Nfly filesystem would
> additionally attempt to refresh destinations where the path is missing or
> stale, using the nearest available most recent destination.
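The two read policies can be sketched as a pure planning step over per-destination metadata. The sketch assumes each destination's mtime (or its absence) and its topology distance have already been fetched. Existing replicas are ordered newest first. Ties, including the most recent set, are ordered nearest first. Under repairOnRead, replicas that are missing or older than the newest would be refreshed from the first replica in that order. NflyReadPolicySketch, Replica, ReadPlan and plan are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical sketch of the readMostRecent and repairOnRead decisions. */
final class NflyReadPolicySketch {

  /** Result of one metadata RPC per destination; mtime is empty when the path is missing. */
  static final class Replica {
    final String uri;
    final OptionalLong mtime;
    final int distance;  // topology distance from the reader

    Replica(String uri, OptionalLong mtime, int distance) {
      this.uri = uri;
      this.mtime = mtime;
      this.distance = distance;
    }

    @Override
    public String toString() {
      return uri;
    }
  }

  static final class ReadPlan {
    final List<Replica> readOrder; // existing replicas: newest first, nearest first among ties
    final List<Replica> toRepair;  // missing or stale replicas (repairOnRead candidates)

    ReadPlan(List<Replica> readOrder, List<Replica> toRepair) {
      this.readOrder = readOrder;
      this.toRepair = toRepair;
    }
  }

  /** If no replica has the path, both lists are empty and the caller reports "not found". */
  static ReadPlan plan(List<Replica> replicas) {
    List<Replica> existing = new ArrayList<>();
    for (Replica r : replicas) {
      if (r.mtime.isPresent()) {
        existing.add(r);
      }
    }
    // Most recent first; replicas with equal mtime are ordered topologically.
    existing.sort(Comparator.comparingLong((Replica r) -> r.mtime.getAsLong()).reversed()
        .thenComparingInt(r -> r.distance));
    List<Replica> toRepair = new ArrayList<>();
    if (!existing.isEmpty()) {
      long newest = existing.get(0).mtime.getAsLong();
      for (Replica r : replicas) {
        if (!r.mtime.isPresent() || r.mtime.getAsLong() != newest) {
          toRepair.add(r);  // would be refreshed from existing.get(0)
        }
      }
    }
    return new ReadPlan(existing, toRepair);
  }
}
```

This sketch also shows why the policies cost more: both need the mtime of every destination before the first byte is read.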



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
