[
https://issues.apache.org/jira/browse/HADOOP-12077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623728#comment-14623728
]
Hadoop QA commented on HADOOP-12077:
------------------------------------
\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 17m 1s | Findbugs (version ) appears to
be broken on trunk. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any
@author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to
include 3 new or modified test files. |
| {color:green}+1{color} | javac | 7m 40s | There were no new javac warning
messages. |
| {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc
warning messages. |
| {color:green}+1{color} | release audit | 0m 21s | The applied patch does
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 1m 50s | There were no new checkstyle
issues. |
| {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that
end in whitespace. |
| {color:green}+1{color} | install | 1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with
eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 4m 21s | The patch appears to introduce 3
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests | 22m 27s | Tests passed in
hadoop-common. |
| {color:green}+1{color} | hdfs tests | 158m 42s | Tests passed in hadoop-hdfs. |
| | | 224m 8s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-common |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL |
http://issues.apache.org/jira/secure/attachment/12744920/HADOOP-12077.004.patch
|
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / d7319de |
| Findbugs warnings |
https://builds.apache.org/job/PreCommit-HADOOP-Build/7252/artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html
|
| hadoop-common test log |
https://builds.apache.org/job/PreCommit-HADOOP-Build/7252/artifact/patchprocess/testrun_hadoop-common.txt
|
| hadoop-hdfs test log |
https://builds.apache.org/job/PreCommit-HADOOP-Build/7252/artifact/patchprocess/testrun_hadoop-hdfs.txt
|
| Test Results |
https://builds.apache.org/job/PreCommit-HADOOP-Build/7252/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output |
https://builds.apache.org/job/PreCommit-HADOOP-Build/7252/console |
This message was automatically generated.
> Provide a multi-URI replication Inode for ViewFs
> ------------------------------------------------
>
> Key: HADOOP-12077
> URL: https://issues.apache.org/jira/browse/HADOOP-12077
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs
> Reporter: Gera Shegalov
> Assignee: Gera Shegalov
> Attachments: HADOOP-12077.001.patch, HADOOP-12077.002.patch,
> HADOOP-12077.003.patch, HADOOP-12077.004.patch
>
>
> This JIRA is to provide simple "replication" capabilities for applications
> that maintain logically equivalent paths in multiple locations for caching or
> failover (e.g., S3 and HDFS). We noticed a simple common HDFS usage pattern
> in our applications. They host their data on some logical cluster C. There
> are corresponding HDFS clusters in multiple datacenters. When the application
> runs in DC1, it prefers to read from C in DC1, and it prefers to fail over
> to C in DC2 if the application is migrated to DC2 or when C in
> DC1 is unavailable. New application data versions are created
> periodically/relatively infrequently.
> In order to address many common scenarios in a general fashion, and to avoid
> unnecessary code duplication, we implement this functionality in ViewFs (our
> default FileSystem spanning all clusters in all datacenters) in a project
> code-named Nfly (N as in N datacenters). Currently each ViewFs Inode points
> to a single URI via ChRootedFileSystem. Consequently, we introduce a new type
> of link that points to a list of URIs, each of which is wrapped in a
> ChRootedFileSystem. A typical usage:
> /nfly/C/user->/DC1/C/user,/DC2/C/user,... This collection of
> ChRootedFileSystem instances is fronted by the Nfly filesystem object that is
> actually used for the mount point/Inode. The Nfly filesystem backs a single
> logical path /nfly/C/user/<user>/path by multiple physical paths.
> The Nfly filesystem supports setting minReplication. As long as the number of
> URIs on which an update has succeeded is greater than or equal to
> minReplication, exceptions are only logged but not thrown. Each update
> operation is currently executed serially (client-bandwidth driven parallelism
> will be added later).
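
The minReplication rule above can be sketched roughly as follows. This is a
minimal illustration in plain Java, not code from the patch: the class
MinReplicationSketch, the Update interface, and applyToAll are hypothetical
names, and collecting failures stands in for logging them.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the minReplication rule; not the patch's code. */
final class MinReplicationSketch {
  /** One update applied to a single destination URI (hypothetical interface). */
  interface Update {
    void apply(String uri) throws IOException;
  }

  /**
   * Applies the update to each URI serially. Per-destination failures are
   * collected (a real implementation would log them) and are surfaced only
   * if fewer than minReplication destinations succeeded.
   */
  static int applyToAll(List<String> uris, int minReplication, Update update)
      throws IOException {
    int succeeded = 0;
    List<IOException> failures = new ArrayList<>();
    for (String uri : uris) {
      try {
        update.apply(uri);
        succeeded++;
      } catch (IOException e) {
        failures.add(e); // logged but not thrown at this point
      }
    }
    if (succeeded < minReplication) {
      IOException ioe = new IOException("only " + succeeded + " of "
          + uris.size() + " destinations updated; minReplication="
          + minReplication);
      for (IOException f : failures) {
        ioe.addSuppressed(f);
      }
      throw ioe;
    }
    return succeeded;
  }
}
```

With three destinations and minReplication set to 2, a single failing
destination would not surface an exception to the caller in this sketch; with
minReplication set to 3 it would.
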
> A file create/write:
> # Creates a temporary invisible _nfly_tmp_file in the intended chrooted
> filesystem.
> # Returns an FSDataOutputStream that wraps the output streams returned by
> step 1.
> # All writes are forwarded to each output stream.
> # On close of the stream created in step 2, all n streams are closed, and the
> files are renamed from _nfly_tmp_file to file. All files receive the same
> mtime corresponding to the client system time as of the beginning of this
> step.
> # If at least minReplication destinations have gone through steps 1-4 without
> failures, the transaction is considered logically committed; otherwise a
> best-effort cleanup of the temporary files is attempted.
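
The write steps above could look roughly like this when local directories
stand in for the chrooted filesystems. It is a sketch under that assumption,
using java.nio.file rather than the Hadoop FileSystem API; NflyWriteSketch is
a hypothetical name. For brevity, failures while opening or writing a stream
propagate immediately instead of being counted per destination.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the create/write path; directories stand in for URIs. */
final class NflyWriteSketch extends OutputStream {
  private static final String TMP_PREFIX = "_nfly_tmp_";
  private final List<Path> tmpFiles = new ArrayList<>();
  private final List<Path> finalFiles = new ArrayList<>();
  private final List<OutputStream> outs = new ArrayList<>();
  private final int minReplication;
  private boolean closed;

  /** Steps 1-2: create a temporary file per destination and wrap the streams. */
  NflyWriteSketch(List<Path> dirs, String name, int minReplication)
      throws IOException {
    this.minReplication = minReplication;
    for (Path dir : dirs) {
      Path tmp = dir.resolve(TMP_PREFIX + name);
      outs.add(Files.newOutputStream(tmp));
      tmpFiles.add(tmp);
      finalFiles.add(dir.resolve(name));
    }
  }

  /** Step 3: forward every write to each underlying stream. */
  @Override
  public void write(int b) throws IOException {
    for (OutputStream out : outs) {
      out.write(b);
    }
  }

  /** Steps 4-5: close, rename, stamp one shared mtime, then check the quorum. */
  @Override
  public void close() throws IOException {
    if (closed) {
      return;
    }
    closed = true;
    FileTime mtime = FileTime.fromMillis(System.currentTimeMillis());
    int committed = 0;
    for (int i = 0; i < outs.size(); i++) {
      try {
        outs.get(i).close();
        Files.move(tmpFiles.get(i), finalFiles.get(i),
            StandardCopyOption.REPLACE_EXISTING);
        Files.setLastModifiedTime(finalFiles.get(i), mtime);
        committed++;
      } catch (IOException e) {
        // a real implementation would log the failure here
      }
    }
    if (committed < minReplication) {
      for (Path tmp : tmpFiles) {
        try {
          Files.deleteIfExists(tmp); // best-effort cleanup of temporary files
        } catch (IOException ignored) {
          // cleanup is best effort only
        }
      }
      throw new IOException("committed to " + committed
          + " destinations; minReplication=" + minReplication);
    }
  }
}
```
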
> As for reads, we support a notion of locality similar to HDFS /DC/rack/node.
> We sort Inode URIs using NetworkTopology by their authorities. These are
> typically host names in simple HDFS URIs. If the authority is missing, as is
> the case with the local file:///, the local host name from
> InetAddress.getLocalHost() is assumed. This makes sure that the local file
> system is always the closest one to the reader in this approach. For our
> Hadoop 2 hdfs URIs, which are based on nameservice ids instead of hostnames,
> it is very easy to adjust the topology script since our nameservice ids
> already contain the datacenter. As for rack and node, we can simply output
> any string such as
> /DC/rack-nsid/node-nsid, since we only care about datacenter-locality for
> such filesystem clients.
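
To make the read-side ordering concrete, here is a small sketch in plain Java.
It does not use Hadoop's NetworkTopology class; instead it assumes closeness
is measured as the number of hops to the closest common ancestor of two
/DC/rack/node paths. LocalitySortSketch, distance, and sortByLocality are
hypothetical names, the topology function stands in for the topology script,
and the local host name is passed in rather than looked up.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of sorting URIs by topological closeness to the reader. */
final class LocalitySortSketch {
  /** Hops from a and b up to their closest common ancestor, summed. */
  static int distance(String a, String b) {
    String[] pa = a.split("/");
    String[] pb = b.split("/");
    int common = 0;
    while (common < pa.length && common < pb.length
        && pa[common].equals(pb[common])) {
      common++;
    }
    return (pa.length - common) + (pb.length - common);
  }

  /**
   * Returns the URIs ordered from nearest to farthest. A URI without an
   * authority (for example file:///) is treated as the local host.
   */
  static List<URI> sortByLocality(List<URI> uris, String localHost,
      Function<String, String> topology) {
    String readerLocation = topology.apply(localHost);
    List<URI> sorted = new ArrayList<>(uris);
    sorted.sort(Comparator.comparingInt((URI u) -> {
      String authority = u.getAuthority() == null ? localHost : u.getAuthority();
      return distance(readerLocation, topology.apply(authority));
    }));
    return sorted;
  }
}
```

Because only the datacenter component differs meaningfully for nameservice
ids, any rack and node strings work in this sketch as long as the first path
component identifies the datacenter.
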
> There are 2 policies/additions to the read call path that make it more
> expensive but improve the user experience:
> - readMostRecent - when this policy is enabled, Nfly first checks mtime for
> the path under all URIs and sorts them from most recent to least recent. Nfly
> then sorts the set of most recent URIs topologically in the same manner as
> described above.
> - repairOnRead - when readMostRecent is enabled, Nfly already has to RPC all
> underlying destinations. With repairOnRead, the Nfly filesystem additionally
> attempts to refresh destinations where the path is missing or stale, using
> the nearest available most recent destination.
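
The two read policies can be illustrated with a small selection sketch. It
assumes the destinations are already ordered by locality and that an mtime
lookup per destination has been done (a missing entry means the path does not
exist there). ReadPolicySketch, mostRecent, and repairTargets are hypothetical
names; the actual copy performed by a repair is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of readMostRecent and repairOnRead selection. */
final class ReadPolicySketch {
  /**
   * readMostRecent: keeps only the destinations holding the newest mtime,
   * preserving the given locality order (nearest first).
   */
  static List<String> mostRecent(List<String> byLocality,
      Map<String, Long> mtimes) {
    long newest = Long.MIN_VALUE;
    for (String d : byLocality) {
      Long m = mtimes.get(d);
      if (m != null && m > newest) {
        newest = m;
      }
    }
    List<String> result = new ArrayList<>();
    for (String d : byLocality) {
      Long m = mtimes.get(d);
      if (m != null && m == newest) {
        result.add(d);
      }
    }
    return result;
  }

  /**
   * repairOnRead: destinations where the path is missing or older than the
   * newest copy. The repair source would be the first entry of mostRecent().
   */
  static List<String> repairTargets(List<String> byLocality,
      Map<String, Long> mtimes) {
    List<String> fresh = mostRecent(byLocality, mtimes);
    if (fresh.isEmpty()) {
      return new ArrayList<>(); // no copy exists anywhere, nothing to repair from
    }
    List<String> stale = new ArrayList<>(byLocality);
    stale.removeAll(fresh);
    return stale;
  }
}
```
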
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)