[
https://issues.apache.org/jira/browse/HDFS-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887725#comment-13887725
]
Hadoop QA commented on HDFS-5846:
---------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12626286/hdfs-5846.patch
against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include
any new or modified tests.
Please justify why no new tests are needed for this
patch.
Also please list what manual steps were performed to
verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any
warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with
eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in
hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results:
https://builds.apache.org/job/PreCommit-HDFS-Build/5999//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5999//console
This message is automatically generated.
> Assigning DEFAULT_RACK in resolveNetworkLocation method can break data
> resiliency
> ---------------------------------------------------------------------------------
>
> Key: HDFS-5846
> URL: https://issues.apache.org/jira/browse/HDFS-5846
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Nikola Vujic
> Assignee: Nikola Vujic
> Attachments: hdfs-5846.patch
>
>
> Method CachedDNSToSwitchMapping::resolve() can return NULL, which requires
> careful handling. NULL can be returned in two cases:
> • An error occurred during topology script execution (the script crashes).
> • The script returns a wrong number of values (other than expected).
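The two NULL cases above can be illustrated with a minimal sketch. This is not Hadoop's implementation: the class `TopologyResolveSketch`, its `resolve` helper, and the `Function` that stands in for the topology script are all hypothetical names chosen for illustration, and the sketch assumes the script prints one whitespace-separated rack path per host.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a script-backed rack resolver (not Hadoop code).
class TopologyResolveSketch {

    /**
     * Returns one rack path per host name, or null when the mapping cannot
     * be trusted. The "script" is modelled as a function from host names to
     * its raw whitespace-separated output (an assumption of this sketch).
     */
    static List<String> resolve(List<String> names,
                                Function<List<String>, String> script) {
        String output;
        try {
            output = script.apply(names);
        } catch (RuntimeException e) {
            // Case 1: the script failed to execute.
            return null;
        }
        if (output == null) {
            return null;
        }
        List<String> racks = new ArrayList<>();
        for (String token : output.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                racks.add(token);
            }
        }
        // Case 2: the script returned a wrong number of values.
        if (racks.size() != names.size()) {
            return null;
        }
        return racks;
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("dn1", "dn2");
        System.out.println(resolve(hosts, n -> "/rack1 /rack2"));
        System.out.println(resolve(hosts, n -> "/rack1"));
        System.out.println(resolve(hosts, n -> {
            throw new IllegalStateException("script crashed");
        }));
    }
}
```

In both failure cases the caller receives only NULL and cannot tell which of the two happened, which is why the handling at the call site matters.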
> The critical handling is in the DN registration code, which is responsible
> for assigning proper topology paths to all registered datanodes.
> The existing code handles this NULL return value in the following way
> ({{resolveNetworkLocation}} method):
> {code}
> // resolve its network location
> List<String> rName = dnsToSwitchMapping.resolve(names);
> String networkLocation;
> if (rName == null) {
>   LOG.error("The resolve call returned null! Using " +
>       NetworkTopology.DEFAULT_RACK + " for host " + names);
>   networkLocation = NetworkTopology.DEFAULT_RACK;
> } else {
>   networkLocation = rName.get(0);
> }
> return networkLocation;
> {code}
> The line of code that assigns the default rack:
> {code} networkLocation = NetworkTopology.DEFAULT_RACK; {code}
> can cause a serious problem. If resolve() somehow returns NULL, the default
> rack is assigned as the DN's network location and the DN's registration
> finishes successfully. Under these circumstances, we can load data into a
> cluster that is working with a wrong topology. A wrong topology means that
> fault domains are not honored.
> For the end user, it means that two data replicas can end up in the same
> fault domain and a single failure can cause the loss of two or more replicas.
> The cluster would be in an inconsistent state without being aware of it, and
> everything would appear to work fine. Practically the only way to notice that
> something went wrong is to look in the log for the error:
> {code}
> LOG.error("The resolve call returned null! Using " +
> NetworkTopology.DEFAULT_RACK + " for host " + names);
> {code}
>
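The effect described in the issue can be sketched in a few lines. The class and method names below (`RackFallbackSketch`, `lenientLocation`, `strictLocation`) are hypothetical and are not part of Hadoop or of the attached patch. `lenientLocation` mirrors the fallback quoted in the description, while `strictLocation` shows one possible alternative, assumed here purely for illustration, in which an unresolved host is rejected instead of being placed on the default rack.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the two handling strategies (not Hadoop code).
class RackFallbackSketch {

    // Same string value as NetworkTopology.DEFAULT_RACK in Hadoop.
    static final String DEFAULT_RACK = "/default-rack";

    /** Mirrors the quoted fallback: a null result silently becomes the default rack. */
    static String lenientLocation(List<String> resolved) {
        return (resolved == null) ? DEFAULT_RACK : resolved.get(0);
    }

    /** Assumed alternative: fail the registration when the rack is unknown. */
    static String strictLocation(List<String> resolved, String host) {
        if (resolved == null || resolved.isEmpty()) {
            throw new IllegalStateException(
                "Unresolved topology mapping for host " + host);
        }
        return resolved.get(0);
    }

    public static void main(String[] args) {
        // Two datanodes on physically different racks, but the topology
        // lookup failed (returned null) for both of them.
        String dn1 = lenientLocation(null);
        String dn2 = lenientLocation(null);
        // Both nodes now appear to share a single fault domain.
        System.out.println("same rack after fallback: " + dn1.equals(dn2));

        // A successful lookup is passed through unchanged.
        System.out.println(strictLocation(Arrays.asList("/rack7"), "dn3"));

        try {
            strictLocation(null, "dn1");
        } catch (IllegalStateException e) {
            System.out.println("registration rejected: " + e.getMessage());
        }
    }
}
```

With the lenient variant, a rack-aware placement policy sees both datanodes in the same location and can no longer keep replicas in separate fault domains. The strict variant trades availability for correctness: a datanode whose rack cannot be resolved does not join the cluster until the topology script works again.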
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)