[ https://issues.apache.org/jira/browse/HDFS-7385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14211239#comment-14211239 ]

Hadoop QA commented on HDFS-7385:
---------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12681364/HDFS-7385.2.patch
  against trunk revision 3651fe1.

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}.  The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

                  org.apache.hadoop.hdfs.TestDFSUpgradeFromImage
                  org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives

    The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs:

                  org.apache.hadoop.hdfs.util.TestByteArrayManager

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8730//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8730//console

This message is automatically generated.

> ThreadLocal used in FSEditLog class  lead FSImage permission mess up
> --------------------------------------------------------------------
>
>                 Key: HDFS-7385
>                 URL: https://issues.apache.org/jira/browse/HDFS-7385
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.4.0, 2.5.0
>            Reporter: jiangyu
>            Assignee: jiangyu
>            Priority: Blocker
>         Attachments: HDFS-7385.2.patch, HDFS-7385.patch
>
>
>       We migrated our NameNodes from low-spec to high-spec machines last week.
> First, we copied the current directory, including the fsimage and edit log
> files, from the original active NameNode to the new active NameNode and
> started the new NameNode. Then we changed the configuration of all DataNodes
> and restarted them; they sent block reports to the new NameNodes at once and
> sent heartbeats after that.
>        Everything seemed fine, but after we restarted the ResourceManager,
> most users complained that their jobs could not run because of permission
> problems.
>       We use ACLs in our clusters, and after the migration we found that most
> directories and files that previously had no ACLs now carried ACL entries.
> That is why users could not run their jobs, so we had to change the
> permissions of most files to a+r and of most directories to a+rx so that
> the jobs could run.
> After investigating this problem for several days, I found a bug in
> FSEditLog.java: the ops taken from the ThreadLocal variable cache in
> FSEditLog are not set to the proper values in the logMkDir and logOpenFile
> methods. Here is the code of logMkDir:
>   public void logMkDir(String path, INode newNode) {
>     PermissionStatus permissions = newNode.getPermissionStatus();
>     MkdirOp op = MkdirOp.getInstance(cache.get())
>       .setInodeId(newNode.getId())
>       .setPath(path)
>       .setTimestamp(newNode.getModificationTime())
>       .setPermissionStatus(permissions);
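>     AclFeature f = newNode.getAclFeature();
>     if (f != null) {
>       op.setAclEntries(AclStorage.readINodeLogicalAcl(newNode));
>     }
>     // No else branch: when f == null, the MkdirOp reused via cache.get()
>     // keeps whatever AclEntries an earlier call on this thread left in it.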
>     logEdit(op);
>   }
>       For example, if we mkdir with ACLs through one handler (that is, one
> thread), we set the AclEntries on the op obtained from the cache. If we then
> mkdir without any ACLs through the same handler, the op from the cache still
> holds the AclEntries from the previous call, and because newNode has no
> AclFeature, nothing ever overwrites them. The edit log is therefore wrong: it
> records the stale ACLs. After the Standby NameNode loads these edit logs
> from the JournalNodes, applies them to its in-memory namespace, saves the
> namespace, and transfers the resulting fsimage to the Active NameNode, every
> fsimage is wrong. The only remedy is to save the namespace on the Active
> NameNode, which produces a correct fsimage.
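
The stale-op mechanism described above can be reproduced outside Hadoop with a small model. This is a hypothetical sketch, not Hadoop code: `StaleOpCacheDemo`, the `MkdirOp` stand-in, and the `MKDIR ... acl=...` record string are illustrative names. `logMkDirFixed` shows one possible remedy (always overwriting the optional field, even with null), which may differ from what the attached HDFS-7385 patches actually do.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, simplified model of the HDFS-7385 bug (not Hadoop code):
 * a per-thread cached op is reused for every call, and its optional ACL
 * field is only written when the new node has ACLs.
 */
public class StaleOpCacheDemo {

    /** Illustrative stand-in for MkdirOp; one mutable instance per thread. */
    static final class MkdirOp {
        String path;
        List<String> aclEntries; // null means "no ACLs"

        MkdirOp setPath(String path) { this.path = path; return this; }

        MkdirOp setAclEntries(List<String> aclEntries) {
            this.aclEntries = aclEntries;
            return this;
        }

        /** Renders what would be written to the edit log. */
        String record() { return "MKDIR " + path + " acl=" + aclEntries; }
    }

    // Illustrative stand-in for FSEditLog's ThreadLocal op cache.
    private static final ThreadLocal<MkdirOp> CACHE =
        ThreadLocal.withInitial(MkdirOp::new);

    /** Mirrors the quoted pattern: ACLs are set only if the node has them. */
    public static String logMkDirBuggy(String path, List<String> acl) {
        MkdirOp op = CACHE.get().setPath(path);
        if (acl != null) {
            op.setAclEntries(acl);
        }
        // No else branch: a previous call's aclEntries survive when acl == null.
        return op.record();
    }

    /** One possible fix: always overwrite the optional field, even with null. */
    public static String logMkDirFixed(String path, List<String> acl) {
        return CACHE.get().setPath(path).setAclEntries(acl).record();
    }

    public static void main(String[] args) {
        // All calls run on the main thread, so they share one cached op.
        System.out.println(logMkDirBuggy("/a", Arrays.asList("user:alice:rwx")));
        System.out.println(logMkDirBuggy("/b", null));   // MKDIR /b acl=[user:alice:rwx]
        System.out.println(logMkDirFixed("/c", Arrays.asList("user:bob:r-x")));
        System.out.println(logMkDirFixed("/d", null));   // MKDIR /d acl=null
    }
}
```

Because each thread has its own cached op, the leak only shows up when the same handler thread serves both calls, which matches the handler-level behavior described in the report.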



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
