[
https://issues.apache.org/jira/browse/HDFS-7385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14211239#comment-14211239
]
Hadoop QA commented on HDFS-7385:
---------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12681364/HDFS-7385.2.patch
against trunk revision 3651fe1.
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new
or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with
eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in
hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.TestDFSUpgradeFromImage
org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives
The following test timeouts occurred in
hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.util.TestByteArrayManager
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results:
https://builds.apache.org/job/PreCommit-HDFS-Build/8730//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8730//console
This message is automatically generated.
> ThreadLocal used in FSEditLog class lead FSImage permission mess up
> --------------------------------------------------------------------
>
> Key: HDFS-7385
> URL: https://issues.apache.org/jira/browse/HDFS-7385
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.4.0, 2.5.0
> Reporter: jiangyu
> Assignee: jiangyu
> Priority: Blocker
> Attachments: HDFS-7385.2.patch, HDFS-7385.patch
>
>
> We migrated our NameNodes from low-configuration to high-configuration
> machines last week. First, we copied the current directory, including the
> fsimage and editlog files, from the original active NameNode to the new
> active NameNode and started the new NameNode. Then we changed the
> configuration of all DataNodes and restarted them; they sent block reports
> to the new NameNodes at once and sent heartbeats after that.
> Everything seemed fine, but after we restarted the ResourceManager,
> most of the users complained that their jobs could not be executed
> because of permission problems.
> We use ACLs in our clusters, and after the migration we found that most
> of the directories and files that had no ACLs set before now had ACL
> entries. That is why users could not execute their jobs. So we had to
> change the permissions of most files to a+r and of most directories to
> a+rx to make sure the jobs could be executed.
> After investigating this problem for some days, I found a bug in
> FSEditLog.java. The logMkDir and logOpenFile functions do not set the
> proper value on the op they obtain from the ThreadLocal variable cache in
> FSEditLog. Here is the code of logMkDir:
>   public void logMkDir(String path, INode newNode) {
>     PermissionStatus permissions = newNode.getPermissionStatus();
>     MkdirOp op = MkdirOp.getInstance(cache.get())
>       .setInodeId(newNode.getId())
>       .setPath(path)
>       .setTimestamp(newNode.getModificationTime())
>       .setPermissionStatus(permissions);
>     AclFeature f = newNode.getAclFeature();
>     if (f != null) {
>       op.setAclEntries(AclStorage.readINodeLogicalAcl(newNode));
>     }
>     logEdit(op);
>   }
> For example, if we mkdir with ACLs through one handler (a thread, in
> fact), we set the AclEntries on the op from the cache. After that, if we
> mkdir without any ACLs through the same handler, the AclEntries on the op
> from the cache are still the ones from the previous call that set ACLs,
> and because the newNode has no AclFeature, they are never changed. The
> editlog is then wrong: it records the wrong ACLs. After the standby
> NameNode (SNN) loads the editlogs from the JournalNodes, applies them in
> memory, saves the namespace, and transfers the wrong fsimage to the active
> NameNode (ANN), all the fsimages are wrong. The only solution is to save
> the namespace from the ANN, which produces a correct fsimage.
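
The stale-entry behavior described above can be reproduced outside Hadoop with a small sketch. Everything in it is hypothetical and only illustrates the pattern: MkdirOpSketch stands in for MkdirOp, the ACL is modeled as a list of strings, and logMkDirReset shows one possible way to avoid the leak (clearing the field before reuse), which is not necessarily what the attached patches do.

```java
import java.util.Arrays;
import java.util.List;

public class StaleOpCacheSketch {
    // Hypothetical stand-in for MkdirOp: a mutable op object that is reused.
    static final class MkdirOpSketch {
        String path;
        List<String> aclEntries; // null means "no ACL recorded"

        MkdirOpSketch setPath(String p) { this.path = p; return this; }
        MkdirOpSketch setAclEntries(List<String> e) { this.aclEntries = e; return this; }
    }

    // One cached op per thread, analogous to the ThreadLocal cache in the report.
    private static final ThreadLocal<MkdirOpSketch> CACHE =
        ThreadLocal.withInitial(MkdirOpSketch::new);

    // Mirrors the reported pattern: ACL entries are set only when present,
    // so whatever the previous call stored is left in place otherwise.
    static List<String> logMkDirBuggy(String path, List<String> acl) {
        MkdirOpSketch op = CACHE.get().setPath(path);
        if (acl != null) {
            op.setAclEntries(acl);
        }
        return op.aclEntries; // what would be written to the edit log
    }

    // Assumed fix for illustration: clear the reused field on every call.
    static List<String> logMkDirReset(String path, List<String> acl) {
        MkdirOpSketch op = CACHE.get().setPath(path).setAclEntries(null);
        if (acl != null) {
            op.setAclEntries(acl);
        }
        return op.aclEntries;
    }

    public static void main(String[] args) {
        List<String> acl = Arrays.asList("user:alice:rwx");

        logMkDirBuggy("/with-acl", acl);
        // Same thread, no ACL requested, but the cached op still carries one.
        System.out.println("buggy: " + logMkDirBuggy("/no-acl", null));

        logMkDirReset("/with-acl", acl);
        System.out.println("reset: " + logMkDirReset("/no-acl", null));
    }
}
```

Run on a single thread, the buggy variant prints the ACL from the first call for the second directory, while the reset variant prints null.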
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)