[ https://issues.apache.org/jira/browse/HDFS-7385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209172#comment-14209172 ]
Yi Liu commented on HDFS-7385:
------------------------------
[~jiangyu1211], {{OP_ADD}} is used for both file create and append, even though
the method is named "logOpenFile".
Please add the test case as soon as possible. I will help review it and try to
push it into 2.6, since I think the issue is critical even though the fix is easy.
> ThreadLocal used in FSEditLog class lead FSImage permission mess up
> --------------------------------------------------------------------
>
> Key: HDFS-7385
> URL: https://issues.apache.org/jira/browse/HDFS-7385
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.4.0, 2.5.0
> Reporter: jiangyu
> Assignee: jiangyu
> Attachments: HDFS-7385.patch
>
>
> We migrated our NameNodes from low-configuration to high-configuration
> machines last week. First, we imported the current directory, including the
> fsimage and editlog files, from the original ActiveNameNode to the new
> ActiveNameNode and started the new NameNode. Then we changed the
> configuration of all DataNodes and restarted them, so that they sent block
> reports to the new NameNodes at once and sent heartbeats after that.
> Everything seemed perfect, but after we restarted the ResourceManager,
> most of the users complained that their jobs could not be executed because
> of permission problems.
> We use ACLs in our clusters, and after the migration we found that most of
> the directories and files that had no ACLs set before now had ACLs. That is
> why users could not execute their jobs. So we had to change the permissions
> of most files to a+r and of most directories to a+rx to make sure the jobs
> could be executed.
> After investigating this problem for some days, I found a bug in
> FSEditLog.java. The op obtained from the ThreadLocal variable cache in
> FSEditLog is not set to the proper value in the logMkDir and logOpenFile
> functions. Here is the code of logMkDir:
>   public void logMkDir(String path, INode newNode) {
>     PermissionStatus permissions = newNode.getPermissionStatus();
>     MkdirOp op = MkdirOp.getInstance(cache.get())
>       .setInodeId(newNode.getId())
>       .setPath(path)
>       .setTimestamp(newNode.getModificationTime())
>       .setPermissionStatus(permissions);
>     AclFeature f = newNode.getAclFeature();
>     if (f != null) {
>       op.setAclEntries(AclStorage.readINodeLogicalAcl(newNode));
>     }
>     logEdit(op);
>   }
> For example, if we mkdir with ACLs through one handler (a thread, in fact),
> we set the AclEntries on the op taken from the cache. After that, if we mkdir
> without any ACLs through the same handler, the AclEntries on the cached op
> are still those from the previous call that set ACLs, and because the
> newNode has no AclFeature, they never get overwritten. The editlog then
> records the wrong ACLs. After the Standby loads the editlogs from the
> JournalNodes, applies them in memory in the SNN, then saves the namespace
> and transfers the wrong fsimage to the ANN, all the fsimages are wrong. The
> only solution is to save the namespace from the ANN, which yields the
> correct fsimage.
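
The stale-state pattern described above can be reproduced outside Hadoop. The following is a minimal sketch, not the attached patch and not the real FSEditLog code: the class OpCacheSketch, its simplified MkdirOp, the reset() method, and the logMkdirBuggy/logMkdirFixed helpers are all hypothetical names introduced for illustration. It assumes one plausible fix, which is to clear the cached op before each reuse.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OpCacheSketch {
    // Hypothetical stand-in for MkdirOp: a mutable op object reused per thread.
    static final class MkdirOp {
        String path;
        List<String> aclEntries; // null means "no ACL recorded"

        MkdirOp setPath(String p) { this.path = p; return this; }
        MkdirOp setAclEntries(List<String> e) { this.aclEntries = e; return this; }

        // Assumed fix: clear every field so nothing leaks between uses.
        void reset() { path = null; aclEntries = null; }
    }

    // One cached op per handler thread, like the ThreadLocal cache in the report.
    static final ThreadLocal<MkdirOp> CACHE = ThreadLocal.withInitial(MkdirOp::new);

    // Buggy pattern: ACL entries are set only when present and never cleared.
    static List<String> logMkdirBuggy(String path, List<String> acl) {
        MkdirOp op = CACHE.get().setPath(path);
        if (acl != null) {
            op.setAclEntries(acl);
        }
        return snapshot(op);
    }

    // Sketch of a fix: reset the cached op before populating it.
    static List<String> logMkdirFixed(String path, List<String> acl) {
        MkdirOp op = CACHE.get();
        op.reset();
        op.setPath(path);
        if (acl != null) {
            op.setAclEntries(acl);
        }
        return snapshot(op);
    }

    // Copies the ACL that would be written to the edit log for this op.
    static List<String> snapshot(MkdirOp op) {
        return op.aclEntries == null ? null : new ArrayList<>(op.aclEntries);
    }

    public static void main(String[] args) {
        List<String> acl = Arrays.asList("user:alice:rwx");
        logMkdirBuggy("/a", acl);
        // Second mkdir has no ACL, yet the cached op still carries the old one.
        System.out.println("buggy: " + logMkdirBuggy("/b", null));
        logMkdirFixed("/c", acl);
        System.out.println("fixed: " + logMkdirFixed("/d", null));
    }
}
```

Run on a single thread, the buggy variant reports the ACL from the first mkdir for the second, ACL-free mkdir, while the fixed variant reports none. Resetting inside the op rather than in each log method keeps every caller safe, at the cost of having to clear every field.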
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)