[ 
https://issues.apache.org/jira/browse/HDFS-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15422827#comment-15422827
 ] 

Kihwal Lee commented on HDFS-10729:
-----------------------------------

I just hit the submit patch button.

> NameNode crashes when loading edits because max directory items is exceeded
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-10729
>                 URL: https://issues.apache.org/jira/browse/HDFS-10729
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.7.2
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>            Priority: Critical
>         Attachments: HDFS-10729.001.patch
>
>
> We encountered a bug where the Standby NameNode crashes due to an NPE when 
> loading edits.
> {noformat}
> 2016-08-05 15:06:00,983 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception 
> on operation AddOp [length=0, inodeId=789272719, path=[path], replication=3, 
> mtime=1470379597935, atime=1470379597935, blockSize=134217728, blocks=[], 
> permissions=<user>:supergroup:rw-r--r--, aclEntries=null, 
> clientName=DFSClient_NONMAPREDUCE_1495395702_1, clientMachine=10.210.119.136, 
> overwrite=true, RpcClientId=a1512eeb-65e4-43dc-8aa8-d7a1af37ed30, 
> RpcCallId=417, storagePolicyId=0, opCode=OP_ADD, txid=4212503758]
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFileEncryptionInfo(FSDirectory.java:2914)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.createFileStatus(FSDirectory.java:2469)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:375)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:829)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:810)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:360)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1651)
>         at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:410)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297)
> {noformat}
> The NameNode crashes and cannot be restarted. After some research, we turned 
> on debug logging for org.apache.hadoop.hdfs.StateChange, restarted the NN, 
> and saw the following exception, which induced the NPE:
> {noformat}
> 16/08/07 18:51:15 DEBUG hdfs.StateChange: DIR* 
> FSDirectory.unprotectedAddFile: exception when add [path] to the file system
> org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException:
>  The directory item limit of [path] is exceeded: limit=1048576 items=1049332
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2060)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2112)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addLastINode(FSDirectory.java:2081)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addINode(FSDirectory.java:1900)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:368)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:365)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:829)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:810)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$1.run(EditLogTailer.java:188)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$1.run(EditLogTailer.java:182)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>         at 
> org.apache.hadoop.security.SecurityUtil.doAsUser(SecurityUtil.java:445)
>         at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUser(SecurityUtil.java:426)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.catchupDuringFailover(EditLogTailer.java:182)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1205)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1762)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1635)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1351)
>         at 
> org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
>         at 
> org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}
> The exception is thrown, caught, and logged at debug level in 
> {{FSDirectory#unprotectedAddFile}}. Afterwards, null is returned to the 
> caller, which dereferences it and subsequently throws the NPE. HDFS-7567 
> reported a similar bug, but there it was deemed the NPE could only be 
> thrown if the edits were corrupt. Here, however, we see an example where 
> the NPE can be thrown without corrupt edits.
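> To make the failure mode concrete, here is a minimal, self-contained Java 
> sketch of the pattern (hypothetical names, not the actual Hadoop source): 
> the limit exception is swallowed at debug level, null flows back to a 
> caller that does not check for it, and the caller throws the NPE.
> {noformat}
> import java.io.IOException;
> 
> public class NullSwallowDemo {
>     // Stand-in for FSLimitException$MaxDirectoryItemsExceededException
>     static class MaxDirectoryItemsExceededException extends IOException {
>         MaxDirectoryItemsExceededException(String msg) { super(msg); }
>     }
> 
>     // Stand-in for the addChild/verifyMaxDirItems path: always over the limit here
>     static String addChild(String path) throws IOException {
>         throw new MaxDirectoryItemsExceededException(
>             "The directory item limit of " + path + " is exceeded: limit=1048576");
>     }
> 
>     // Stand-in for FSDirectory#unprotectedAddFile: swallows the exception
>     // at debug level and returns null, which the caller does not expect
>     static String unprotectedAddFile(String path) {
>         try {
>             return addChild(path);
>         } catch (IOException e) {
>             System.err.println("DEBUG: exception when add " + path + ": " + e);
>             return null;
>         }
>     }
> 
>     public static void main(String[] args) {
>         String inode = unprotectedAddFile("/some/dir/file");
>         // Stand-in for createFileStatus/getFileEncryptionInfo dereferencing
>         // the result: this line throws the NullPointerException
>         System.out.println(inode.length());
>     }
> }
> {noformat}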
> It looks like the maximum number of items per directory was exceeded. This 
> is similar to HDFS-6102 and HDFS-7482, but here it happens while loading 
> edits rather than while loading the fsimage.
> A possible workaround is to increase the value of 
> {{dfs.namenode.fs-limits.max-directory-items}} to 6400000, but I am not 
> sure whether it would cause any side effects.
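> For reference, a sketch of the workaround in hdfs-site.xml (the property 
> name is real; 6400000 is the value suggested above):
> {noformat}
> <property>
>   <name>dfs.namenode.fs-limits.max-directory-items</name>
>   <value>6400000</value>
> </property>
> {noformat}
> A NameNode restart is required for the new limit to take effect.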


