[ https://issues.apache.org/jira/browse/HDFS-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754609#comment-13754609 ]
Hudson commented on HDFS-4482:
------------------------------
FAILURE: Integrated in Hadoop-Hdfs-0.23-Build #715 (See
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/715/])
HDFS-4482. ReplicationMonitor thread can exit with NPE due to the race between
delete and replication of same file. Contributed by Uma Maheswara Rao G.
(kihwal:
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1518834)
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INode.java
> ReplicationMonitor thread can exit with NPE due to the race between delete
> and replication of same file.
> --------------------------------------------------------------------------------------------------------
>
> Key: HDFS-4482
> URL: https://issues.apache.org/jira/browse/HDFS-4482
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.0.0, 2.0.1-alpha
> Reporter: Uma Maheswara Rao G
> Assignee: Uma Maheswara Rao G
> Priority: Blocker
> Fix For: 3.0.0, 2.0.5-alpha, 0.23.10
>
> Attachments: HDFS-4482-1.patch, HDFS-4482.patch, HDFS-4482.patch
>
>
> Trace:
> {noformat}
> java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFullPathName(FSDirectory.java:1442)
>     at org.apache.hadoop.hdfs.server.namenode.INode.getFullPathName(INode.java:269)
>     at org.apache.hadoop.hdfs.server.namenode.INodeFile.getName(INodeFile.java:163)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.chooseTarget(BlockPlacementPolicy.java:131)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1157)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1063)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3085)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3047)
>     at java.lang.Thread.run(Thread.java:619)
> {noformat}
> What I am seeing here is:
> 1) Create a file and write it with 2 DNs.
> 2) Close the file.
> 3) Kill one DN
> 4) Let replication start.
> Info:
> {code}
> // choose replication targets: NOT HOLDING THE GLOBAL LOCK
> // It is costly to extract the filename for which chooseTargets is called,
> // so for now we pass in the block collection itself.
> rw.targets = blockplacement.chooseTarget(rw.bc,
>     rw.additionalReplRequired, rw.srcNode, rw.liveReplicaNodes,
>     excludedNodes, rw.block.getNumBytes());
> {code}
> Here the target is chosen outside the global lock. Inside chooseTarget, the
> source path is derived from the blockCollection (an INodeFile in this case);
> see the code for FSDirectory#getFullPathName. That method first increments a
> depth counter for as long as the inode has a parent, and later iterates and
> accesses the parent links again in a second loop.
> 5) If the client deletes the file before the second loop in
> FSDirectory#getFullPathName runs, the parent will already have been set to
> null. Accessing the parent there can then cause an NPE, because the access
> is not made under the lock.
> [~brahmareddy] reported this issue.
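
The race described above can be illustrated with a small, self-contained sketch. It is not the HDFS code and not necessarily the committed patch: the class FullPathSketch, the Node type, the betweenPasses hook, and both method names are hypothetical, introduced only to make the two-pass walk and the delete window reproducible on a single thread. The "safe" variant assumes that returning null is acceptable when the file disappears mid-walk.

```java
public class FullPathSketch {
    /** Hypothetical stand-in for an inode; not the real HDFS INode class. */
    static final class Node {
        final String name;
        volatile Node parent; // a delete clears this link

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    /**
     * Two-pass walk as described in the report. The hook runs between the
     * passes and simulates a delete landing in that window.
     */
    static String unsafeFullPath(Node inode, Runnable betweenPasses) {
        int depth = 0;
        for (Node i = inode; i != null; i = i.parent) {
            depth++;
        }
        betweenPasses.run();
        Node[] chain = new Node[depth];
        for (int d = 0; d < depth; d++) {
            chain[depth - d - 1] = inode;
            inode = inode.parent; // NPE once the chain is shorter than depth
        }
        return join(chain);
    }

    /** Same walk, but it gives up (returns null) if the chain was cut. */
    static String safeFullPath(Node inode, Runnable betweenPasses) {
        int depth = 0;
        for (Node i = inode; i != null; i = i.parent) {
            depth++;
        }
        betweenPasses.run();
        Node[] chain = new Node[depth];
        for (int d = 0; d < depth; d++) {
            if (inode == null) {
                return null; // assumption: the caller treats null as "file is gone"
            }
            chain[depth - d - 1] = inode;
            inode = inode.parent;
        }
        return join(chain);
    }

    /** Joins the names from root to leaf; chain[0] is the root, whose name is empty. */
    static String join(Node[] chain) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i < chain.length; i++) {
            sb.append('/').append(chain[i].name);
        }
        return sb.length() == 0 ? "/" : sb.toString();
    }

    public static void main(String[] args) {
        Node root = new Node("", null);
        Node dir = new Node("dir", root);
        Node file = new Node("file", dir);
        // No delete in the window: the full path is resolved.
        System.out.println(safeFullPath(file, () -> { }));
        // Delete in the window: the safe variant returns null instead of throwing.
        System.out.println(safeFullPath(file, () -> file.parent = null));
    }
}
```

Calling unsafeFullPath with the same deleting hook throws a NullPointerException in the second loop, which mirrors the top frame of the trace. A null check only narrows the symptom; performing the path lookup while the lock is held would remove the window entirely, at the cost the code comment above mentions.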