[jira] Commented: (HDFS-165) NPE in datanode.handshake()

Steve Loughran (JIRA) Sat, 09 Jan 2010 05:43:19 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798350#action_12798350
 ]


Steve Loughran commented on HDFS-165:
-------------------------------------

Interesting. A test failed: 
{{org.apache.hadoop.hdfs.server.namenode.TestNodeCount.testNodeCount}}

{code}
java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.namenode.BlockManager.countNodes(BlockManager.java:1395)
        at org.apache.hadoop.hdfs.server.namenode.TestNodeCount.testNodeCount(TestNodeCount.java:119)
{code}

The line in question is looking at nodes in the blocks map, and is failing 
because one of the nodes in the map is null. 
{code}
    Iterator<DatanodeDescriptor> nodeIter = blocksMap.nodeIterator(b);
    Collection<DatanodeDescriptor> nodesCorrupt = corruptReplicas.getNodes(b);
    while (nodeIter.hasNext()) {
      DatanodeDescriptor node = nodeIter.next();
      if ((nodesCorrupt != null) && (nodesCorrupt.contains(node))) {
        corrupt++;
      } else if (node.isDecommissionInProgress() || node.isDecommissioned()) { //HERE
        count++;
      } else {
        Collection<Block> blocksExcess =
          excessReplicateMap.get(node.getStorageID());
        if (blocksExcess != null && blocksExcess.contains(b)) {
          excess++;
        } else {
          live++;
        }
{code}

I'll do some work to see if this can be reproduced. If not, it could be a 
race condition that Hudson is seeing: datanode setup fails and the blocks map 
ends up containing a null entry. That would be a separate problem, one we have 
now found.
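
To make the failure mode concrete, here is a minimal standalone sketch of the same loop with a null guard added. It is not the real BlockManager code: {{Node}}, {{NullSafeNodeCount}} and the returned array layout are hypothetical stand-ins, and the excess-replica branch is left out. Skipping null entries only stops the NPE; it would not explain how a null got into the blocks map in the first place.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Illustrative only; a hypothetical stand-in for the counting loop. */
final class NullSafeNodeCount {

  /** Hypothetical minimal node type, not Hadoop's DatanodeDescriptor. */
  static final class Node {
    final String storageId;
    final boolean decommissioned;

    Node(String storageId, boolean decommissioned) {
      this.storageId = storageId;
      this.decommissioned = decommissioned;
    }
  }

  /**
   * Walks the node iterator and returns
   * {live, decommissioned, corrupt, nullEntries}.
   */
  static int[] countNodes(Iterator<Node> nodeIter,
                          Collection<Node> nodesCorrupt) {
    int live = 0;
    int decommissioned = 0;
    int corrupt = 0;
    int nullEntries = 0;
    while (nodeIter.hasNext()) {
      Node node = nodeIter.next();
      if (node == null) {
        // Without this guard, the field access below throws the NPE.
        nullEntries++;
        continue;
      }
      if (nodesCorrupt != null && nodesCorrupt.contains(node)) {
        corrupt++;
      } else if (node.decommissioned) {
        decommissioned++;
      } else {
        live++;
      }
    }
    return new int[] {live, decommissioned, corrupt, nullEntries};
  }

  public static void main(String[] args) {
    Node a = new Node("DS-1", false);
    Node b = new Node("DS-2", true);
    Node c = new Node("DS-3", false);
    // One slot in the list is null, as suspected for the blocks map.
    List<Node> nodes = Arrays.asList(a, null, b, c);
    int[] result = countNodes(nodes.iterator(), Collections.singletonList(c));
    System.out.println(Arrays.toString(result));
  }
}
```

Counting the null entries instead of silently dropping them keeps the suspected race visible to a test or a log statement.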

> NPE in datanode.handshake()
> ---------------------------
>
>                 Key: HDFS-165
>                 URL: https://issues.apache.org/jira/browse/HDFS-165
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.20.1
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>             Fix For: 0.22.0
>
>         Attachments: HDFS-165.patch
>
>
> It appears possible to raise an NPE in DataNode.handshake() if the startup 
> protocol gets interrupted or fails in some manner

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
