[ 
https://issues.apache.org/jira/browse/HDFS-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946763#comment-13946763
 ] 

Kihwal Lee commented on HDFS-6148:
----------------------------------

This happens when only the NN restarts and an incremental block report is 
received after node registration but before the storage is added, i.e. a 
queued incremental block report arrives before the first heartbeat. In this 
case, {{BlockInfoUnderConstruction#addReplicaIfNotPresent()}} is called from 
{{addStoredBlockUnderConstruction()}}, but the {{StorageInfo}} is null: since 
the storage has not been added yet, {{node.getStorageInfo(storageID)}} returns 
null. As a result, the {{BlockInfoUnderConstruction}} ends up with one 
{{ReplicaUnderConstruction}} whose expectedLocation is null. This is apparent 
from the log message emitted while processing such an incremental block 
report.

{noformat}
WARN BlockStateChange: BLOCK* addStoredBlock: Redundant addStoredBlock request received for blk_1089713407_xxxxx{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[null|FINALIZED]]} on 1.2.3.4:1004 size 0
{noformat}

After this, fsck will fail with an NPE and the LeaseManager will also crash 
with an NPE.
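
To make the ordering problem concrete, here is a minimal, self-contained sketch of the failure mode. It does not use the real HDFS classes: {{Node}}, {{Storage}}, {{Replica}} and {{recoveryCrashes()}} are hypothetical stand-ins I made up for illustration, and only the shape of the bug (a storage lookup that returns null before the first heartbeat, dereferenced later in an {{isAlive()}}-style check) is taken from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for illustration only; these are NOT the real HDFS classes.
public class NullStorageSketch {

    // Hypothetical stand-in for a per-datanode storage record.
    static class Storage {
        final String id;
        final boolean alive;

        Storage(String id, boolean alive) {
            this.id = id;
            this.alive = alive;
        }
    }

    // Hypothetical stand-in for a registered datanode.
    static class Node {
        private final Map<String, Storage> storages = new HashMap<>();

        // In this sketch, storages only become known once a heartbeat is processed.
        void processHeartbeat(String storageId) {
            storages.put(storageId, new Storage(storageId, true));
        }

        // Returns null when the storage has not been added yet.
        Storage getStorageInfo(String storageId) {
            return storages.get(storageId);
        }
    }

    // Hypothetical stand-in for a replica of an under-construction block.
    static class Replica {
        final Storage expectedLocation;

        Replica(Storage expectedLocation) {
            this.expectedLocation = expectedLocation;
        }

        // Dereferences expectedLocation without a null check.
        boolean isAlive() {
            return expectedLocation.alive;
        }
    }

    // Simulates one incremental block report followed by a recovery attempt.
    // Returns true if the liveness check threw a NullPointerException.
    public static boolean recoveryCrashes(boolean heartbeatBeforeReport) {
        Node node = new Node(); // node is registered, no storages known yet
        if (heartbeatBeforeReport) {
            node.processHeartbeat("storage-1");
        }
        // The report records the replica with whatever the lookup returns.
        Replica replica = new Replica(node.getStorageInfo("storage-1"));
        try {
            replica.isAlive();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("report before heartbeat crashes: " + recoveryCrashes(false));
        System.out.println("report after heartbeat crashes:  " + recoveryCrashes(true));
    }
}
```

The sketch only reproduces the ordering dependency; it says nothing about where the actual fix should go.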

> LeaseManager crashes while initiating block recovery
> ----------------------------------------------------
>
>                 Key: HDFS-6148
>                 URL: https://issues.apache.org/jira/browse/HDFS-6148
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: Kihwal Lee
>            Priority: Blocker
>
> While running branch-2.4, the LeaseManager crashed with an NPE. This does not 
> always happen on block recovery.
> {panel}
> Exception in thread "org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@5d66b728" java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction$ReplicaUnderConstruction.isAlive(BlockInfoUnderConstruction.java:121)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.initializeBlockRecovery(BlockInfoUnderConstruction.java:286)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3746)
>         at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:474)
>         at org.apache.hadoop.hdfs.server.namenode.LeaseManager.access$900(LeaseManager.java:68)
>         at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:411)
>         at java.lang.Thread.run(Thread.java:722)
> {panel}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
