[ https://issues.apache.org/jira/browse/HDFS-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Toshihiro Suzuki resolved HDFS-14503.
-------------------------------------
Resolution: Duplicate
> ThrottledAsyncChecker throws NPE during block pool initialization
> ------------------------------------------------------------------
>
> Key: HDFS-14503
> URL: https://issues.apache.org/jira/browse/HDFS-14503
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.3.0
> Reporter: Yiqun Lin
> Priority: Major
>
> ThrottledAsyncChecker throws an NPE during block pool initialization. The error
> causes block pool registration to fail.
> The exception:
> {noformat}
> 2019-05-20 01:02:36,003 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Unexpected exception in block pool Block pool <registering> (Datanode Uuid xxxxx) service to xx.xx.xx.xx/xx.xx.xx.xx
> java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker$LastCheckResult.access$000(ThrottledAsyncChecker.java:211)
>     at org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker.schedule(ThrottledAsyncChecker.java:129)
>     at org.apache.hadoop.hdfs.server.datanode.checker.DatasetVolumeChecker.checkAllVolumes(DatasetVolumeChecker.java:209)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:3387)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1508)
>     at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:319)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:272)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:768)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This error appears to occur because the target's entry in {{completedChecks}}
> (a {{WeakHashMap}}) is removed while we are still trying to get it. Although we
> check for the entry before getting it, the get can still return null.
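A minimal Java sketch of the single-lookup pattern that avoids the window between the check and the get, assuming a simplified map of records keyed by volume path. The class {{SafeLastCheckLookup}}, the method {{msSinceLastCheck}} and the record {{LastCheckResult}} with a {{completedAt}} field are hypothetical illustrations, not the actual {{ThrottledAsyncChecker}} code.

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.WeakHashMap;

// Hypothetical, simplified stand-in for the checker's bookkeeping;
// not the actual ThrottledAsyncChecker source.
public class SafeLastCheckLookup {

  // Hypothetical record of when a check of a target last completed.
  static final class LastCheckResult {
    final long completedAt;

    LastCheckResult(long completedAt) {
      this.completedAt = completedAt;
    }
  }

  /**
   * Returns the milliseconds since the last completed check of target, or an
   * empty OptionalLong if no completed check is recorded. A single get()
   * replaces containsKey() followed by get(), so the entry cannot disappear
   * between the check and the read.
   */
  static <K> OptionalLong msSinceLastCheck(
      Map<K, LastCheckResult> completedChecks, K target, long nowMs) {
    LastCheckResult result = completedChecks.get(target); // one lookup only
    if (result == null) {
      return OptionalLong.empty(); // absent or already removed
    }
    return OptionalLong.of(nowMs - result.completedAt);
  }

  public static void main(String[] args) {
    Map<String, LastCheckResult> completedChecks = new WeakHashMap<>();
    String volume = "/hadoop/2/hdfs/data/current"; // strong ref keeps the key alive
    completedChecks.put(volume, new LastCheckResult(1_000L));
    System.out.println(msSinceLastCheck(completedChecks, volume, 4_000L).getAsLong()); // 3000

    completedChecks.remove(volume); // simulate the entry disappearing
    System.out.println(msSinceLastCheck(completedChecks, volume, 4_000L).isPresent()); // false
  }
}
```

Reading the entry once and null-checking the local variable closes the containsKey/get window; on its own it does not make unsynchronized access to a {{WeakHashMap}} thread-safe.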
> We hit this in a corner case: in federation mode with two block pools on one DN,
> {{ThrottledAsyncChecker}} schedules two identical health checks for the same volume:
> {noformat}
> 2019-05-20 01:02:36,000 INFO org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: Scheduling a check for /hadoop/2/hdfs/data/current
> 2019-05-20 01:02:36,000 INFO org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: Scheduling a check for /hadoop/2/hdfs/data/current
> {noformat}
> After one check successfully calls {{completedChecks#get}}, the entry is cleaned
> up from {{completedChecks}}; the other check then gets null.
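The corner case can be replayed deterministically in one thread. The sketch below is hypothetical: its names are illustrative rather than Hadoop's, it uses a plain {{HashMap}}, and an explicit {{remove}} stands in for whatever drops the entry between the two checks.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-threaded replay of the reported interleaving;
// names are illustrative and this is not the Hadoop source.
public class CheckThenGetReplay {

  static final class LastCheckResult {
    final long completedAt;

    LastCheckResult(long completedAt) {
      this.completedAt = completedAt;
    }
  }

  // Unsafe read: trusts an earlier containsKey() and dereferences get().
  static long racyAge(Map<String, LastCheckResult> checks, String key, long nowMs) {
    return nowMs - checks.get(key).completedAt;
  }

  /** Returns true if the replayed interleaving ends in a NullPointerException. */
  static boolean replay() {
    Map<String, LastCheckResult> completedChecks = new HashMap<>();
    String volume = "/hadoop/2/hdfs/data/current";
    completedChecks.put(volume, new LastCheckResult(0L));

    // Step 1: the checks scheduled for both block pools see the entry.
    boolean seenByFirst = completedChecks.containsKey(volume);
    boolean seenBySecond = completedChecks.containsKey(volume);
    if (!seenByFirst || !seenBySecond) {
      return false;
    }

    // Step 2: the first check reads the entry successfully.
    racyAge(completedChecks, volume, 100L);

    // Step 3: the entry is dropped (explicit remove() stands in for
    // whatever cleans it up, e.g. WeakHashMap expunging a cleared key).
    completedChecks.remove(volume);

    // Step 4: the second check acts on its stale containsKey() result.
    try {
      racyAge(completedChecks, volume, 100L);
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(replay()); // true
  }
}
```

Reading the entry once per check and null-checking the local result would let step 4 skip the missing entry instead of throwing.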
--
This message was sent by Atlassian Jira
(v8.3.4#803005)