[ https://issues.apache.org/jira/browse/HDFS-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinayakumar B updated HDFS-5185:
--------------------------------
Attachment: HDFS-5185-002.patch
Attaching the updated patch.
After recent changes, {{checkDiskError()}} triggers a periodic thread
that checks for disk errors asynchronously. This issue, however, requires a
synchronous check for errors before the block pools are initialized.
Accordingly, the patch checks for disk errors synchronously before initializing
block pools, so that failed disks are excluded and startup does not fail.
Please review.
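For illustration only, the idea can be sketched roughly as below. This is not the attached patch and does not use the DataNode's own classes; {{VolumeStartupCheck}}, {{isUsable}} and {{selectUsableDirs}} are hypothetical names, and the probe (creating and deleting a subdirectory) is an assumption chosen because the reported failure was a failed mkdirs.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of Hadoop or of the attached patch. */
final class VolumeStartupCheck {

  /** Synchronously probes one data dir: it must exist (or be creatable) and accept a new subdirectory. */
  static boolean isUsable(File dir) {
    if (!dir.isDirectory() && !dir.mkdirs()) {
      return false; // missing and cannot be created, or not a directory
    }
    if (!dir.canRead() || !dir.canWrite() || !dir.canExecute()) {
      return false; // insufficient permissions
    }
    try {
      // Creating a subdirectory is the operation that failed in the report
      // ("Mkdirs failed to create .../tmp"), so probe exactly that.
      File probe = Files.createTempDirectory(dir.toPath(), "probe").toFile();
      return probe.delete();
    } catch (IOException e) {
      return false; // e.g. disk full or read-only file system
    }
  }

  /** Returns only the usable dirs; fails only if none is left. */
  static List<File> selectUsableDirs(List<File> dataDirs) throws IOException {
    List<File> usable = new ArrayList<>();
    for (File dir : dataDirs) {
      if (isUsable(dir)) {
        usable.add(dir);
      } else {
        System.err.println("Excluding failed data dir: " + dir);
      }
    }
    if (usable.isEmpty()) {
      throw new IOException("No usable data dirs among " + dataDirs);
    }
    return usable;
  }

  public static void main(String[] args) throws IOException {
    // Data dirs are taken from the command line; block pool
    // initialization would then run only on the returned dirs.
    List<File> dirs = new ArrayList<>();
    for (String arg : args) {
      dirs.add(new File(arg));
    }
    if (dirs.isEmpty()) {
      dirs.add(Files.createTempDirectory("dn-data").toFile());
    }
    System.out.println("Usable data dirs: " + selectUsableDirs(dirs));
  }
}
```

The point of the sketch is the ordering: the check completes before any block pool is set up, so a full or unwritable dir is dropped up front instead of failing initialization later.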
> DN fails to startup if one of the data dir is full
> --------------------------------------------------
>
> Key: HDFS-5185
> URL: https://issues.apache.org/jira/browse/HDFS-5185
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Vinayakumar B
> Assignee: Vinayakumar B
> Priority: Critical
> Attachments: HDFS-5185-002.patch, HDFS-5185.patch
>
>
> DataNode fails to start up if one of the configured data dirs is out of space.
> It fails with the following exception:
> {noformat}
> 2013-09-11 17:48:43,680 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool <registering> (storage id DS-308316523-xx.xx.xx.xx-64015-1378896293604) service to /nn1:65110
> java.io.IOException: Mkdirs failed to create /opt/nish/data/current/BP-123456-1234567/tmp
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.<init>(BlockPoolSlice.java:105)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlockPool(FsVolumeImpl.java:216)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.addBlockPool(FsVolumeList.java:155)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.addBlockPool(FsDatasetImpl.java:1593)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:834)
>     at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:311)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:217)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:660)
>     at java.lang.Thread.run(Thread.java:662)
> {noformat}
> It should continue to start up with the other available data dirs.