[ https://issues.apache.org/jira/browse/HDFS-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039558#comment-13039558 ]

sravankorumilli commented on HDFS-1887:
---------------------------------------

I want to propose this solution:

To avoid the manual step of deleting the storage directory, we can check for a 
marker file, say "formatrequired". If the file is present, we format the 
storage directory. The file is created when formatting of a storage directory 
begins and is deleted once the version file has been written successfully, so 
its absence guarantees that formatting completed. Whenever the datanode starts, 
the first step in analyzing a storage directory is to check whether the 
"formatrequired" file is present. If it is, we set the startup option to 
format; the storage directory is then formatted and the datanode starts 
successfully. A minimal sketch of this protocol follows.
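
Here is a minimal sketch of how the marker-file protocol could look. The class 
and method names are hypothetical, for illustration only; they are not the 
actual DataStorage API:

{code:java}
import java.io.File;
import java.io.IOException;

// Hypothetical helper illustrating the proposed "formatrequired" protocol.
class FormatMarker {
  private static final String MARKER = "formatrequired";

  // Called before formatting touches the storage directory:
  // a present marker means "format in progress / incomplete".
  static void beginFormat(File storageDir) throws IOException {
    File marker = new File(storageDir, MARKER);
    if (!marker.createNewFile() && !marker.exists()) {
      throw new IOException("Could not create " + marker);
    }
  }

  // Called only after the version file has been written successfully:
  // deleting the marker records that formatting completed.
  static void endFormat(File storageDir) throws IOException {
    File marker = new File(storageDir, MARKER);
    if (marker.exists() && !marker.delete()) {
      throw new IOException("Could not delete " + marker);
    }
  }

  // Called while analyzing a storage directory at datanode startup:
  // a leftover marker means a previous format was interrupted, so the
  // directory should be reformatted instead of failing startup.
  static boolean needsFormat(File storageDir) {
    return new File(storageDir, MARKER).exists();
  }
}
{code}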

> Facing problems while restarting the datanode if the datanode format is 
> unsuccessful.
> -------------------------------------------------------------------------------------
>
>                 Key: HDFS-1887
>                 URL: https://issues.apache.org/jira/browse/HDFS-1887
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.20.1, 0.21.0, 0.23.0
>         Environment: Linux
>            Reporter: sravankorumilli
>
> In the existing behavior, we check whether the datanode is formatted based 
> on the existence of the version file. If the version file is not present, 
> the storage directory will be formatted. If formatting is terminated 
> abruptly, there can be a scenario where the storage file or version file has 
> been created but its content has not been written. In such a scenario, 
> restarting the datanode just throws an exception, and someone has to 
> manually delete the storage directory and restart the datanode.
> The following is one scenario where the storage file is created but the 
> content is not written, and I get this exception.
> These are the corresponding logs:
> 2011-05-02 19:12:19,389 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.EOFException
> at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725)
> at org.apache.hadoop.hdfs.server.datanode.DataStorage.isConversionNeeded(DataStorage.java:203)
> at org.apache.hadoop.hdfs.server.common.Storage.checkConversionNeeded(Storage.java:697)
> at org.apache.hadoop.hdfs.server.common.Storage.access$000(Storage.java:62)
> at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:476)
> at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:116)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:336)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:260)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:237)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1440)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1393)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1407)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1552)
> Our Hadoop cluster is managed by cluster management software which tries to 
> eliminate any manual intervention in setting up and managing the cluster. 
> But in the above-mentioned scenario, manual intervention is required to 
> recover the DataNode. Though it is very rare, there is a possibility of this 
> happening.
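
For reference, the EOFException in the trace above comes from 
RandomAccessFile.readInt() hitting end-of-file immediately, which is exactly 
what a zero-length storage or version file produces. This standalone snippet 
(an illustration, not HDFS code) reproduces the failure:

{code:java}
import java.io.File;
import java.io.RandomAccessFile;

public class EmptyStorageFileRepro {
  public static void main(String[] args) throws Exception {
    // Simulate a storage file that was created but never written,
    // as happens when formatting is killed part-way through.
    File f = File.createTempFile("storage", null);
    try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
      raf.readInt(); // throws java.io.EOFException, matching the stack trace
    }
  }
}
{code}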

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
