[ https://issues.apache.org/jira/browse/HDFS-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029357#comment-13029357 ]

sravankorumilli commented on HDFS-1887:
---------------------------------------

Solution: I have fixed this by catching the EOFException in 
DataStorage.isConversionNeeded(), deleting the incomplete storage file, and 
returning false. The DataNode then starts successfully and there is no data 
loss either. I have tested this and it looks fine to me. I can provide the 
patch, or am I missing a point anywhere?
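
Roughly, the change would look like this (a sketch only, not the final patch: 
the file locking done by the real method is elided, and the constant name 
LAST_PRE_UPGRADE_LAYOUT_VERSION is assumed from Storage.java):

{code:java}
// Needs: import java.io.EOFException; (plus the existing java.io imports).
// Sketch of the proposed handling in DataStorage.isConversionNeeded().
protected boolean isConversionNeeded(StorageDirectory sd) throws IOException {
  File oldF = new File(sd.getRoot(), "storage"); // pre-upgrade storage file
  if (!oldF.exists())
    return false;
  RandomAccessFile oldFile = new RandomAccessFile(oldF, "rws");
  try {
    oldFile.seek(0);
    // An empty/truncated file means the DataNode was killed before the
    // layout version was written; readInt() then throws EOFException.
    int oldVersion = oldFile.readInt();
    return oldVersion >= LAST_PRE_UPGRADE_LAYOUT_VERSION; // assumed constant
  } catch (EOFException e) {
    // Incomplete storage file: delete it and report that no conversion is
    // needed, so the DataNode can start and recreate its storage cleanly.
    oldFile.close(); // close before deleting
    if (!oldF.delete())
      throw new IOException("Cannot delete incomplete storage file " + oldF);
    return false;
  } finally {
    oldFile.close(); // second close after the catch path is a no-op
  }
}
{code}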

One more scenario: this problem can also occur on a normal DataNode restart. 
If the storage file is not present, the DataNode tries to recreate it, so if 
the DataNode is killed before LAYOUT_VERSION is written, further restarts 
will fail in the same fashion.
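
For what it's worth, the failure itself is easy to reproduce outside the 
DataNode: readInt() on a zero-length file always throws EOFException, which 
is exactly what the stack trace quoted below shows (the class name here is 
made up for the demo):

{code:java}
import java.io.File;
import java.io.RandomAccessFile;

// Stand-alone reproduction: readInt() on an empty file throws
// java.io.EOFException, just like DataStorage.isConversionNeeded()
// does when the DataNode was killed before LAYOUT_VERSION was written.
public class EmptyStorageFileRepro {
  public static void main(String[] args) throws Exception {
    File f = File.createTempFile("storage", null); // zero-length file
    RandomAccessFile raf = new RandomAccessFile(f, "r");
    try {
      raf.readInt(); // throws EOFException here
    } finally {
      raf.close();
      f.delete();
    }
  }
}
{code}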

> If a DataNode gets killed after 'data.dir' is created but before 
> LAYOUTVERSION is written to the storage file, further restarts of the 
> DataNode throw an EOFException while reading the storage file.
> -----------------------------------------------------------------------
>
>                 Key: HDFS-1887
>                 URL: https://issues.apache.org/jira/browse/HDFS-1887
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.20.1, 0.21.0, 0.23.0
>         Environment: Linux
>            Reporter: sravankorumilli
>            Priority: Minor
>
> Assume the DataNode gets killed after 'data.dir' is created, but before 
> LAYOUTVERSION is written to the storage file. On further restarts of the 
> DataNode, an EOFException is thrown while reading the storage file, and the 
> DataNode cannot be restarted successfully until 'data.dir' is deleted.
> These are the corresponding logs:
> 2011-05-02 19:12:19,389 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.EOFException
> at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725)
> at org.apache.hadoop.hdfs.server.datanode.DataStorage.isConversionNeeded(DataStorage.java:203)
> at org.apache.hadoop.hdfs.server.common.Storage.checkConversionNeeded(Storage.java:697)
> at org.apache.hadoop.hdfs.server.common.Storage.access$000(Storage.java:62)
> at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:476)
> at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:116)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:336)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:260)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:237)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1440)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1393)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1407)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1552)
> Our Hadoop cluster is managed by cluster management software that tries to 
> eliminate any manual intervention in setting up & managing the cluster. But 
> in the scenario described above, manual intervention is required to recover 
> the DataNode.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
