Re: missing VERSION files leading to failed datanodes

Ted Dunning Tue, 08 Jan 2008 10:56:56 -0800

Can you put this on the wiki or as a comment on the jira?  This could be (as
you just noticed) a life-saver.



On 1/8/08 10:48 AM, "Joydeep Sen Sarma" <[EMAIL PROTECTED]> wrote:

> never mind. the storageID is logged in the namenode logs. i am able to restore
> the version files and add the datanodes back.
> 
> phew.
> 
> -----Original Message-----
> From: Joydeep Sen Sarma [mailto:[EMAIL PROTECTED]
> Sent: Tue 1/8/2008 10:11 AM
> To: hadoop-user@lucene.apache.org; hadoop-user@lucene.apache.org
> Subject: RE: missing VERSION files leading to failed datanodes
>  
> we are running 0.14.4
> 
> the fix won't help me recover the current version files. all i need is the
> storageID. it seems to be stored in some file header somewhere. can you tell
> me how to get it?
> 
> 
> -----Original Message-----
> From: dhruba Borthakur [mailto:[EMAIL PROTECTED]
> Sent: Tue 1/8/2008 10:06 AM
> To: hadoop-user@lucene.apache.org
> Subject: RE: missing VERSION files leading to failed datanodes
>  
> Hi Joydeep,
> 
> Which version of hadoop are you running? We had earlier fixed a bug
> https://issues.apache.org/jira/browse/HADOOP-2073
> in version 0.15.
> 
> Thanks,
> dhruba
> 
> -----Original Message-----
> From: Joydeep Sen Sarma [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, January 08, 2008 9:34 AM
> To: hadoop-user@lucene.apache.org; hadoop-user@lucene.apache.org
> Subject: RE: missing VERSION files leading to failed datanodes
> 
> well - at least i know why this happened. (still looking for a way to
> restore the version file).
> 
> https://issues.apache.org/jira/browse/HADOOP-2549 is filling up one of
> the disks (in spite of the du.reserved setting). it looks like the VERSION
> file could not be written during startup and went missing. that would
> seem like another bug (writing a tmp file and renaming it to VERSION
> would have prevented this mishap):
> 
> 2008-01-08 08:24:01,597 ERROR org.apache.hadoop.dfs.DataNode: java.io.IOException: No space left on device
>         at java.io.FileOutputStream.writeBytes(Native Method)
>         at java.io.FileOutputStream.write(FileOutputStream.java:260)
>         at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes(StreamEncoder.java:336)
>         at sun.nio.cs.StreamEncoder$CharsetSE.implFlushBuffer(StreamEncoder.java:404)
>         at sun.nio.cs.StreamEncoder$CharsetSE.implFlush(StreamEncoder.java:408)
>         at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:152)
>         at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:213)
>         at java.io.BufferedWriter.flush(BufferedWriter.java:236)
>         at java.util.Properties.store(Properties.java:666)
>         at org.apache.hadoop.dfs.Storage$StorageDirectory.write(Storage.java:176)
>         at org.apache.hadoop.dfs.Storage$StorageDirectory.write(Storage.java:164)
>         at org.apache.hadoop.dfs.Storage.writeAll(Storage.java:510)
>         at org.apache.hadoop.dfs.DataStorage.recoverTransitionRead(DataStorage.java:146)
>         at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:243)
> 
> 
> -----Original Message-----
> From: Joydeep Sen Sarma [mailto:[EMAIL PROTECTED]
> Sent: Tue 1/8/2008 8:51 AM
> To: hadoop-user@lucene.apache.org
> Subject: missing VERSION files leading to failed datanodes
>  
> 
> 2008-01-08 08:36:20,045 ERROR org.apache.hadoop.dfs.DataNode: org.apache.hadoop.dfs.InconsistentFSStateException: Directory /var/hadoop/tmp/dfs/data is in an inconsistent state: file VERSION is invalid.
> 
> [EMAIL PROTECTED] data]# ssh hadoop003.sf2p cat /var/hadoop/tmp/dfs/data/current/VERSION
> [EMAIL PROTECTED] data]#
> 
> any idea why the VERSION file is empty? and how can i regenerate it - or
> ask the system to generate a new one without discarding all the blocks?
> 
> 
> i had previously shut down and started dfs once (to debug a different bug
> where it's not honoring du.reserved. more on that later).
> 
> help appreciated,
> 
> Joydeep
> 
> 
>
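
The quoted thread suggests that writing a tmp file and renaming it to VERSION would have prevented the empty file. A minimal sketch of that idea follows. It is not the Hadoop implementation: the class name AtomicVersionWriter, the method writeVersion, and the temp file name VERSION.tmp are hypothetical, and it uses java.nio.file, which is newer than the Hadoop release discussed here. The point is only that a failed write (for example "No space left on device") hits the temp file, so an existing VERSION file is left untouched.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Properties;

// Hypothetical helper, not part of Hadoop.
final class AtomicVersionWriter {

    // Writes props to <dir>/VERSION by way of a temp file in the same directory.
    static void writeVersion(Path dir, Properties props) throws IOException {
        Path target = dir.resolve("VERSION");
        Path tmp = dir.resolve("VERSION.tmp"); // assumed temp name

        try (FileOutputStream out = new FileOutputStream(tmp.toFile())) {
            props.store(out, null);   // store() flushes the stream
            out.getFD().sync();       // push the bytes to disk before renaming
        } catch (IOException e) {
            // The write failed: discard the partial temp file and keep
            // whatever VERSION file was already in place.
            Files.deleteIfExists(tmp);
            throw e;
        }

        // Rename within one directory. On POSIX file systems this replaces
        // an existing target in a single step; other platforms may differ.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

The temp file lives in the same directory as VERSION so the rename stays on one file system. If the disk is full, the exception still propagates to the caller, but the previous VERSION contents survive.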
