Re: missing VERSION files leading to failed datanodes

Konstantin Shvachko Tue, 08 Jan 2008 11:04:58 -0800

Ted,

There was a discussion on that in HADOOP-2073.
You are right that, in general, the moves should be atomic,
but in this particular case the in-place modification works well.
There is a comment in the code explaining this too,
but that code is in 0.15, not in 0.14.4.


--Konstantin

Ted Dunning wrote:

Dhruba,

It looks from the discussion like the file was overwritten in place.

Is that good practice?  Normally this sort of update is handled by writing
a temp file, moving the live file to a backup, then moving the temp file
into the live place.  Both moves are atomic, so the worst case is that you
wind up with either a temp and a live file (ignore the temp file, since it
may be incomplete) or a backup and a temp file (move temp to live, since it
must be complete).
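The update scheme described above can be sketched in Java as follows. This is a hypothetical illustration, not Hadoop's own `Storage` code: the class `SafeReplace`, its methods, and the `.tmp`/`.bak` suffixes are assumptions. It also assumes all three files sit in the same directory, so each `Files.move` is a single atomic rename.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper illustrating the temp/backup/live update scheme. */
public class SafeReplace {

    static Path tmpOf(Path live) { return live.resolveSibling(live.getFileName() + ".tmp"); }
    static Path bakOf(Path live) { return live.resolveSibling(live.getFileName() + ".bak"); }

    /** Replaces the contents of live so that a crash never leaves it half-written. */
    public static void update(Path live, byte[] content) throws IOException {
        recover(live); // settle any state left behind by an earlier crash
        Path tmp = tmpOf(live);
        Path bak = bakOf(live);

        // 1. Write the new contents to a temp file and force them to disk.
        try (FileOutputStream out = new FileOutputStream(tmp.toFile())) {
            out.write(content);
            out.getFD().sync();
        }
        // 2. Move the live file aside to the backup name (atomic rename).
        if (Files.exists(live)) {
            Files.move(live, bak, StandardCopyOption.ATOMIC_MOVE);
        }
        // 3. Move the complete temp file into the live position (atomic rename).
        Files.move(tmp, live, StandardCopyOption.ATOMIC_MOVE);
        // 4. The backup is no longer needed.
        Files.deleteIfExists(bak);
    }

    /** Recovery rules: trust live if present, else promote temp, else restore backup. */
    public static void recover(Path live) throws IOException {
        Path tmp = tmpOf(live);
        Path bak = bakOf(live);
        if (Files.exists(live)) {
            // Crash before step 2 or after step 3: temp may be incomplete, backup is stale.
            Files.deleteIfExists(tmp);
            Files.deleteIfExists(bak);
        } else if (Files.exists(tmp) && Files.exists(bak)) {
            // Crash between steps 2 and 3: temp was fully synced before live was moved.
            Files.move(tmp, live, StandardCopyOption.ATOMIC_MOVE);
            Files.deleteIfExists(bak);
        } else if (Files.exists(bak)) {
            // Not reachable through update(), but restoring the backup is a safe fallback.
            Files.move(bak, live, StandardCopyOption.ATOMIC_MOVE);
        }
    }
}
```

`recover` encodes the two worst cases from the message: a live file plus a temp file means the temp can be discarded, and a backup plus a temp file means the temp is complete and can be promoted. Syncing the temp file before the live file is moved aside is what makes the second rule safe.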


On 1/8/08 10:06 AM, "dhruba Borthakur" <[EMAIL PROTECTED]> wrote:

Hi Joydeep,

Which version of Hadoop are you running? We fixed a bug earlier, in
version 0.15:
https://issues.apache.org/jira/browse/HADOOP-2073

Thanks,
dhruba

-----Original Message-----
From: Joydeep Sen Sarma [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 08, 2008 9:34 AM
To: hadoop-user@lucene.apache.org
Subject: RE: missing VERSION files leading to failed datanodes

Well, at least I know why this happened (still looking for a way to
restore the VERSION file).

https://issues.apache.org/jira/browse/HADOOP-2549 is filling up one of
the disks (in spite of the du.reserved setting). It looks like the VERSION
file could not be written during startup and went missing. That would seem
like another bug (writing a tmp file and renaming it to VERSION would have
prevented this mishap):

2008-01-08 08:24:01,597 ERROR org.apache.hadoop.dfs.DataNode:
java.io.IOException: No space left on device
       at java.io.FileOutputStream.writeBytes(Native Method)
       at java.io.FileOutputStream.write(FileOutputStream.java:260)
       at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes(StreamEncoder.java:336)
       at sun.nio.cs.StreamEncoder$CharsetSE.implFlushBuffer(StreamEncoder.java:404)
       at sun.nio.cs.StreamEncoder$CharsetSE.implFlush(StreamEncoder.java:408)
       at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:152)
       at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:213)
       at java.io.BufferedWriter.flush(BufferedWriter.java:236)
       at java.util.Properties.store(Properties.java:666)
       at org.apache.hadoop.dfs.Storage$StorageDirectory.write(Storage.java:176)
       at org.apache.hadoop.dfs.Storage$StorageDirectory.write(Storage.java:164)
       at org.apache.hadoop.dfs.Storage.writeAll(Storage.java:510)
       at org.apache.hadoop.dfs.DataStorage.recoverTransitionRead(DataStorage.java:146)
       at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:243)
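The "write a tmp file and rename it to VERSION" suggestion can be sketched in Java as below. This is a hypothetical helper, not Hadoop's `Storage$StorageDirectory.write`; the class name `VersionFileWriter` and the `.tmp` suffix are assumptions. Because `Properties.store` (the call that failed in the trace) writes only to the temp file, a "No space left on device" error leaves the existing VERSION file untouched.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Properties;

/** Hypothetical sketch: write a properties-style VERSION file via temp file + rename. */
public class VersionFileWriter {

    public static void write(Path versionFile, Properties props) throws IOException {
        Path tmp = versionFile.resolveSibling(versionFile.getFileName() + ".tmp");
        try (FileOutputStream out = new FileOutputStream(tmp.toFile())) {
            // If the disk is full, store() throws here and VERSION itself is never opened.
            props.store(out, null);
            out.getFD().sync(); // make sure the bytes are on disk before the rename
        } catch (IOException e) {
            Files.deleteIfExists(tmp); // discard the partial temp file (stream already closed)
            throw e;
        }
        // Same directory, so this is a single rename. On POSIX systems rename(2)
        // replaces an existing target atomically; the Files.move Javadoc leaves
        // replacement under ATOMIC_MOVE implementation-specific on other platforms.
        Files.move(tmp, versionFile, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

Compared with the backup-based scheme, this relies on the rename replacing the old file in one step, which holds on the Linux hosts typically used for datanodes.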


-----Original Message-----
From: Joydeep Sen Sarma [mailto:[EMAIL PROTECTED]
Sent: Tue 1/8/2008 8:51 AM
To: hadoop-user@lucene.apache.org
Subject: missing VERSION files leading to failed datanodes


2008-01-08 08:36:20,045 ERROR org.apache.hadoop.dfs.DataNode:
org.apache.hadoop.dfs.InconsistentFSStateException: Directory /var/hadoop/tmp/dfs/data is in an inconsistent state: file VERSION is invalid.

[EMAIL PROTECTED] data]# ssh hadoop003.sf2p cat /var/hadoop/tmp/dfs/data/current/VERSION
[EMAIL PROTECTED] data]#

Any idea why the VERSION file is empty? And how can I regenerate it, or
ask the system to generate a new one without discarding all the blocks?


I had previously shut down and restarted DFS once (to debug a different
bug where it's not honoring du.reserved; more on that later).

Help appreciated,

Joydeep
