Hi,
I am working with the 2.0.2-alpha version of Hadoop. I am currently writing my
key-value pairs to a SequenceFile on HDFS. I flush my data regularly using
hsync() because the process writing to the file can terminate abruptly. My
requirement is that once a hsync() succeeds, the data written before that
hsync() must remain available.
To verify this, I ran a test that killed the process (which was writing to a
SequenceFile) right after it had called hsync(). When I then read the data
using the "hadoop fs -cat" command, I can see the data, but the file size is
reported as 0, and SequenceFile.Reader.next(key, value) returns false. I have
read that because the file was not closed properly, its size was never updated
on the NameNode, and that this is also why next() returns false.
To fix this and make the file readable through the SequenceFile APIs, I opened
the file in append mode and then closed the stream immediately; as far as I
understand, the append triggers lease recovery, after which the NameNode
records the correct file length. While doing this, I retry if I receive a
RecoveryInProgressException or an AlreadyBeingCreatedException. Now I can
successfully read the data using SequenceFile.Reader. The code I am using
follows.
*** WRITE THREAD ***
SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, path,
    Text.class, Text.class, CompressionType.NONE);
writer.append(new Text("India"), new Text("Delhi"));
writer.append(new Text("China"), new Text("Beijing"));
writer.hsync();
// BOOM, FAILURE, PROCESS TERMINATED
*** I expect India and China to be readable at this point, but next() returns false ***
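For reference, this is roughly how I read the data back (a minimal sketch;
fs, conf, and path are the same as in the write thread, and Text keys and
values are assumed):

*** READ THREAD (sketch) ***
SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
Text key = new Text();
Text value = new Text();
// next() should return true for India and China, but until the file
// size is fixed it returns false straight away
while (reader.next(key, value)) {
    System.out.println(key + " -> " + value);
}
reader.close();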
*** Code to fix the file size ***
while (true) {
    try {
        FileSystem fs = FileSystem.get(namenodeURI, conf);
        Path path = new Path(uri);
        FSDataOutputStream out = fs.append(path);
        out.close();  // closing the appended stream updates the file length
        break;
    } catch (RecoveryInProgressException e) {
        // block recovery already under way, retry
    } catch (AlreadyBeingCreatedException e) {
        // previous lease not yet recovered, retry
    } catch (Exception e) {
        break;
    }
}
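One alternative I came across, though I have not verified that it is available
or reliable on 2.0.2-alpha, is to ask the NameNode to recover the lease
directly via DistributedFileSystem.recoverLease() instead of doing an append
and close. A rough sketch of what I mean:

*** Alternative: explicit lease recovery (sketch, unverified) ***
DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(namenodeURI, conf);
Path path = new Path(uri);
// recoverLease() returns true once the file is closed; poll until then
while (!dfs.recoverLease(path)) {
    Thread.sleep(1000);  // real code would handle InterruptedException
}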
Could you let me know whether this approach has any shortcomings, or whether
there are better alternatives?
Thanks,
Hemant Bhanawat