Hi JP,

I don't actually know the answer to your question, but we do a lot of things 
using files and directories on HDFS, and we use renames to move files out of 
directories that are periodically scanned by other processes. All I can say: 
it has never gone wrong. We are happily living with the assumption that the 
rename is atomic. Our directory scanning job runs every couple of seconds and 
has done so without any error for months.

Short answer: I don't know, but it appears to be that way (ignorance is a 
blessing).


Friso



On 25 aug 2010, at 02:21, Jean-Pierre OCALAN wrote:

Hi,

I would like to know if the rename operation (i.e. renaming a directory or a 
single file) can be considered an atomic operation in HDFS.

Basically, what I am trying to achieve is having one process that continuously 
adds new files to HDFS and another process that, every 15 minutes, starts 
a map/reduce flow on the files that were newly added to HDFS.

In other words, a process A continuously reads a local directory "A/in", into 
which new files are continuously moved, and puts each file in an "A/tmp" 
directory on HDFS. When A finishes putting a file in "A/tmp", it moves/renames 
that file into a "B/in" directory. At the same time, a process B will, every 15 
minutes, push all the files present in "B/in" to a map/reduce flow.
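
The flow described above can be sketched roughly as follows. This is only an 
illustration under stated assumptions: it uses the local filesystem as a 
stand-in for HDFS, and the names publish and scan are hypothetical. On a local 
POSIX filesystem, os.rename within one filesystem is atomic; whether the HDFS 
counterpart (FileSystem.rename in the Java API) gives the same guarantee is 
exactly the open question here.

```python
import os


def publish(src_path, tmp_dir, in_dir):
    """Process A (hypothetical): stage a file in tmp_dir, then rename it into in_dir."""
    name = os.path.basename(src_path)
    tmp_path = os.path.join(tmp_dir, name)
    final_path = os.path.join(in_dir, name)

    # Step 1: the slow part. The file is copied chunk by chunk into the
    # staging directory, which process B never looks at.
    with open(src_path, "rb") as src, open(tmp_path, "wb") as dst:
        for chunk in iter(lambda: src.read(1 << 20), b""):
            dst.write(chunk)

    # Step 2: a single rename makes the complete file visible in in_dir.
    # Assumption: tmp_dir and in_dir live on the same filesystem.
    os.rename(tmp_path, final_path)
    return final_path


def scan(in_dir):
    """Process B (hypothetical): list the files currently present in in_dir."""
    return sorted(os.listdir(in_dir))
```

If the rename is atomic, scan should only ever see files that publish has 
finished writing, never a partially copied one.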

Regards,

-- JP
