Hi JP,

I don't actually know the answer to your question, but we do a lot of things using files and directories on HDFS, and we use renames to move files out of directories that are periodically scanned by other processes. All I can say is that it has never gone wrong. We are happily living with the assumption that the rename is atomic. Our directory-scanning job runs every couple of seconds and has done so without any error for months.

Short answer: I don't know, but it appears to be that way (ignorance is a blessing).

Friso

On 25 aug 2010, at 02:21, Jean-Pierre OCALAN wrote:

Hi,

I would like to know if the rename operation (i.e. renaming a directory or a single file) can be considered an atomic operation in HDFS.

Basically, what I am trying to achieve is having one process that continuously adds new files to HDFS and another process that, every 15 minutes, starts a map/reduce flow on the files that were newly added to HDFS.

In other words, process A continuously reads a local directory "A/in", into which new files keep being moved, and puts each file into an "A/tmp" directory on HDFS. When A finishes putting a file into "A/tmp", it moves/renames that file into a "B/in" directory. At the same time, process B pushes all the files present in "B/in" into a map/reduce flow every 15 minutes.

Regards,
-- JP
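
For readers who want to see the staging pattern from the question spelled out, here is a minimal sketch of it. It is an analogy on a local filesystem, not HDFS code: it assumes both directories sit on the same POSIX filesystem, where `os.replace` moves a file in a single step. The function names `publish` and `scan` and the file name `part-0001` are made up for illustration. On HDFS the corresponding call would be Hadoop's `FileSystem.rename(src, dst)`; whether that call gives the same guarantee is exactly the open question in this thread, so the sketch does not settle it.

```python
import os
import tempfile
from typing import List


def publish(data: bytes, name: str, tmp_dir: str, in_dir: str) -> str:
    """Process A: write the file fully under tmp_dir, then rename it into in_dir.

    Assumption: tmp_dir and in_dir are on the same local filesystem, so the
    rename is a single step and a scanner never sees a half-written file.
    """
    staged = os.path.join(tmp_dir, name)
    with open(staged, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes are on disk before publishing
    final = os.path.join(in_dir, name)
    os.replace(staged, final)  # the file appears in in_dir only when complete
    return final


def scan(in_dir: str) -> List[str]:
    """Process B: list whatever has been published so far."""
    return sorted(os.listdir(in_dir))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        tmp_dir = os.path.join(root, "A_tmp")  # stands in for "A/tmp"
        in_dir = os.path.join(root, "B_in")    # stands in for "B/in"
        os.makedirs(tmp_dir)
        os.makedirs(in_dir)
        publish(b"some records\n", "part-0001", tmp_dir, in_dir)
        print(scan(in_dir))
```

The point of the design is that the scanner only ever looks at the second directory, so it depends on the rename being all-or-nothing rather than on the write being fast.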