What kind of atomicity/visibility claims are made regarding the various
operations on a FileSystem?
I have multiple processes that write into local sequence files, then upload
them into a remote directory in HDFS. A map/reduce job then runs over
whatever is in that directory. The processes are not synchronized with the
job, so it is entirely possible that the job starts while a file is still
being uploaded. My concern is that the job may pick up a partially uploaded
file if "FileSystem.copyFromLocalFile" is not atomic (in the sense that the
file does not appear in the directory until all bytes are written).

Are any of the FileSystem APIs atomic in this sense? What about, at the
very least, rename (e.g. first write to a temp HDFS location, then use
rename to atomically flip the file into the live directory)?
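
To make the rename idea concrete, here is a rough sketch of the pattern I have in mind. It uses java.nio.file on a local filesystem purely as a stand-in, since that is the one place I know the move can be requested as atomic. The class name TempThenRename, the method publish, and the staging/live directory layout are all made up for illustration. The HDFS version would presumably call FileSystem.copyFromLocalFile into the temp location and then FileSystem.rename, and whether that rename gives the same guarantee is exactly what I am asking.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: a local-filesystem stand-in for the HDFS pattern.
class TempThenRename {

    // Copies src into stagingDir, then moves the finished copy into liveDir.
    // Assumption: stagingDir and liveDir live on the same filesystem;
    // otherwise ATOMIC_MOVE may throw AtomicMoveNotSupportedException.
    static Path publish(Path src, Path stagingDir, Path liveDir) throws IOException {
        Files.createDirectories(stagingDir);
        Files.createDirectories(liveDir);

        Path staged = stagingDir.resolve(src.getFileName());
        Path live = liveDir.resolve(src.getFileName());

        // Step 1: the slow byte-by-byte copy happens outside the live directory,
        // so a reader listing liveDir never sees a half-written file.
        Files.copy(src, staged, StandardCopyOption.REPLACE_EXISTING);

        // Step 2: a single rename makes the complete file visible in liveDir.
        return Files.move(staged, live, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

The point of the two steps is that the job only ever lists the live directory, so the only operation that needs to be atomic is the final rename, not the upload itself.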

Thanks,
Brian
