HDFS moves and MapReduce jobs

Edward Capriolo Fri, 09 Jul 2010 14:43:39 -0700

This is a question I should go and test out myself, but I was wondering
if anyone has a quick answer.


We have map/reduce jobs that write lots of small files to a folder.
We also have a Hive external table pointed at this folder.
We have a tool, FileCrusher, which bunches up multiple small files
(TEXT and SEQUENCE) into one large file. (We are going to open source
it to help people who have problems with lots of small files.)

It is launched something like this: FileCrusher /src/folder
This process builds one large file in a temp directory and, once done,
moves the old files to a junk folder and moves the new file into
/src/folder.
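
To make the sequence concrete, here is a rough sketch of the three steps
on a local filesystem. It is only an illustration, not the FileCrusher
code: the function name crush_swap, the merged file name, and the use of
local files in place of HDFS paths are all assumptions, and it simply
concatenates bytes rather than handling SEQUENCE files properly.

```python
import os
import shutil
import tempfile


def crush_swap(src_dir, junk_dir, merged_name="crushed-0000"):
    """Merge every file in src_dir into one file, then swap it in.

    Hypothetical local-filesystem stand-in for the process described above.
    """
    small_files = sorted(
        name for name in os.listdir(src_dir)
        if os.path.isfile(os.path.join(src_dir, name))
    )

    # Step 1: build the large file in a temp directory outside src_dir.
    tmp_dir = tempfile.mkdtemp(prefix="crush-")
    tmp_path = os.path.join(tmp_dir, merged_name)
    with open(tmp_path, "wb") as out:
        for name in small_files:
            with open(os.path.join(src_dir, name), "rb") as part:
                shutil.copyfileobj(part, out)

    # Step 2: move the old small files to the junk folder.
    os.makedirs(junk_dir, exist_ok=True)
    for name in small_files:
        shutil.move(os.path.join(src_dir, name), os.path.join(junk_dir, name))

    # Step 3: move the new large file into src_dir.
    final_path = os.path.join(src_dir, merged_name)
    shutil.move(tmp_path, final_path)
    os.rmdir(tmp_dir)
    return final_path
```

Note that in this ordering the folder is briefly empty between step 2 and
step 3, so the swap as a whole is not a single atomic operation.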

What I am looking to figure out is this: if a map/reduce job is started
before the files are moved, so that the splits are calculated and the
job is running, what will happen if I then move the files in
/src/folder and replace them with a new file?

I am hoping that, since the splits are associated with blocks, the
job will produce correct results no matter when the files are
moved. In other words, after the splits are calculated the job should
be "atomic".
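
One way to frame the risk is the toy model below. It assumes a split is
recorded as a path plus a byte range and that each task opens its file by
path only when it starts; both are assumptions for illustration, not
something confirmed in this thread, and the names Split, compute_splits,
read_split and demo are made up. It runs on the local filesystem, so it
shows what would go wrong under that assumption rather than how HDFS
actually behaves.

```python
import os
import shutil
import tempfile
from collections import namedtuple

# Toy stand-in for an input split: a path plus a byte range.
Split = namedtuple("Split", ["path", "start", "length"])


def compute_splits(src_dir, split_size):
    """Plan (path, start, length) splits for every file in src_dir."""
    splits = []
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        size = os.path.getsize(path)
        for start in range(0, size, split_size):
            splits.append(Split(path, start, min(split_size, size - start)))
    return splits


def read_split(split):
    """Open the file by path at task time and read only the split's bytes."""
    with open(split.path, "rb") as handle:
        handle.seek(split.start)
        return handle.read(split.length)


def demo():
    base = tempfile.mkdtemp(prefix="split-demo-")
    src = os.path.join(base, "src")
    junk = os.path.join(base, "junk")
    os.makedirs(src)
    os.makedirs(junk)
    with open(os.path.join(src, "part-0"), "wb") as f:
        f.write(b"0123456789")

    splits = compute_splits(src, 4)   # planned before the move
    first = read_split(splits[0])     # this task runs before the move

    # The crusher moves the file while later tasks are still pending.
    shutil.move(os.path.join(src, "part-0"), os.path.join(junk, "part-0"))
    try:
        read_split(splits[1])
        outcome = "read ok"
    except FileNotFoundError:
        outcome = "file not found"

    shutil.rmtree(base)
    return first, outcome
```

In this model the task that ran before the move reads its bytes fine, while
a task that starts after the move cannot find its file. If splits really
did reference blocks directly, the move would not matter, which is exactly
the point I would like confirmed.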

Regards,
Edward
