Hi,

I have an HDFS folder and an M/R job that periodically updates it by replacing
the data with newly generated data.

I have a different M/R job that processes the data in the folder, either
periodically or ad hoc.

The second job naturally fails sometimes, when the data is replaced by newly
generated data after the job plan, including the input paths, has already been
submitted.

Is there an elegant solution?

My current thought is to query the JobTracker for running jobs and go over the
input paths in each job's XML, so that the swap can block until the input path
no longer appears in any currently executing job's input paths.
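To make the idea concrete, here is a minimal sketch of the coordination logic only. The JobTracker query itself is elided behind a hypothetical `RunningJobInputs` interface (in a real cluster it would be backed by something like `JobClient.jobsToComplete()` plus reading `mapred.input.dir` from each job's `job.xml`); the names `SwapGuard`, `safeToSwap`, and `awaitSafeToSwap` are illustrative, not an existing API:

```java
import java.util.Set;
import java.util.concurrent.TimeUnit;

public class SwapGuard {
    /**
     * Hypothetical abstraction over the JobTracker: returns the union of
     * input paths used by all currently running jobs.
     */
    public interface RunningJobInputs {
        Set<String> inputPathsOfRunningJobs();
    }

    private final RunningJobInputs cluster;

    public SwapGuard(RunningJobInputs cluster) {
        this.cluster = cluster;
    }

    /** True when no currently running job reads the given path. */
    public boolean safeToSwap(String path) {
        return !cluster.inputPathsOfRunningJobs().contains(path);
    }

    /**
     * Blocks (by polling) until the path is free of running jobs or the
     * timeout expires; returns whether the swap may proceed.
     */
    public boolean awaitSafeToSwap(String path, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (safeToSwap(path)) {
                return true;
            }
            TimeUnit.MILLISECONDS.sleep(pollMs);
        }
        return safeToSwap(path);
    }
}
```

The updater job would call `awaitSafeToSwap("/path/to/folder", ...)` before replacing the data; note this is only a sketch and still leaves a race window between the check and the swap.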

