Hi, I have an HDFS folder and an M/R job that periodically updates it by replacing the data with newly generated data.
I have a different M/R job that periodically, or ad hoc, processes the data in that folder. The second job naturally fails sometimes: the data gets replaced by newly generated data after the job plan, including the input paths, has already been submitted. Is there an elegant solution? My current thought is to query the JobTracker for running jobs and go over the input paths in each job's XML, so that the swap blocks until the folder is no longer an input path of any currently executing job.
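Something along these lines is roughly what I have in mind (an untested sketch against the old mapred API; it assumes the submitted job.xml in the JobTracker's system directory is readable, that inputs sit under mapred.input.dir, and the 30s poll interval is arbitrary):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobStatus;
import org.apache.hadoop.mapred.RunningJob;

public class InputPathGuard {

    // True if any incomplete (running/prep) job lists 'target' among its input dirs.
    static boolean pathInUse(JobClient jc, FileSystem fs, Path target) throws IOException {
        String wanted = target.toUri().getPath();
        for (JobStatus status : jc.jobsToComplete()) {
            RunningJob job = jc.getJob(status.getJobID());
            if (job == null) continue;                     // finished between the two calls
            Configuration jobXml = new Configuration(false);
            jobXml.addResource(fs.open(new Path(job.getJobFile()))); // the submitted job.xml
            // Old-API FileInputFormat keeps its inputs comma-separated under mapred.input.dir;
            // splitting on ',' ignores escaped commas in path names, good enough here.
            for (String dir : jobXml.get("mapred.input.dir", "").split(",")) {
                if (!dir.isEmpty() && new Path(dir).toUri().getPath().equals(wanted)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf();
        JobClient jc = new JobClient(conf);
        FileSystem fs = FileSystem.get(conf);
        Path folder = new Path(args[0]);
        while (pathInUse(jc, fs, folder)) {
            Thread.sleep(30 * 1000L);       // poll until no running job reads the folder
        }
        // ...only now swap the folder contents with the newly generated data...
    }
}

The obvious hole is a job that gets submitted right after the check returns false, so this would probably need to be wrapped in something stronger to be really safe.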