How about introducing a distributed coordination and locking mechanism? ZooKeeper would be a good candidate for that kind of thing.
On Mon, Aug 13, 2012 at 12:52 PM, David Ginzburg <ginz...@hotmail.com> wrote:
> Hi,
>
> I have an HDFS folder and an M/R job that periodically updates it by
> replacing the data with newly generated data.
>
> A different M/R job processes the data in that folder, either
> periodically or ad hoc.
>
> The second job naturally fails sometimes, when the data is replaced by
> newly generated data after the job plan, including the input paths, has
> already been submitted.
>
> Is there an elegant solution?
>
> My current thought is to query the JobTracker for running jobs and go
> over all the input files in each job's XML, so that the swap blocks
> until the input path is no longer an input path of any currently
> executing job.