How about introducing a distributed coordination and locking mechanism?
ZooKeeper would be a good candidate for that kind of thing.
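To make the idea concrete, here is a minimal in-process sketch of the coordination protocol: the refresh job takes an exclusive (write) lock before swapping the folder, and processing jobs take a shared (read) lock before submitting, so a swap never happens while a reader holds the data. The class name and methods below are hypothetical illustration; in a real deployment the same roles would map onto a ZooKeeper recipe such as Curator's InterProcessReadWriteLock rather than in-process threading primitives.

```python
import threading

class FolderCoordinator:
    """In-process stand-in for a distributed read/write lock on an HDFS folder.

    Processing jobs call acquire_read()/release_read(); the job that
    regenerates the folder calls acquire_write()/release_write() around
    the swap. With ZooKeeper, each acquire/release would instead create
    and delete ephemeral znodes under a lock path.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0          # number of processing jobs using the folder
        self._writing = False      # True while the folder is being swapped

    def acquire_read(self):
        with self._cond:
            while self._writing:               # wait out an in-progress swap
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()        # wake a waiting writer

    def acquire_write(self):
        with self._cond:
            while self._writing or self._readers:  # wait for all readers
                self._cond.wait()
            self._writing = True

    def release_write(self):
        with self._cond:
            self._writing = False
            self._cond.notify_all()            # wake waiting readers
```

The key design point is that the swap blocks until every reader has released, and new readers block for the duration of the swap, which is exactly the guarantee the thread below is asking for.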



On Mon, Aug 13, 2012 at 12:52 PM, David Ginzburg <ginz...@hotmail.com> wrote:

> Hi,
>
> I have an HDFS folder and M/R job that periodically updates it by
> replacing the data with newly generated data.
>
> I have a different M/R job that periodically or ad-hoc process the data in
> the folder.
>
> The second job naturally fails sometimes, when the data is replaced by
> newly generated data after the job plan, including the input paths, has
> already been submitted.
>
> Is there an elegant solution?
>
> My current thought is to query the JobTracker for running jobs and go over
> the input files listed in each job's XML, so that the swap blocks until the
> folder is no longer an input path of any currently executing job.
>
>