Doing a final rename of the target directory sounds like a good, simple idea. The job could rename the directory to something else if it failed. The other approach is simply to drop a completion stamp (create a ./COMPLETE file) when all is done.
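Both ideas can be sketched in a few lines against a plain local filesystem; Hadoop's DFS operations would be analogous, and all directory and function names here are illustrative, not an existing API:

```python
import os
import shutil

def run_job_atomically(job, target_dir):
    """Write output to a temporary directory, then rename it into place.

    The final os.rename is the "commit": downstream jobs only ever see
    target_dir once it is complete. On failure, the partial output is
    renamed aside instead.  (Names here are illustrative.)
    """
    tmp_dir = target_dir + ".tmp"
    if os.path.exists(tmp_dir):
        shutil.rmtree(tmp_dir)              # clean up a previous failed attempt
    os.makedirs(tmp_dir)
    try:
        job(tmp_dir)                        # job writes all output under tmp_dir
    except Exception:
        failed_dir = target_dir + ".failed"
        if os.path.exists(failed_dir):
            shutil.rmtree(failed_dir)
        os.rename(tmp_dir, failed_dir)      # keep the failed output aside
        raise
    os.rename(tmp_dir, target_dir)          # commit: expose the finished output

def stamp_complete(target_dir):
    """Alternative: drop a COMPLETE marker file once all output is written."""
    open(os.path.join(target_dir, "COMPLETE"), "w").close()

def is_complete(target_dir):
    """Downstream jobs check the stamp before consuming the directory."""
    return os.path.exists(os.path.join(target_dir, "COMPLETE"))
```

The rename approach has the nice property that consumers need no extra check, since an incomplete directory is simply never visible under its final name; the stamp approach requires every consumer to test for the marker.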
On Jun 13, 2006, at 5:51 PM, Paul Sutter wrote:
We are starting to string together our disparate Hadoop jobs into a running system, and a couple of issues are coming up. I'm looking for feedback or suggestions on how we can solve them.
(1) Scheduling Hadoop jobs
Could an Ant extension be developed to let a complex set of Hadoop jobs be controlled by a sort of build script that decides which jobs need to be run?
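Absent such an Ant extension, the same effect can be approximated with a small make-style driver: each job declares its dependencies and an output path, and a job runs only if its output does not already exist. This is only a sketch of the idea; the job structure and names are hypothetical:

```python
import os

def run_pipeline(jobs, name, done=None):
    """Run job `name` and its dependencies, make-style.

    jobs maps a job name to (dependency_names, output_path, run_fn).
    A job whose output path already exists is considered up to date
    and is skipped, so re-running the pipeline resumes where it left off.
    """
    if done is None:
        done = set()
    if name in done:
        return
    deps, output, run_fn = jobs[name]
    for dep in deps:
        run_pipeline(jobs, dep, done)       # recurse into dependencies first
    if not os.path.exists(output):          # skip jobs already completed
        run_fn(output)
    done.add(name)
```

Combined with the atomic-output convention above, a crashed pipeline can simply be re-run: completed jobs are skipped, and only the missing outputs are rebuilt. (A real Ant task would add cycle detection and parallelism, which this sketch omits.)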
(2) How do we make Hadoop jobs atomic?
One issue we have is that a failing job can leave directories in an inconsistent state, making a mess for the other jobs.
I'm thinking of an atomic operation we could submit to the nameserver. It might consist of multiple directory deletions and directory renames, and would either complete in its entirety or not at all. In this way, we'd get the equivalent of a begin/commit/rollback capability from one simple function.
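To make the proposal concrete, here is a client-side approximation of that begin/commit/rollback batch, sketched against a plain filesystem. Truly atomic semantics would need to live inside the nameserver; this sketch only rolls back on failure and can still be interrupted mid-batch. The operation format and names are invented for illustration:

```python
import os
import shutil

def apply_batch(operations):
    """Apply a list of ("rename", src, dst) and ("delete", path) operations.

    If any step fails, all completed steps are undone in reverse order.
    Deletions are staged as renames to a .trash path so they can be
    rolled back too; they only become real deletes once every operation
    in the batch has succeeded.
    """
    undo, trash = [], []
    try:
        for op in operations:
            if op[0] == "rename":
                _, src, dst = op
                os.rename(src, dst)
                undo.append((dst, src))
            elif op[0] == "delete":
                _, path = op
                staged = path + ".trash"
                os.rename(path, staged)     # defer the real delete
                undo.append((staged, path))
                trash.append(staged)
    except Exception:
        for src, dst in reversed(undo):     # rollback: undo newest first
            os.rename(src, dst)
        raise
    for staged in trash:                    # commit reached: really delete
        if os.path.isdir(staged):
            shutil.rmtree(staged)
        else:
            os.remove(staged)
```

Doing this in the nameserver would close the remaining window: there, the whole batch could be applied under one lock against the namespace, so no client could observe a half-applied state at all.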
I'm curious to hear others' thoughts on this topic.