Sounds pretty cool.

On Jun 14, 2006, at 9:23 AM, Runping Qi wrote:


I have been thinking about the Hadoop job scheduling issue too.

In my applications, some jobs depend on the outputs of other jobs.
Therefore, the job dependencies form a DAG. A job is ready to run if and
only if it has no dependencies, or all the jobs it depends on have
finished successfully. To help schedule and monitor a group of jobs like
that, I am thinking of implementing a utility class that:
        - accepts jobs with dependency specifications
        - monitors job status
        - submits jobs when they are ready

With such a utility class, the application can construct its jobs,
specify their dependencies, and then hand the jobs to the utility class.
The utility class takes care of the details of job submission.
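
A rough sketch of what I have in mind (the names DagJobRunner, addJob,
and run are made up for illustration; only JobClient and JobConf are
real Hadoop API, and a real version would submit independent jobs in
parallel rather than polling serially as this one does):

import java.util.*;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class DagJobRunner {
    private final Map<String, JobConf> jobs = new HashMap<String, JobConf>();
    private final Map<String, List<String>> deps = new HashMap<String, List<String>>();
    private final Set<String> done = new HashSet<String>();

    // Accept a job together with the names of the jobs it depends on.
    public void addJob(String name, JobConf conf, String... dependsOn) {
        jobs.put(name, conf);
        deps.put(name, Arrays.asList(dependsOn));
    }

    // Submit jobs as they become ready: a job runs only once every job
    // it depends on has completed successfully.
    public void run() throws Exception {
        while (done.size() < jobs.size()) {
            boolean progress = false;
            for (Map.Entry<String, JobConf> e : jobs.entrySet()) {
                String name = e.getKey();
                if (!done.contains(name) && done.containsAll(deps.get(name))) {
                    JobClient.runJob(e.getValue()); // blocks; throws if the job fails
                    done.add(name);
                    progress = true;
                }
            }
            if (!progress) {
                throw new IllegalStateException("dependency cycle or unknown job name");
            }
        }
    }
}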

Runping


-----Original Message-----
From: Paul Sutter [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 13, 2006 5:51 PM
To: [email protected]
Subject: Using Hadoop in a production environment

We are starting to string together our disparate Hadoop jobs into a
running system, and a couple of issues have come up.

I'm looking for feedback or suggestions on how we can solve them.

(1) Scheduling Hadoop jobs

Could an Ant extension be developed to let a complex set of Hadoop jobs
be controlled by a sort of build script that decides which jobs need to
be run?
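
Even without an Ant extension, the make-style "is this target up to
date?" check can be expressed directly against Hadoop's FileSystem API.
A sketch only (runIfStale and the single-input-directory assumption are
mine, not anything Hadoop provides):

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MakeStyleRunner {
    // Run the job only if its output is missing or older than its input,
    // the same freshness rule a build tool applies to file targets.
    public static void runIfStale(JobConf conf, Path input, Path output) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        if (fs.exists(output)
                && fs.getFileStatus(output).getModificationTime()
                   >= fs.getFileStatus(input).getModificationTime()) {
            return; // output is up to date, nothing to do
        }
        JobClient.runJob(conf);
    }
}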

(2) How do we make Hadoop jobs atomic?

One issue we have is that a failing job can leave directories in an
inconsistent state, making a mess for the other jobs.

I'm thinking of an atomic operation we could submit to the namenode. It
might consist of multiple directory deletions and directory renames, and
would either complete in its entirety or not at all. In this way, we'd
get the equivalent of a begin/commit/rollback capability for one simple
function.
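
In the meantime, the closest approximation I know of is the
write-to-temp-then-rename pattern sketched below (the .tmp naming is an
arbitrary convention of mine). Note that the delete-plus-rename pair in
the commit step is still two separate namespace operations, not one,
which is exactly the gap an atomic multi-operation call would close:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class AtomicOutput {
    public static void runAtomically(JobConf conf, Path finalOutput) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        Path tmp = new Path(finalOutput.getParent(), finalOutput.getName() + ".tmp");
        FileOutputFormat.setOutputPath(conf, tmp); // job writes to the temp dir
        try {
            JobClient.runJob(conf);       // blocks; throws if the job fails
            fs.delete(finalOutput, true); // clear any stale output first
            fs.rename(tmp, finalOutput);  // the single rename is the "commit"
        } catch (Exception e) {
            fs.delete(tmp, true);         // "rollback": discard partial output
            throw e;
        }
    }
}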

I'm curious to hear others' thoughts on this topic.

