At Yahoo, we had a framework similar to MapReduce called Dreadnaught. When we were converting applications off Dreadnaught to Hadoop MapReduce, we considered supporting M-R-R. (Dreadnaught imposes few restrictions on the application and could support M, M-R, M-R-R, etc.) The problem is that extending the retry semantics arbitrarily far back up the pipeline lets a single node failure cascade, relaunching more and more work. By putting a checkpoint after each reduce (the output is written to HDFS with a replica count > 1), M-R bounds the amount of rework a failure can require and keeps error recovery relatively simple. Hadoop is better off doing a good job of supporting MapReduce than a bad job on more complex pipelines.
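To make that concrete, here's a minimal sketch (not anything we actually shipped; the class names, paths, and word-count logic are all hypothetical, and it assumes the newer org.apache.hadoop.mapreduce API) of how an M-R-R pipeline gets expressed on Hadoop: two chained M-R jobs, with the first reduce's output checkpointed into a replicated HDFS directory before the second job starts.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class ChainedMRR {

      // Stage 1 map: tokenize lines into (word, 1) pairs.
      public static class WordMapper
          extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
            throws IOException, InterruptedException {
          for (String tok : value.toString().split("\\s+")) {
            if (!tok.isEmpty()) ctx.write(new Text(tok), ONE);
          }
        }
      }

      // Sum the counts for a key; reused as both the first and the second R
      // purely to keep the sketch short.
      public static class SumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> vals, Context ctx)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable v : vals) sum += v.get();
          ctx.write(key, new IntWritable(sum));
        }
      }

      // Stage 2 map: parse the checkpointed "word<TAB>count" lines back into pairs.
      public static class ParseMapper
          extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
            throws IOException, InterruptedException {
          String[] parts = value.toString().split("\t");
          if (parts.length == 2) {
            ctx.write(new Text(parts[0]), new IntWritable(Integer.parseInt(parts[1])));
          }
        }
      }

      private static Job makeJob(Configuration conf, String name,
          Class<? extends Mapper> mapper, Path in, Path out) throws IOException {
        Job job = Job.getInstance(conf, name);
        job.setJarByClass(ChainedMRR.class);
        job.setMapperClass(mapper);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, in);
        FileOutputFormat.setOutputPath(job, out);
        return job;
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path(args[0]);
        Path checkpoint = new Path(args[1]);  // replicated HDFS checkpoint
        Path output = new Path(args[2]);

        // M-R: output is safely checkpointed to HDFS once the job completes.
        if (!makeJob(conf, "m-r", WordMapper.class, input, checkpoint)
            .waitForCompletion(true)) System.exit(1);

        // -R: the second reduce, restartable from the checkpoint alone.
        System.exit(makeJob(conf, "second-r", ParseMapper.class, checkpoint, output)
            .waitForCompletion(true) ? 0 : 1);
      }
    }

If a task in the second job dies, the framework reruns just that task against the replicated checkpoint; nothing upstream of the checkpoint has to be recomputed. That's the bounded-rework property.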
For pipelines, I'd strongly suggest using Pig or Hive, which do the cross-job optimizations for you...
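To give a flavor of the Pig route, here's a made-up pipeline using Pig's embedded PigServer API (the file names, fields, and the pipeline itself are invented):

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    public class PigPipeline {
      public static void main(String[] args) throws Exception {
        PigServer pig = new PigServer(ExecType.MAPREDUCE);
        // Each statement just extends the logical plan; nothing runs yet.
        pig.registerQuery("raw = LOAD 'events' AS (user:chararray, score:int);");
        pig.registerQuery("grouped = GROUP raw BY user;");
        pig.registerQuery("totals = FOREACH grouped GENERATE group AS user, "
            + "SUM(raw.score) AS total;");
        pig.registerQuery("ranked = ORDER totals BY total DESC;");
        // The store triggers planning and execution: Pig compiles the whole
        // script into a chain of M-R jobs and manages the intermediate
        // checkpoints between them itself.
        pig.store("ranked", "event_totals");
      }
    }

Because Pig sees the whole script before launching anything, it can merge operators into as few jobs as possible instead of you hand-chaining M-R stages.

-- Owen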
