You can also try decreasing the replication factor for the
intermediate files between jobs. This will make writing those files
faster.
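A minimal sketch of that suggestion, assuming the old `mapred` API and an illustrative class name: the replication factor for a job's output can be lowered via the standard `dfs.replication` setting on its `JobConf`, so HDFS writes fewer copies of the intermediate files.

```java
import org.apache.hadoop.mapred.JobConf;

public class IntermediateJobConf {
    public static JobConf buildConf() {
        JobConf conf = new JobConf(IntermediateJobConf.class);
        // Intermediate output can be regenerated by re-running the job,
        // so a replication factor lower than the cluster default (3)
        // is often an acceptable trade-off for faster writes.
        conf.setInt("dfs.replication", 2);
        return conf;
    }
}
```

Only apply this to the intermediate jobs; the final job's output usually deserves the full default replication.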
On Apr 8, 2009, at 3:14 PM, Lukáš Vlček wrote:

Hi,
I am by far not a Hadoop expert, but I think you cannot start a Map
task until the previous Reduce has finished. This means you probably
have to store the Map output to disk first, because a] it may not fit
into memory and b] you would risk data loss if the system crashed.
As for job chaining, you can check the JobControl class (
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/jobcontrol/JobControl.html)
You can also look at https://issues.apache.org/jira/browse/HADOOP-3702
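To make the JobControl suggestion concrete, here is a sketch of wiring two dependent jobs with the old `mapred` API. The class name and the job configuration details are assumptions; each `JobConf` would still need its mapper, reducer, and input/output paths set.

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;

public class ChainJobsSketch {
    public static void main(String[] args) throws Exception {
        JobConf conf1 = new JobConf(ChainJobsSketch.class);
        JobConf conf2 = new JobConf(ChainJobsSketch.class);
        // ...configure mapper/reducer and input/output paths for each conf;
        // job2's input path would point at job1's output path...

        Job job1 = new Job(conf1);
        Job job2 = new Job(conf2);
        job2.addDependingJob(job1); // job2 starts only after job1 succeeds

        JobControl control = new JobControl("chain");
        control.addJob(job1);
        control.addJob(job2);

        // JobControl is a Runnable; run it in its own thread and poll
        // until all jobs have finished.
        Thread runner = new Thread(control);
        runner.start();
        while (!control.allFinished()) {
            Thread.sleep(1000);
        }
        control.stop();
    }
}
```

Note this only sequences the jobs; each job still writes its output to HDFS between stages, which is where the lower replication factor mentioned above helps.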
Regards,
Lukas
On Wed, Apr 8, 2009 at 11:30 PM, asif md <[email protected]> wrote:
hi everyone,
i have to chain multiple MapReduce jobs (actually 2 to 4 jobs), each
of which depends on the output of the preceding job. In the reducer of
each job I'm doing very little (just grouping by key from the maps). I
want to give the output of one MapReduce job to the next job without
having to go to disk. Does anyone have any ideas on how to do this?
Thanks.
--
http://blog.lukas-vlcek.com/