Re: Teragen defaults to 2 maps; terasort defaults to 1 reducer

Gross, Danny Mon, 29 Jun 2009 16:39:46 -0700

Hi Arun,

Perfect.  Thanks for the help.


Best regards,

Danny

----- Original Message -----
From: Arun C Murthy <a...@yahoo-inc.com>
To: common-user@hadoop.apache.org <common-user@hadoop.apache.org>
Cc: core-u...@hadoop.apache.org <core-u...@hadoop.apache.org>
Sent: Mon Jun 29 14:59:54 2009
Subject: Re: Teragen defaults to 2 maps; terasort defaults to 1 reducer

These are due to the default #maps/#reduces in Map-Reduce.

Use:
$ bin/hadoop jar hadoop-*-dev-examples.jar teragen -Dmapred.map.tasks=8000 10000000000 /tera/in
$ bin/hadoop jar hadoop-*-dev-examples.jar terasort -Dmapred.reduce.tasks=5300 /tera/in /tera/out
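
For reference, a dry-run sketch of the two commands above. It only prints the commands and the per-map arithmetic; it does not invoke hadoop. The row count, task counts and paths are the ones from the commands above. EXAMPLES_JAR is a placeholder to replace with the examples jar from your own build. The sketch assumes teragen's fixed 100-byte rows. The note about where -D goes reflects that the option is parsed by the example program itself, so it needs to follow the program name rather than precede the jar.

```shell
#!/bin/sh
# Dry run: prints the teragen/terasort commands instead of executing them.
# EXAMPLES_JAR is a placeholder (assumption); adjust it for your installation.
ROWS=10000000000        # teragen writes 100-byte rows, so this is about 1 TB
MAPS=8000               # value passed as mapred.map.tasks
REDUCES=5300            # value passed as mapred.reduce.tasks
EXAMPLES_JAR='hadoop-*-dev-examples.jar'

# Rows and bytes written by each teragen map task.
ROWS_PER_MAP=$((ROWS / MAPS))
BYTES_PER_MAP=$((ROWS_PER_MAP * 100))
echo "rows per map:  $ROWS_PER_MAP"
echo "bytes per map: $BYTES_PER_MAP"

# -D goes after the program name (teragen/terasort), not before the jar.
echo "bin/hadoop jar $EXAMPLES_JAR teragen -Dmapred.map.tasks=$MAPS $ROWS /tera/in"
echo "bin/hadoop jar $EXAMPLES_JAR terasort -Dmapred.reduce.tasks=$REDUCES /tera/in /tera/out"
```

With these numbers each map writes 1,250,000 rows, or 125,000,000 bytes. Changing MAPS or REDUCES shows how the per-task share shifts before anything is submitted to the cluster.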

Arun

On Jun 29, 2009, at 2:03 PM, Gross, Danny wrote:

> Hello all,
>
>
>
> I'm trying to run the hadoop-0.19.1-examples.jar teragen and terasort
> programs on a cluster.  I have two problems with these programs:
>
>
>
> 1.    The data is generated in a way that is not balanced
> across my cluster.  This is because the data is generated with 2 maps.
>
>       *       With the command "hadoop jar hadoop-0.19.1-examples.jar
> teragen 1000000000 /terasort"  (or any size) per the example doc, I get
> 2 maps.  With replication set to 2, this tends to place data more
> heavily on 2 of my nodes, and the cluster believes it is balanced.
>
>
>
> 2.    The terasort program runs out of disk space on the reduce
> operation.  This is because the program runs with a single reduce task.
>
>
>       *       When running "hadoop jar hadoop-0.19.1-examples.jar
> terasort /terasort /out" per the example doc, I get the appropriate
> number of maps, but one reduce.  I've scoured the web and the new Hadoop
> book, and I'm just not able to change the number of reducers.  An
> example attempt was with the command "hadoop jar
> -Dmapred.reduce.tasks=16 hadoop-0.19.1-examples.jar terasort /terasort
> /out".
>
>
>
> Could anyone help shed some light on how to modify the execution of
> these programs to more appropriately balance the data, and spread the
> reduce load out across my cluster?
>
>
>
> Best regards,
>
>
>
> Danny Gross
>
>
>
