The reducer method is a pretty low-cost workaround (in terms of developer
time), so I wouldn't make it too high a priority. It seems like a
throughput optimization at most, and only for the class of mapper script
that actually reduces the input set in some way.
Josh
On Jan 11, 2009, at 10:59 PM, Joydeep Sen Sarma wrote:
We should be able to control this (specify an exact mapper count) once
HADOOP-4565 and HIVE-74 are resolved (these are being worked on
actively).
From: Zheng Shao [mailto:zsh...@gmail.com]
Sent: Sunday, January 11, 2009 9:16 PM
To: hive-user@hadoop.apache.org
Subject: Re: Number of Mappers
Currently the only way to do it is to use a reducer:

set mapred.reduce.tasks=1;

SELECT TRANSFORM(actor_id) USING '/my/script' AS (actor_id, percentile, count)
FROM (SELECT actor_id FROM activities CLUSTER BY actor_id) a;
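For reference, a TRANSFORM script like '/my/script' reads tab-separated
rows on stdin and writes tab-separated rows to stdout; with
mapred.reduce.tasks=1, the single reducer feeds it every row, so it can
buffer the whole input before emitting results. The percentile logic
below is only a hypothetical sketch (the actual '/my/script' isn't
shown), assuming the percentile of an actor is the percentage of actors
whose row count is at most that actor's count:

```python
#!/usr/bin/env python
import sys
from collections import Counter

def actor_percentiles(actor_ids):
    """Given every actor_id from the input (one per row), return
    (actor_id, percentile, count) tuples, where percentile is the
    percentage of distinct actors with a row count <= this actor's."""
    counts = Counter(actor_ids)
    all_counts = sorted(counts.values())
    n = len(all_counts)
    results = []
    for actor, c in sorted(counts.items()):
        # rank of this actor's count among all actors' counts
        rank = sum(1 for v in all_counts if v <= c)
        results.append((actor, 100.0 * rank / n, c))
    return results

if __name__ == "__main__":
    # Hive sends one tab-separated row per line; actor_id is column 0.
    ids = [line.rstrip("\n").split("\t")[0] for line in sys.stdin]
    for actor, pct, cnt in actor_percentiles(ids):
        sys.stdout.write("%s\t%s\t%s\n" % (actor, pct, cnt))
```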
On Sun, Jan 11, 2009 at 8:45 PM, Josh Ferguson <j...@besquared.net>
wrote:
If I'm running a query like this:
hive> SELECT TRANSFORM(actor_id) USING '/my/script' AS (actor_id, percentile, count)
      FROM activities;
It creates a map task for each file. I need every row in the table to
be run through a single instance of the script, since certain parts
require global list information. Do I need to rework this query to use
a reducer, or can I change some configuration variable to load all of
the data from this table and run it through /my/script at once?
Josh F.
--
Yours,
Zheng