Use CombineHiveInputFormat.

Check your hive.input.format setting; it should be org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.
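CombineHiveInputFormat packs many small files into fewer splits, so that one mapper reads several files instead of one mapper per file. The Python function below is a minimal sketch of that idea under a simplifying assumption: it packs files greedily by size only, up to a configurable maximum split size. The real class also groups files by node and rack locality. The name `combine_splits` and its parameters are hypothetical and are not part of Hive's API.

```python
from typing import List


def combine_splits(file_sizes: List[int], max_split_size: int) -> List[List[int]]:
    """Greedily pack files, in input order, into combined splits.

    Each split holds at most max_split_size bytes, except that a single file
    larger than the limit gets a split of its own. (Simplification: the real
    CombineFileInputFormat can chop large files by block and also considers
    data locality; this model only looks at sizes.)
    """
    splits: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in file_sizes:
        # Close the current split if adding this file would exceed the limit.
        if current and current_size + size > max_split_size:
            splits.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        splits.append(current)
    return splits
```

For example, `combine_splits([1_000_000] * 300, 256_000_000)` yields 2 splits (256 files, then 44) instead of 300 one-file splits. In this model, that is the mechanism that reduces the mapper count.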


________________________________________
From: Alex Kozlov [[email protected]]
Sent: Wednesday, June 09, 2010 9:15 PM
To: [email protected]
Subject: Re: set mapred.map.tasks=1 not work

Hi Wd,

Try:

hive.merge.mapfiles=true
hive.merge.size.per.task=1000000 (or some other large number)
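As a rough rule of thumb, the merge step aims to write files of about hive.merge.size.per.task bytes. The number of files left after merging is therefore roughly the total output size divided by that target. The sketch below is a back-of-the-envelope model under that assumption only; actual Hive behavior may differ between versions, and `estimate_merged_files` is a hypothetical helper.

```python
def estimate_merged_files(total_output_bytes: int, size_per_task: int) -> int:
    """Rough estimate of how many files remain after Hive's small-file merge.

    Assumes (hypothetical model) that each merge task writes about
    size_per_task bytes, so the file count is the ceiling of total / target.
    """
    if size_per_task <= 0:
        raise ValueError("size_per_task must be positive")
    if total_output_bytes <= 0:
        return 0
    # Integer ceiling division, avoiding float rounding on large byte counts.
    return (total_output_bytes + size_per_task - 1) // size_per_task
```

Under this model, 300 MB of output with a target of 1,000,000 bytes still leaves about 300 files. A target of 256,000,000 bytes leaves about 2. This is why the target size needs to be genuinely large relative to the files.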


Alex K

On Wed, Jun 9, 2010 at 6:55 PM, wd <[email protected]> wrote:
I have lots of small files in Hive, and the MapReduce jobs are too slow. Is there a way
to improve the speed?

2010/6/10 Edward Capriolo <[email protected]>


On Wed, Jun 9, 2010 at 3:04 AM, wd <[email protected]> wrote:
I've tried Hive 0.5; the option doesn't work there either.
I also found this page via Google:
http://markmail.org/message/k32nrcb2ncsq67ef?q=mapred.map.tasks+#query:mapred.map.tasks%20+page:1+mid:k32nrcb2ncsq67ef+state:results

2010/6/9 wd <[email protected]>

hi,

I'm using Hive svn rev946854 and tried to set mapred.map.tasks=1 at the Hive CLI,
but it seems not to work: the total number of map tasks is still 300+.

Is this a problem with the svn version?


You answered your own question; look at the link:

"You cannot force mapred.map.tasks but can specify mapred.reduce.tasks. "

The number of map tasks is based on the number of input files and folders. Even though Hive
uses a combined input format (CombineHiveInputFormat), you can still end up with many mappers.
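The following sketch shows why mapred.map.tasks=1 has no visible effect. It is a simplified Python model of split computation in Hadoop's old-API FileInputFormat, which uses the rule splitSize = max(minSize, min(goalSize, blockSize)) with goalSize = totalSize / mapred.map.tasks. In that model, mapred.map.tasks is only a hint that shapes the goal size, and every file still yields at least one split. The function name `count_splits` and its defaults are hypothetical. Hive's own input-format wrappers are not modeled.

```python
from typing import List

# The old FileInputFormat lets the last chunk of a file be up to 10% larger
# than splitSize instead of creating a tiny extra split.
SPLIT_SLOP = 1.1


def count_splits(file_sizes: List[int], block_size: int,
                 map_tasks_hint: int = 1, min_split_size: int = 1) -> int:
    """Approximate the number of map tasks for plain (non-combined) input.

    map_tasks_hint plays the role of mapred.map.tasks: it only sets the
    goal split size, so it cannot merge separate files into one split.
    """
    total = sum(file_sizes)
    goal_size = total // max(map_tasks_hint, 1)
    split_size = max(min_split_size, min(goal_size, block_size))
    splits = 0
    for length in file_sizes:
        if length == 0:
            splits += 1  # empty files still produce an (empty) split
            continue
        remaining = length
        # Cut full-size splits while the remainder is clearly bigger than one split.
        while remaining / split_size > SPLIT_SLOP:
            splits += 1
            remaining -= split_size
        if remaining > 0:
            splits += 1  # the final, possibly smaller, split of this file
    return splits
```

With 300 files of 1,000,000 bytes and a 64 MB block size, `count_splits(..., map_tasks_hint=1)` still returns 300. A large hint can only increase the count (1000 gives 1200 here). Reducing the mapper count therefore requires combining files, not lowering mapred.map.tasks.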

Edward

