use CombineHiveInputFormat
check your hive.input.format

________________________________________
From: Alex Kozlov [[email protected]]
Sent: Wednesday, June 09, 2010 9:15 PM
To: [email protected]
Subject: Re: set mapred.map.tasks=1 not work

Hi Wd,

Try:

hive.merge.mapfiles=true
hive.merge.size.per.task=1000000 (or some other large number)

Alex K

On Wed, Jun 9, 2010 at 6:55 PM, wd <[email protected]> wrote:

I have lots of small files in Hive, and the MapReduce jobs are too slow. Is there a way to improve the speed?

2010/6/10 Edward Capriolo <[email protected]>

On Wed, Jun 9, 2010 at 3:04 AM, wd <[email protected]> wrote:

I've tried Hive 0.5, and the option doesn't work there either. I also found this page [http://markmail.org/message/k32nrcb2ncsq67ef?q=mapred.map.tasks+#query:mapred.map.tasks%20+page:1+mid:k32nrcb2ncsq67ef+state:results] via Google.

2010/6/9 wd <[email protected]>

Hi,

I'm using Hive svn rev946854. I tried to set mapred.map.tasks=1 in the Hive CLI, but it seems it doesn't work: the total number of map tasks is still over 300. Is this a problem with the svn version?

You answered your own question; look in the link:
"You cannot force mapred.map.tasks but can specify mapred.reduce.tasks. "

The number of map tasks is based on the number of input files and folders. Even though Hive uses a CombinedInput format, you can still get a number of mappers.

Edward
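
For reference, here is a minimal sketch of how the settings mentioned in this thread might be entered in a Hive CLI session. The property names hive.input.format, hive.merge.mapfiles and hive.merge.size.per.task come from the thread. The fully qualified class name and the mapred.max.split.size property with its 256 MB value are assumptions added for illustration, so check them against your Hive and Hadoop versions. The `--` comment lines assume a Hive version that accepts comments; drop them otherwise.

```sql
-- Assumption: full class name of the CombineHiveInputFormat suggested above.
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Assumption: cap on the size of one combined split, in bytes (illustrative value).
set mapred.max.split.size=256000000;

-- From Alex's reply: merge small files written by map-only jobs.
set hive.merge.mapfiles=true;
set hive.merge.size.per.task=1000000;

-- Print the current value to confirm which input format is in effect.
set hive.input.format;
```

Note that the merge settings act on the files a job writes, so they would mainly help later queries that read that output, while the input format setting affects how the existing small files are grouped into map tasks.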
