Hi Wd,

Try setting these in the Hive CLI:

set hive.merge.mapfiles=true;
set hive.merge.size.per.task=1000000;

(The second value is in bytes; 1000000 is only an example, so try some other large number.)
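
For intuition only, here is a toy model of what a target size like hive.merge.size.per.task controls. The function plan_merged_files is hypothetical and is not Hive's merge code. It assumes small files are packed greedily, in order, until a batch would exceed the target, and it ignores data locality. A larger target means fewer, bigger files, and therefore fewer mappers when the table is read later.

```python
def plan_merged_files(file_sizes, size_per_task):
    """Toy model (hypothetical, not Hive's code): greedily group file sizes,
    in order, into batches whose total stays within size_per_task bytes.
    A single file larger than size_per_task ends up in a batch of its own."""
    batches = []
    current, current_size = [], 0
    for size in file_sizes:
        # Close the current batch if adding this file would exceed the target.
        if current and current_size + size > size_per_task:
            batches.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    small_files = [100_000] * 300  # 300 files of 100 KB each (made-up sizes)
    print(len(plan_merged_files(small_files, 1_000_000)))    # 30 merged files
    print(len(plan_merged_files(small_files, 256_000_000)))  # 1 merged file
```

With 300 files of 100 KB, a 1000000-byte target gives 30 batches, while a 256000000-byte target gives a single one.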

Alex K


On Wed, Jun 9, 2010 at 6:55 PM, wd <[email protected]> wrote:

> I have lots of small files in Hive, and the MapReduce jobs are too slow. Is
> there a way to improve the speed?
>
> 2010/6/10 Edward Capriolo <[email protected]>
>
>>
>>
>> On Wed, Jun 9, 2010 at 3:04 AM, wd <[email protected]> wrote:
>>
>>> I've tried Hive 0.5, and the option doesn't work there either.
>>> I found this page via Google:
>>> http://markmail.org/message/k32nrcb2ncsq67ef?q=mapred.map.tasks+#query:mapred.map.tasks%20+page:1+mid:k32nrcb2ncsq67ef+state:results
>>>
>>> 2010/6/9 wd <[email protected]>
>>>
>>>> Hi,
>>>>
>>>> I'm using Hive SVN rev946854. I tried to set mapred.map.tasks=1 in the
>>>> Hive CLI, but it doesn't seem to work: the total number of map tasks is
>>>> still 300+.
>>>>
>>>> Is this a problem with the SVN version?
>>>>
>>>
>>>
>> You answered your own question; look at the link:
>>
>> "You cannot force mapred.map.tasks but can specify mapred.reduce.tasks."
>>
>> The number of map tasks is based on the number of input files and folders.
>> Even though Hive uses a combined input format, you can still end up with
>> many mappers.
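
To see why mapred.map.tasks=1 still gave 300+ maps, here is a simplified Python model of the split calculation in Hadoop's old-API FileInputFormat.getSplits (org.apache.hadoop.mapred). The function name count_map_tasks is hypothetical. The requested map count only sets a goal split size, and each file is split on its own, so every non-empty file yields at least one split. The 64 MB block size and the 1.1 "slop" factor are assumptions based on Hadoop 0.20-era defaults; empty files and combined input formats are left out of the model.

```python
SPLIT_SLOP = 1.1  # the last split of a file may be up to 10% larger than split_size

def count_map_tasks(file_sizes, requested_maps, block_size=64 * 1024 * 1024,
                    min_split_size=1):
    """Simplified model (hypothetical name) of old-API FileInputFormat splits.

    requested_maps plays the role of mapred.map.tasks: it is only a hint
    used to compute goal_size, not a hard limit on the number of maps.
    """
    total_size = sum(file_sizes)
    goal_size = total_size // max(requested_maps, 1)
    # split_size = max(minSize, min(goalSize, blockSize)), as in computeSplitSize
    split_size = max(min_split_size, min(goal_size, block_size))
    splits = 0
    for length in file_sizes:
        if length == 0:
            continue  # simplification: empty files are ignored in this model
        remaining = length
        # Cut full-size splits while the remainder is clearly larger than one split.
        while remaining / split_size > SPLIT_SLOP:
            splits += 1
            remaining -= split_size
        splits += 1  # the tail of the file (always non-zero here)
    return splits


if __name__ == "__main__":
    MB = 1024 * 1024
    print(count_map_tasks([1 * MB] * 300, 1))    # 300: one split per small file
    print(count_map_tasks([1024 * MB], 1))       # 16: one split per 64 MB block
    print(count_map_tasks([1024 * MB], 100))     # 100: the hint can raise the count
```

So 300 files of 1 MB still give 300 splits with requested_maps=1, while a single 1 GB file gives 16 splits with requested_maps=1 and 100 with requested_maps=100: the setting can raise the map count but cannot push it below one map per file.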
>>
>> Edward
>>
>
>
