On Wed, Jun 9, 2010 at 9:55 PM, wd <[email protected]> wrote:

> I have lots of small files in Hive, and the MapReduce jobs are too slow. Is
> there a way to improve the speed?
>
> 2010/6/10 Edward Capriolo <[email protected]>
>
>
>>
>> On Wed, Jun 9, 2010 at 3:04 AM, wd <[email protected]> wrote:
>>
>>> I've tried Hive 0.5, and the option doesn't work there either.
>>> I also found this page via Google:
>>> http://markmail.org/message/k32nrcb2ncsq67ef?q=mapred.map.tasks+#query:mapred.map.tasks%20+page:1+mid:k32nrcb2ncsq67ef+state:results
>>>
>>> 2010/6/9 wd <[email protected]>
>>>
>>>> Hi,
>>>>
>>>> I'm using Hive svn rev946854. I tried to set mapred.map.tasks=1 in the Hive
>>>> CLI, but it doesn't seem to work: the total number of map tasks is still
>>>> over 300.
>>>>
>>>> Is this a svn version problem?
>>>>
>>>
>>>
>> You answered your own question; look at the link:
>>
>> "You cannot force *mapred.map.tasks* but can specify mapred.reduce.tasks."
>>
>> The number of map tasks is based on the number of input files and folders.
>> Even though Hive uses a combined input format, you can still end up with
>> many mappers.
>>
>> Edward
>>
>
With Hadoop 0.20 and the combine input format you should get fairly decent
performance even with many small files. My current employer is about to open
source FileCrusher, a standalone and MapReduce application that merges
Text and Sequence files into one big file. So if you hang tight for a couple
of days, I can point you at a utility that might help.
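
To make the point about small files concrete, here is a rough Python sketch
of why one split per file produces so many mappers and why packing files
into combined splits helps. It is only a simplified model, not Hadoop's
actual split logic (which, as far as I know, also takes node and rack
locality into account, so real counts can be higher). The function names,
the 1 MB file size, the 64 MB block size and the 256 MB split limit are all
assumptions chosen for illustration.

```python
import math

MB = 1024 * 1024


def mappers_one_split_per_file(file_sizes, block_size):
    """Simplified model: every file is split on its own, so each file
    costs at least one map task no matter how small it is."""
    return sum(max(1, math.ceil(size / block_size)) for size in file_sizes)


def mappers_with_combining(file_sizes, max_split_size):
    """Simplified model of a combining input format: file bytes are packed
    into splits of at most max_split_size bytes, ignoring data locality."""
    splits = 0
    current = 0
    for size in file_sizes:
        current += size
        # Close a split each time the running total reaches the limit.
        while current >= max_split_size:
            splits += 1
            current -= max_split_size
    if current > 0:
        splits += 1  # leftover bytes still need one more split
    return splits


# Hypothetical input: 300 files of 1 MB each.
small_files = [1 * MB] * 300
print(mappers_one_split_per_file(small_files, 64 * MB))  # 300
print(mappers_with_combining(small_files, 256 * MB))     # 2
```

Under this model the mapper count depends on the total input size and the
split limit rather than on the number of files, which is also why merging
the small files into a few big ones should help.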
