You should try implementing some of the suggestions from this blog post:

http://www.cloudera.com/blog/2009/02/the-small-files-problem/
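
To put rough numbers on it, here is a small sketch using the figures from Jander's mail below (1 GB of input, about 500 map tasks). The 64 MB split size is my assumption (a common HDFS block size; check what your cluster actually uses), and the class and method names are made up for illustration.

```java
public class SmallFilesEstimate {
    // Average amount of input each map task processes, in MB.
    public static double avgMbPerMap(double inputMb, int mapTasks) {
        return inputMb / mapTasks;
    }

    // Number of map tasks if the same input were packed into splits
    // of the given size (rounded up to cover any remainder).
    public static long mapsAfterPacking(double inputMb, double splitMb) {
        return (long) Math.ceil(inputMb / splitMb);
    }

    public static void main(String[] args) {
        double inputMb = 1024.0; // 1 GB of text, from the thread
        int mapTasks = 500;      // reported number of map tasks
        double splitMb = 64.0;   // ASSUMPTION: one split per 64 MB block

        System.out.printf("average input per map: %.2f MB%n",
                avgMbPerMap(inputMb, mapTasks));
        System.out.printf("maps after packing into %.0f MB splits: %d%n",
                splitMb, mapsAfterPacking(inputMb, splitMb));
    }
}
```

With about 2 MB per map, each 8 to 20 second task is probably spending most of its time on startup rather than on the data, which is the situation the blog post describes.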

In general, just google for tuning map/reduce programs and you will find some
good articles like this one:

http://www.docstoc.com/docs/3766688/Hadoop-Map-Reduce-Tuning-and-Debugging-Arun-C-Murthy-acmurthy
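
On the scaling question in the original mail quoted below (1020 seconds on 3 PCs, 680 seconds on 4 PCs, 280 seconds without Hadoop), here is a back-of-the-envelope sketch. It assumes run time scales inversely with the number of PCs, which is an optimistic assumption of mine and not something measured on your cluster; the class and method names are made up for illustration.

```java
public class ScalingEstimate {
    // Node-seconds = number of nodes * wall-clock seconds.
    // Under ideal 1/n scaling this quantity stays constant.
    public static double nodeSeconds(int nodes, double seconds) {
        return nodes * seconds;
    }

    // Nodes needed to reach targetSeconds, extrapolating from one measured
    // run under the ASSUMPTION of ideal 1/n scaling (rounded up).
    public static long nodesNeeded(int nodes, double seconds, double targetSeconds) {
        return (long) Math.ceil(nodeSeconds(nodes, seconds) / targetSeconds);
    }

    public static void main(String[] args) {
        double target = 280.0; // run time before Hadoop, from the thread

        System.out.println("from the 3-PC run: "
                + nodesNeeded(3, 1020.0, target) + " PCs");
        System.out.println("from the 4-PC run: "
                + nodesNeeded(4, 680.0, target) + " PCs");
    }
}
```

The two runs do not agree on a single figure, so treat any such number as a rough guide only. Reducing the number of map tasks is likely to matter more than adding machines.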






> From: Jander <[email protected]>
> Date: Tue, 5 Oct 2010 16:43:49 +0800 (CST)
> To: <[email protected]>
> Subject: Re:Re: Help!!The problem about Hadoop
> 
> Hi Jeff,
>
> Thank you very much for your reply.
>
> I know that Hadoop has overhead, but is it too large in my case?
>
> The 1 GB text input produces about 500 map tasks because the input is
> made up of small text files. Each map takes from 8 to 20 seconds. I use
> compression, e.g. conf.setCompressMapOutput(true).
>
> Thanks,
> Jander






> At 2010-10-05 16:28:55, "Jeff Zhang" <[email protected]> wrote:



>Hi Jander,
>
>Hadoop has overhead compared to a single-machine solution. How many tasks
>did you get when you ran your Hadoop job? And how long does each map and
>reduce task take?
>
>There are lots of tips for Hadoop performance tuning, such as compression
>and JVM reuse.


>
>
>2010/10/5 Jander <[email protected]>:
>>
>> Hi, all
>>
>> I built an application using Hadoop.
>> I took 1 GB of text data as input, with the following results:
>>    (1) a cluster of 3 PCs: the time consumed is 1020 seconds.
>>    (2) a cluster of 4 PCs: the time is about 680 seconds.
>> But the application took about 280 seconds before I used Hadoop, so at
>> the rate above I would need 8 PCs to match the previous speed. Now the
>> question: is this correct?
>>
>> Jander,
>> Thanks.

>>
>>
>>
>
>
>
>-- 
>Best Regards
>
>Jeff Zhang

