Re: How to speed up of Map/Reduce job?

Steve Loughran Tue, 01 Feb 2011 07:54:03 -0800

On 01/02/11 08:19, Igor Bubkin wrote:

Hello everybody


I have a problem. I installed Hadoop on a 2-node cluster and ran the Wordcount
example. It takes about 20 sec to process a 1.5 MB text file. We want to
use Map/Reduce in real time (interactively, on user requests). A user can't
wait 20 sec for a request. This is too long. Is it possible to reduce the
running time of a Map/Reduce job? Or maybe I misunderstand something?

1. I'd expect a minimum 30s query time due to the way work gets queued and dispatched, JVM startup costs etc. There is no way to eliminate this in Hadoop's current architecture.

2. 1.5 MB is a very small file size; I'm currently recommending a block size of 512 MB in new clusters for various reasons. This size of data is just too small to bother with distribution. Load it up into memory; analyse it locally. Things like Apache CouchDB also support MapReduce.
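To illustrate the "load it up into memory; analyse it locally" suggestion, here is a minimal sketch of a single-process word count in Python. It is not the Hadoop WordCount example itself: the function names are made up for illustration, and the tokenization (lowercased runs of word characters) is an assumption that may differ from what the Hadoop example does.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercased word to its frequency."""
    # \w+ matches runs of letters, digits and underscores (assumed tokenization).
    return Counter(re.findall(r"\w+", text.lower()))


def count_words_in_file(path, encoding="utf-8"):
    """Read the whole file into memory at once and count its words."""
    with open(path, encoding=encoding) as f:
        return count_words(f.read())
```

Calling count_words_in_file("input.txt").most_common(10), where input.txt is a placeholder file name, would list the ten most frequent words. For a file of a megabyte or two, the whole thing runs in one process with no job queueing or task dispatch involved.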

Hadoop is not designed for clusters of less than about 10 machines (not enough redundancy of storage), or for small datasets. If your problems aren't big enough, use different tools, because Hadoop contains design decisions and overheads that only make sense once your data is measured in GB and your filesystem in tens to thousands of Terabytes.
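A rough way to see why such a small file gains nothing from distribution is to count how many blocks it occupies. The sketch below assumes the common default of roughly one input split (and so one map task) per block; the real split count depends on the input format and job configuration, and the helper name is hypothetical.

```python
import math

MB = 1024 * 1024


def blocks_needed(file_bytes, block_bytes):
    """Number of fixed-size blocks needed to hold file_bytes bytes."""
    return math.ceil(file_bytes / block_bytes)


# The 1.5 MB file from the question, with the 512 MB block size mentioned above.
small = blocks_needed(int(1.5 * MB), 512 * MB)   # 1

# A 100 GB dataset with the same block size, for comparison.
large = blocks_needed(100 * 1024 * MB, 512 * MB)  # 200
```

With a single block there is only one piece of work to hand out, so the second node has nothing to do and the fixed startup overhead dominates the run time.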
