Hi.
I believe someone posted about this a while back, but it's worth
mentioning again.
I just ran a job on our 10-node cluster where the input data was
~70 empty sequence files. With our default settings this launched
~200 mappers and ~70 reducers.
The job took almost exactly two minutes to finish, even though there
was no data to process.
How can we reduce this overhead?
* Pick the number of mappers and reducers more dynamically, depending
on the size of the input? (see the first sketch below)
* JVM reuse: one JVM per job instead of one per task? (see the second
sketch below)
Any other ideas?
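
For the first idea, here's a minimal sketch against the old
org.apache.hadoop.mapred API: it sizes the reducer count from the total
input bytes. The BYTES_PER_REDUCER target is a made-up number for
illustration, not anything Hadoop ships with.

    import java.io.IOException;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;

    public class ReducerSizing {
        // Hypothetical target: roughly one reducer per GB of input.
        private static final long BYTES_PER_REDUCER = 1024L * 1024 * 1024;

        public static void setReducersByInputSize(JobConf conf, Path input)
                throws IOException {
            FileSystem fs = input.getFileSystem(conf);
            // Total bytes under the input path.
            long totalBytes = fs.getContentSummary(input).getLength();
            // Never go below one reducer, even for empty input.
            int reducers = (int) Math.max(1, totalBytes / BYTES_PER_REDUCER);
            conf.setNumReduceTasks(reducers);
        }
    }

The mapper count is mostly driven by the number of input files/splits,
so for lots of tiny files an input format that packs several files into
one split (e.g. MultiFileInputFormat, if your version has it) should
help more than any setNumMapTasks hint.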
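
For the JVM reuse idea, a sketch assuming a Hadoop version that
supports the mapred.job.reuse.jvm.num.tasks property (I don't think
it's in every release, so treat this as version-dependent):

    import org.apache.hadoop.mapred.JobConf;

    public class JvmReuse {
        public static void enableJvmReuse(JobConf conf) {
            // -1 lets a task JVM be reused for an unlimited number of
            // tasks of the same job (run sequentially on a node),
            // instead of forking a fresh JVM per task.
            conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);
        }
    }

Note that this reuses JVMs across tasks of the same job on a given
node; it isn't literally one JVM per job, but it should kill most of
the per-task startup cost.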
/Johan