Hi.

I believe someone posted about this a while back, but it's worth mentioning again.

I just ran a job on our 10-node cluster where the input data was
~70 empty sequence files; with our default settings this launched roughly 200 mappers and 70 reducers.

The job took almost exactly two minutes to finish.

How can we reduce this overhead?

* Pick the number of mappers and reducers more dynamically,
  based on the size of the input?
* JVM reuse, i.e. one JVM per job instead of one per task? (see the sketch below)
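
Roughly what I had in mind, as a sketch only (not our actual job driver): the class name,
the 64 MB-per-map heuristic, and the reduce count are made up for illustration, and the
JVM-reuse property assumes a Hadoop version that has mapred.job.reuse.jvm.num.tasks (0.19+):

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class SmallInputJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SmallInputJob.class);
    Path input = new Path(args[0]);
    FileInputFormat.setInputPaths(conf, input);

    // Size task counts from the actual input instead of the cluster defaults.
    long bytes = FileSystem.get(conf).getContentSummary(input).getLength();
    int maps = (int) Math.max(1, bytes / (64L * 1024 * 1024));  // ~one map per 64 MB
    conf.setNumMapTasks(maps);            // only a hint; the splits decide the real count
    conf.setNumReduceTasks(bytes == 0 ? 1 : 10);

    // Reuse each task JVM for the whole job instead of forking one per task
    // (-1 = unlimited reuse; property exists from Hadoop 0.19 onward).
    conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);

    JobClient.runJob(conf);
  }
}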

Any other ideas?

/Johan
