I'm using the fair scheduler and JVM reuse. It's just plain a big job. I'm not using a combiner right now, but that's something to look at.
What about bumping mapred.reduce.tasks up to some huge number? I don't think that should make a difference, but I'm hearing conflicting information on this.
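
To be concrete about what I'd hope a combiner buys me, here is a toy sketch in plain Java, not actual Hadoop code. The names CombinerSketch, mapOutput, and combine are made up for illustration, and the word-count style data is an assumption, not my real job. The idea is just that each map task sums its own output per key before the shuffle, so fewer records go over the wire:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CombinerSketch {
    // Hypothetical stand-in for one map task's output: one (word, 1) record per input word.
    static List<Map.Entry<String, Long>> mapOutput(String[] words) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String w : words) {
            out.add(new AbstractMap.SimpleEntry<>(w, 1L));
        }
        return out;
    }

    // Combiner step: sum the counts per key locally, before anything is shuffled to reducers.
    static Map<String, Long> combine(List<Map.Entry<String, Long>> records) {
        Map<String, Long> combined = new LinkedHashMap<>();
        for (Map.Entry<String, Long> r : records) {
            combined.merge(r.getKey(), r.getValue(), Long::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        String[] words = {"a", "b", "a", "a", "b", "c"};
        List<Map.Entry<String, Long>> raw = mapOutput(words);
        Map<String, Long> combined = combine(raw);
        // Compare how many records this one map task would hand to the shuffle.
        System.out.println("records without combiner: " + raw.size());
        System.out.println("records with combiner:    " + combined.size());
    }
}
```

This only works when the reduce operation is associative and commutative (sums, counts, max, and so on), so whether it helps my job depends on what the reducer actually does.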
