Why so many mappers and reducers relative to the number of machines you have? This just causes excess heartache when running the job.
My standard practice is to run a small factor more tasks than I have cores (for instance, 3 tasks on a 2-core machine). In fact, I find it most helpful to let the cluster defaults rule the choice, except in the few cases where I want one reducer, or a few more than the standard 4 reducers. (A rough sketch of that kind of per-job configuration follows below the quoted message.)

On 1/15/08 9:15 AM, "Johan Oskarsson" <[EMAIL PROTECTED]> wrote:

> Hi.
>
> I believe someone posted about this a while back, but it's worth
> mentioning again.
>
> I just ran a job on our 10 node cluster where the input data was
> ~70 empty sequence files; with our default settings this ran about 200
> mappers and 70 reducers.
>
> The job took almost exactly two minutes to finish.
>
> How can we reduce this overhead?
>
> * Pick the number of mappers and reducers in a more dynamic way,
>   depending on the size of the input?
> * JVM reuse: one JVM per job instead of one per task?
>
> Any other ideas?
>
> /Johan
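
For concreteness, here is a rough sketch of what I mean, assuming the old org.apache.hadoop.mapred API of that era. The class name, paths, and identity map/reduce below are placeholders of my own, not anything from this thread. The per-node slot counts themselves come from the tasktracker settings (mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum) in the cluster's hadoop-site.xml, so a job normally only overrides the reducer count:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class SmallInputJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SmallInputJob.class);
    conf.setJobName("small-input-example");

    // Identity map/reduce just to keep the sketch self-contained.
    conf.setMapperClass(IdentityMapper.class);
    conf.setReducerClass(IdentityReducer.class);
    conf.setOutputKeyClass(LongWritable.class);
    conf.setOutputValueClass(Text.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    // Leave the mapper count to the cluster defaults (it is driven by
    // the input splits anyway).  Only pin the reducer count in the few
    // cases that need it, e.g. a single reducer for a tiny input.
    conf.setNumReduceTasks(1);

    JobClient.runJob(conf);
  }
}

The mapper count is deliberately left to the framework here, since it follows from the input splits rather than from the cluster size; the reducer count is the one knob a small job usually wants to touch.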