The current implementation of the MultithreadedMapRunner uses an
ArrayBlockingQueue variant, with a capacity = number of threads.  This
has the effect of "fooling" the map runner instrumentation into thinking
that the map task is further along than it is, since at any point, there
is one map thread task running, and another one queued up, waiting to
run, in the queue.

Substituting a similarly modified version of the SynchronousQueue keeps
the mapper from reading one ahead.  (by similarly modified, I am
referring to the changes to make the queue block instead of throwing
exceptions).

I concede that the work of reading the next map data, with this change,
is not overlapped with the work of processing it.  However, if that is
an issue, the number of threads can be increased.

We are using a modified version of the MultithreadedMapRunner with this
modification to run on an 8 way CPU, using 8-9 threads, and are getting
extremely good CPU utilization. 

-Marshall Schor

Reply via email to