They do not seem to work correctly when used in the reduce phase. I can post the stack trace if required.

-----Original Message-----
From: Alejandro Abdelnur [mailto:[EMAIL PROTECTED]
Sent: Monday, September 08, 2008 3:49 PM
To: [email protected]
Subject: Re: Multithreaded reduce

OutputCollectors work fine when multithreaded, look at the MultiThreadMapRunner.

On Mon, Sep 8, 2008 at 1:21 PM, Goel, Ankur <[EMAIL PROTECTED]> wrote:
> Hi Folks,
>
> I have a setup where I am using a thread-pool
> implementation (provided by java.util.concurrent package) in reduce
> phase to do database I/O to speed up the database upload. The DB here is
> MySQL. I decided to go for additional parallelism via threads as
>
> 1. It considerably speeds up the upload while consuming less cluster
> resources (i.e. less number of reducers required).
>
> 2. The upload speed is not limited by the reduce task capacity of the
> cluster but by the DB's ability to handle max connections simultaneously
> and effectively.
>
> Each reduce task has 2 thread pools. One that does the DB I/O and whose
> return a java.util.concurrent.FutureTask. Another pool that fetches
> result from this future task to do disc I/O i.e.
> outputCollector.collect(...).
>
> When multiple threads from the second pool try to do a disc I/O, I get
> an AlreadyBeingCreatedException in the logs. If I set the second pool to
> only have 1 thread then things work fine!
>
> It looks like the output collector was never assumed to be used from
> multiple threads.
>
> Any thoughts on this?
>
> Thanks
> -Ankur
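
For reference, the arrangement that the original message reports as working (many threads for the DB I/O, only one thread calling collect) can be sketched roughly as below. This is only an illustration under assumptions, not the actual job code: `Collector` is a hypothetical stand-in for Hadoop's OutputCollector so that the sketch compiles without Hadoop on the classpath, and `uploadToDb` is a hypothetical placeholder for the real MySQL upload. It does not explain the AlreadyBeingCreatedException; it only keeps every collect call on a single thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SingleWriterReduceSketch {

    // Hypothetical stand-in for Hadoop's OutputCollector (assumption, not the real interface).
    interface Collector<K, V> {
        void collect(K key, V value) throws Exception;
    }

    // Hypothetical placeholder for the database upload; returns a status string.
    static String uploadToDb(String record) {
        return "uploaded:" + record;
    }

    // Runs the DB work on a pool of dbThreads threads, but calls out.collect(...)
    // only from the calling thread, in submission order.
    static void reduceRecords(List<String> records,
                              Collector<String, String> out,
                              int dbThreads) throws Exception {
        ExecutorService dbPool = Executors.newFixedThreadPool(dbThreads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (final String record : records) {
                Callable<String> task = () -> uploadToDb(record);
                pending.add(dbPool.submit(task));
            }
            // Single writer: Future.get() blocks until the DB task for that record is done.
            for (int i = 0; i < records.size(); i++) {
                out.collect(records.get(i), pending.get(i).get());
            }
        } finally {
            dbPool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> collected = new ArrayList<>();
        reduceRecords(Arrays.asList("a", "b", "c"),
                (k, v) -> collected.add(k + "=" + v), 4);
        System.out.println(collected); // prints [a=uploaded:a, b=uploaded:b, c=uploaded:c]
    }
}
```

In a real reducer the loop over `pending` would sit in the reduce call (or in a single-thread second pool), so the DB parallelism stays while the collector only ever sees one caller.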
