AFAIK there is no multithreaded reducer runner.

You have to make sure that each output collector is created only once,
with no race condition during its creation.
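
A minimal sketch of the create-once idea, not code from the actual job. It
assumes collectors are looked up by name from several worker threads.
Collector, CollectorRegistry and CreateOnceDemo are hypothetical names, and
Collector is only a simplified stand-in for Hadoop's OutputCollector so that
the example compiles without Hadoop on the classpath.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for Hadoop's OutputCollector (simplified: no IOException).
interface Collector<K, V> {
    void collect(K key, V value);
}

// Hypothetical registry that hands out exactly one collector per name.
final class CollectorRegistry {
    private final ConcurrentHashMap<String, Collector<String, Integer>> byName =
            new ConcurrentHashMap<>();
    // Counts how many collectors were actually constructed.
    final AtomicInteger created = new AtomicInteger();
    // Thread-safe queue standing in for the real output.
    final ConcurrentLinkedQueue<String> sink = new ConcurrentLinkedQueue<>();

    Collector<String, Integer> get(String name) {
        // computeIfAbsent applies the factory at most once per key,
        // even when many threads ask for the same name at the same time.
        return byName.computeIfAbsent(name, n -> {
            created.incrementAndGet();
            Collector<String, Integer> c = (k, v) -> sink.add(n + ":" + k + "=" + v);
            return c;
        });
    }
}

final class CreateOnceDemo {
    // Starts `threads` workers that each emit `perThread` records
    // through the collector registered under the same name.
    static CollectorRegistry run(int threads, int perThread) throws InterruptedException {
        CollectorRegistry registry = new CollectorRegistry();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    registry.get("results").collect("t" + id, i);
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return registry;
    }
}
```

With a check-then-put on a plain HashMap instead of computeIfAbsent, two
threads could both see the name as missing and each build their own
collector, which is the kind of creation race meant above.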

A

On Tue, Sep 9, 2008 at 3:23 PM, Goel, Ankur <[EMAIL PROTECTED]> wrote:
> Folks,
>      My implementation is a bit different. I am not using the multithreaded
> reduce runner. Instead, I am using thread pools to do DB and HDFS I/O from
> each of my reduce tasks. To give you an example from my setup, I have 3
> reduce tasks, each with a DB thread pool of 70 threads. This is to ensure
> that I have a maximum of 200 threads hitting the DB doing inserts into
> multiple tables.
>
> Set up MySQL with a large configuration and this really makes the inserts
> go at breakneck speed.
>
> Now each of the threads returns a result that I want to collect on HDFS
> so I tried collecting the result via outputCollector from these threads
> which gave me the same exception. I also tried synchronizing the
> outputCollector, which did not help.
>
> So then I decided to use a separate thread pool in each reduce task for
> doing output collection via outputCollector. When this pool was set to
> have only 1 thread, the exception did not occur. Setting it to 5 threads
> or more caused the exception to show up.
>
> I'll post the stack trace after reproducing the problem.
>
> Thanks
> -Ankur
>
> -----Original Message-----
> From: Alejandro Abdelnur [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, September 09, 2008 9:15 AM
> To: [email protected]
> Subject: Re: Multithreaded reduce
>
> Collectors are already properly synchronized. Maybe there is a race
> condition in the way the multithreaded reducer runner creates them.
>
> A
>
> On Tue, Sep 9, 2008 at 8:56 AM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
>>
>> On Sep 8, 2008, at 4:12 AM, Goel, Ankur wrote:
>>
>>> They do not seem to work well when used in the Reduce phase.
>>> I can post the stack trace if required.
>>
>> I believe it. I don't think I've ever seen anyone do a multi-threaded
>> reduce. Of course the answer is easy, just add synchronization around
>> the output collector before calling collect.
>>
>> -- Owen
>>
>
