On Thu, Mar 3, 2011 at 2:04 PM, Keith Wiley <[email protected]> wrote:
> On Mar 3, 2011, at 2:51 AM, Steve Loughran wrote:
>
>> yes, but the problem is determining which one will fail. Ideally you should 
>> find the root cause, which is often some race condition or hardware fault. 
>> If it's the same server every time, turn it off.
>
>> You can play with the specex parameters, maybe change when they get kicked 
>> off. The assumption in the code is that the slowness is caused by H/W 
>> problems (especially HDD issues) and it tries to avoid duplicate work. If 
>> every Map was duplicated, you'd be doubling the effective cost of each 
>> query, and annoying everyone else in the cluster. Plus increased disk and 
>> network IO might slow things down.
>>
>> Look at the options, have a play and see. If it doesn't have the feature, 
>> you can always try coding it in - if the scheduler API lets you do it, you 
>> won't be breaking anyone else's code.
>>
>> -steve
>
>
> Thanks.  I'll take it under consideration.  In my case, it would be really 
> beneficial to duplicate the work.  The task in question is a single task on 
> a single node (numerous mappers feed data into a single reducer), so 
> duplicating the reducer represents very little duplicated effort while 
> mitigating a potential bottleneck in the job's performance, since the job 
> simply is not done until the single reducer finishes.  I would really like 
> to be able to do what I am suggesting: duplicate the reducer and kill the 
> clones after the winner finishes.
>
> Anyway, thanks.
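For reference, the specex parameters Steve mentions are per task type in the 0.20-era MapReduce configuration. A minimal sketch, assuming the old `mapred.*` property names of that release line (verify against your release's mapred-default.xml):

```xml
<!-- mapred-site.xml, or set per job -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>true</value>  <!-- allow speculative duplicates of slow map tasks -->
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>true</value>  <!-- allow speculative duplicates of slow reduce tasks -->
</property>
```

One caveat: as I understand the scheduler, speculation picks candidates by comparing a task's progress against its sibling tasks, so a job with only one reducer may never trigger a speculative copy - there is nothing to compare it against.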
>

What is your reason for needing a single reducer? I'd first try to see
how I could parallelize that work, if possible.
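For example, if the single reducer is doing an aggregation, a common pattern is to raise the reducer count and then combine the resulting part files in a cheap follow-up step (e.g. `hadoop fs -getmerge`). A sketch, again assuming the 0.20-era property name:

```xml
<!-- run with several reducers instead of one -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>8</value>  <!-- output becomes part-00000 .. part-00007 -->
</property>
```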

Jacob
