Re: Debugging Partitioner problems

Amogh Vasekar Wed, 20 Jan 2010 04:25:29 -0800

>>Can I tell hadoop to save the map outputs per reducer to be able to inspect 
>>what's in them
You can set keep.task.files.pattern to keep the mapper output files; set this
regex to match your job/task as needed. Be aware that this will eat up a lot
of local disk space.
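
Before putting a regex into that property, it can help to check what it would
select. The sketch below is plain Java with no Hadoop dependency. The class
name, the pattern and the task attempt IDs are all made up for illustration,
and it assumes the regex has to match the whole task ID, which you should
verify against your Hadoop version.

```java
import java.util.regex.Pattern;

public class KeepPatternCheck {
    // Returns true if the regex matches the entire task attempt ID.
    // Assumption: whole-string matching; check how your Hadoop version applies it.
    static boolean wouldKeep(String regex, String taskAttemptId) {
        return Pattern.matches(regex, taskAttemptId);
    }

    public static void main(String[] args) {
        // Hypothetical pattern: all map attempts of the job numbered 0001.
        String regex = ".*_0001_m_.*";
        // Hypothetical task attempt IDs, not taken from a real cluster.
        String[] ids = {
            "attempt_201001200425_0001_m_000003_0",
            "attempt_201001200425_0001_r_000007_0",
            "attempt_201001200425_0002_m_000003_0"
        };
        for (String id : ids) {
            System.out.println(id + " -> " + wouldKeep(regex, id));
        }
    }
}
```

Keeping the pattern as narrow as possible (one job, map tasks only) limits how
much local disk the retained files take up.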


The problem is most likely that your data (or, more specifically, your map
output data) is skewed, so most keys hash to the same partition id and
therefore go to one reducer. Are you implementing a join? If not, writing a
custom partitioner would help.
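
To see whether skew is the cause, you can replay a sample of your map output
keys through the same arithmetic the default HashPartitioner uses, i.e.
(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, and count records per
partition. This is only a rough sketch: the class name and sample keys are
made up, and it uses String.hashCode(), so if your keys are Text or another
Writable the actual partition numbers will differ (the skew pattern caused by
a few hot keys will still show).

```java
import java.util.Arrays;
import java.util.List;

public class PartitionSkew {
    // Same arithmetic as Hadoop's default HashPartitioner.
    // The mask clears the sign bit so the result is never negative.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Counts how many records would land in each partition.
    static long[] histogram(List<String> keys, int numReduceTasks) {
        long[] counts = new long[numReduceTasks];
        for (String key : keys) {
            counts[partitionFor(key, numReduceTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample in which one hot key dominates.
        List<String> keys = Arrays.asList(
            "opera.com", "opera.com", "opera.com", "example.org", "example.net");
        // 16 partitions, matching the 16 reducers in the question.
        System.out.println(Arrays.toString(histogram(keys, 16)));
    }
}
```

If a single key accounts for most of the overloaded partition, no partitioner
can split it, since all records with the same key must go to one reducer. If
instead several moderately heavy keys collide, a custom partitioner that
spreads them differently is the fix.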

Amogh

On 1/20/10 5:33 PM, "Erik Forsberg" <[email protected]> wrote:

Hi!

I have a problem with one of my reducers getting 3 times as much
data as the other 15 reducers, causing longer total runtime per job.

What would be the best way to debug this? I'm guessing I'm outputting
keys that somehow fool the partitioner. Can I tell hadoop to save the
map outputs per reducer to be able to inspect what's in them?

Thanks,
\EF
--
Erik Forsberg <[email protected]>
Developer, Opera Software - http://www.opera.com/
