On Jul 17, 2006, at 1:36 AM, Thomas FRIOL wrote:
Hi all,
I am a new Hadoop user and I am now writing my own map/reduce
operations, but it is hard for me to find out where the problem
comes from when a job fails.
So my question is: what is the best way to debug a map/reduce job?
OK, I should probably put this on a wiki page, but my short answer is:
1. Start by getting everything running (likely on a small input) in the
local runner. You do this by setting your
job tracker to "local" in your config. The local runner can run under
the debugger and is not distributed.
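For example, step 1 in a job driver might look roughly like the sketch
below. This is only a sketch against the old org.apache.hadoop.mapred
API; LocalDebugDriver, MyMapper and MyReducer are hypothetical stand-ins
for your own classes, and the property names are the ones this era of
Hadoop understands:

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class LocalDebugDriver {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(LocalDebugDriver.class);
        conf.set("mapred.job.tracker", "local");  // run map and reduce in this JVM
        conf.set("fs.default.name", "local");     // use the local filesystem, too
        conf.setMapperClass(MyMapper.class);      // your own mapper class
        conf.setReducerClass(MyReducer.class);    // your own reducer class
        // set input/output paths, key/value classes, etc. as usual, then:
        JobClient.runJob(conf);  // runs inline, so breakpoints in map()/reduce() fire
      }
    }

With the tracker set to "local" you can launch main() straight from your
IDE's debugger.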
2. Run the small input on a 1 node cluster. This will smoke out all of
the issues that happen with distribution and the "real" task runner,
but you only have a single place to look at logs. Most useful are the
task and job tracker logs. Make sure you are logging at the INFO level
or you will miss clues like the output of your tasks.
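As an illustration of the logging point, a map task can write to the
task logs through the same commons-logging Log that Hadoop uses
internally. The class below is a made-up example (the word-count-ish
types are just for concreteness), written against the later, generic
form of the org.apache.hadoop.mapred API:

    import java.io.IOException;
    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class LoggingMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, LongWritable> {
      private static final Log LOG = LogFactory.getLog(LoggingMapper.class);

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, LongWritable> output,
                      Reporter reporter) throws IOException {
        // shows up in the task logs on the task tracker, but only if the
        // log level is INFO or finer
        LOG.info("processing record at offset " + key);
        output.collect(value, new LongWritable(1));
      }
    }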
3. Run on a big cluster. Recently, I added the keep.failed.task.files
config variable that tells the system to keep files for tasks that
fail. This leaves "dead" files around that you can debug with. On the
node with the failed task, go to the task tracker's local directory and
cd to <local>/taskTracker/<taskid> and run
% hadoop org.apache.hadoop.mapred.IsolationRunner job.xml
This will run the failed task in a single jvm, which can be in the
debugger, over precisely the same input.
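If it helps, keep.failed.task.files can be set on the job like any other
config property; a minimal sketch, with a hypothetical helper name:

    import org.apache.hadoop.mapred.JobConf;

    public class KeepFailedTaskFiles {
      // call this on your JobConf before submitting the job
      static JobConf keepFilesOnFailure(JobConf conf) {
        // tells the task tracker not to clean up the working directory
        // (job.xml, split data, etc.) of a task that fails, so that
        // IsolationRunner can re-run it later in a single JVM
        conf.set("keep.failed.task.files", "true");
        return conf;
      }
    }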
I also have a patch that will let you specify a task to keep, even if
it doesn't fail. Other than that, logging is your friend.
I don't have issues with my log messages getting through, so you might
check your filters. Exceptions are mostly handled right, but we've
found and fixed spots where they weren't, so that is possible. Usually
it involves someone throwing an unchecked exception like RuntimeException
and the catch only catching checked exceptions.
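For what it's worth, the pattern usually looks something like the
contrived example below (nothing here is Hadoop code): the catch only
accounts for the checked exception the method declares, so the unchecked
one sails straight through.

    import java.io.IOException;

    public class SwallowedException {
      // pretends to do I/O, but what it actually throws on bad input is
      // an unchecked NumberFormatException from parseInt
      static int parse(String record) throws IOException {
        return Integer.parseInt(record);
      }

      public static void main(String[] args) {
        try {
          parse("not-a-number");
        } catch (IOException e) {
          // only the checked exception is handled; the RuntimeException
          // escapes this block entirely, so nothing here ever logs it
          System.err.println("handled: " + e);
        }
      }
    }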
-- Owen