On Thu, 15 Sep 2011 12:51:48 -0700, Frank Astier <[email protected]>
wrote:
> I’m using IntelliJ and the WordCount example in Hadoop (which uses
> MiniMRCluster). Is it possible to set an IntelliJ debugger breakpoint
> straight into the map function of the mapper? - I’ve tried, but so far,
> the debugger does not stop at the breakpoint.
The problem is that the mapper is run in a different JVM than the one you
launch. Here's what I do (using IntelliJ, on cluster nodes running Ubuntu):

Add the following lines to your configuration.xml file:
<property>
  <name>mapred.map.child.java.opts</name>
  <value>-Xdebug
  -Xrunjdwp:transport=dt_socket,server=y,suspend=y</value>
</property>
This adds the JPDA debugging listener to your mappers' JVMs. There's a
similar property for reducers.
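For the reducer side, a sketch of what that property would look like (the name mapred.reduce.child.java.opts follows the same old-style naming as the mapper property above; verify it against your Hadoop version):

```xml
<property>
  <name>mapred.reduce.child.java.opts</name>
  <value>-Xdebug
  -Xrunjdwp:transport=dt_socket,server=y,suspend=y</value>
</property>
```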
Now, to connect, you first need to find the task JVMs. You can't specify a
debug address in the options above the way you usually would, since all of
your task JVMs would then be trying to use the same one. If there's a
particular reducer you want to connect to, use the jobtracker web UI (port
50030) to find which cluster node it's running on and ssh to that node.
Otherwise, just connect to any cluster node you like. Then run
ps awfx | grep debug | awk '{print $1}'
which will give you a whole bunch of process IDs whose command lines
contain the string "debug". Some of these are your task JVMs! Anyway, pick
one -- say it's 2317 -- and run
sudo netstat -ap | grep 2317
which will tell you which port the task is listening on.
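As a concrete sketch, here's the sort of awk you could use to pull the listening port out of that netstat output (the sample line below is made up for illustration; on a real node you'd pipe netstat -ap | grep 2317 into the same awk):

```shell
# Hypothetical line of `netstat -ap` output for PID 2317; the
# local-address field ($4) holds the port the JDWP listener is bound to.
line='tcp        0      0 0.0.0.0:49231     0.0.0.0:*     LISTEN      2317/java'

# Keep only LISTEN lines, split the local address on ':', print the port.
port=$(echo "$line" | awk '/LISTEN/ {split($4, a, ":"); print a[2]}')
echo "$port"
```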
NOW you can go back into your IntelliJ and configure a remote debugger.
Tell it to connect to the host you were sshed into, at the port you just
found. Set your breakpoint; connect your debugger; and you're good to go.
Oh, and by default you've only got about 10 minutes to get this done
before the jobtracker decides that your task is dead and kills it. Set
mapreduce.task.timeout higher if you want more time to work.
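For example, a sketch of bumping the timeout to 30 minutes (the value is in milliseconds; the 30-minute figure is arbitrary, and older releases spell the property mapred.task.timeout):

```xml
<property>
  <name>mapreduce.task.timeout</name>
  <!-- 30 minutes, in milliseconds -->
  <value>1800000</value>
</property>
```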
hth