Hello,

(Inline)
On Tue, Oct 18, 2011 at 12:04 AM, W.P. McNeill <[email protected]> wrote:
<snip>
> 1. *Turn on JMX remote for the tasks*...I added the following options to
> mapred.child.java.opts:
> com.sun.management.jmxremote,
> com.sun.management.jmxremote.port=8004,
> com.sun.management.jmxremote.authenticate=false,
> com.sun.management.jmxremote.ssl=false.
>
> This does not work because there is contention for the JMX remote port when
> multiple tasks run on the same node. All but the first task fail at JVM
> initialization time, causing the job to fail before I can see the repro.

For profiling like this, you are probably interested in just one task at a
time. So reduce your task slots to 1, and that is an easy way out: one
mapper at a time, reusing the port as it goes.

> 2. *Use jstatd*...I tried running jstatd in the background on my cluster
> nodes. It launches and runs, but when I try to connect using VisualVM,
> nothing happens.

While I find it odd that jstatd doesn't seem to expose the host's JVM
metrics for you, I don't think jstatd lets you do memory profiling; AFAIK
you need JMX for that. You can observe heap charts with jstatd running,
though, I think.

> I am going to try adding -XX:+HeapDumpOnOutOfMemoryError, which will at
> least give me post-mortem information. Does anyone know where the heap dump
> file will be written?

Enable keep.failed.task.files (set it to true) for your job, then hunt down
the attempt directory under the mapred.local.dir of the TaskTracker that
ran it. An easier way is to also log the Child's working directory from
your Java code, so you can see which disk it's on when you check the logs
first. Under the attempt dir, you should be able to locate your heap dump.

> Has anyone debugged a similar setup? What tools did you use?

I think you'll find some (possibly odd-looking) approaches described on
https://issues.apache.org/jira/browse/MAPREDUCE-2637, similar to yours.

--
Harsh J
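To make the single-slot workaround concrete, a sketch of the relevant
TaskTracker/job settings (0.20-era property names; the -Xmx value and port
8004 are illustrative, and note the JMX options must be passed as -D system
properties):

```xml
<!-- mapred-site.xml: one map slot per TaskTracker, so the fixed JMX
     port 8004 is never contended by concurrent task JVMs. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx200m -Dcom.sun.management.jmxremote
         -Dcom.sun.management.jmxremote.port=8004
         -Dcom.sun.management.jmxremote.authenticate=false
         -Dcom.sun.management.jmxremote.ssl=false</value>
</property>
```

With one mapper running at a time, VisualVM can attach to host:8004 for
each task JVM in turn.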
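And for the post-mortem route, a sketch of the per-job configuration that
keeps the attempt directory around and dumps the heap on OOM (again, the
-Xmx value is illustrative; note the option needs a +, not a -, to enable
the dump):

```xml
<!-- Job configuration: preserve failed-task files so the attempt
     directory (and the heap dump written into it) survives the failure. -->
<property>
  <name>keep.failed.task.files</name>
  <value>true</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx200m -XX:+HeapDumpOnOutOfMemoryError</value>
</property>
```

The HotSpot JVM writes java_pid&lt;pid&gt;.hprof into the process working
directory by default, which for a task is its attempt directory under
mapred.local.dir.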
