NullPointerException in JVMMetrics for OOM killed task
------------------------------------------------------

                 Key: HADOOP-2108
                 URL: https://issues.apache.org/jira/browse/HADOOP-2108
             Project: Hadoop
          Issue Type: Bug
          Components: metrics
    Affects Versions: 0.14.2
         Environment: Centos5 jdk1.6.0_02
            Reporter: Richard Lee
            Priority: Minor


I had a reduce task run out of memory and die in such a way that 
JvmMetrics.doThreadUpdates() throws a NullPointerException.

The apparent cause is that the call to threadMXBean.getThreadInfo() on 
JvmMetrics.java:119 returns an array of ThreadInfo whose elements may be null.

Here's a relevant quote from the javadoc:

    This method returns an array of the ThreadInfo objects, each is the
    thread information about the thread with the same index as in the
    ids array. If a thread of the given ID is not alive or does not
    exist, null will be set in the corresponding element in the
    returned array. A thread is alive if it has been started and has
    not yet died.
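
To make the javadoc contract concrete, here is a minimal sketch that asks the ThreadMXBean about a thread ID that should not exist and about a thread that has already died. The class name NullThreadInfoDemo and the choice of Long.MAX_VALUE as an unused ID are assumptions for illustration and are not part of Hadoop. Going by the javadoc, both lookups should come back null.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical demo class; not part of Hadoop.
class NullThreadInfoDemo {

    /** Looks up ThreadInfo for an ID that no live thread is expected to have. */
    static ThreadInfo[] lookupMissing() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Long.MAX_VALUE is an assumed-unused thread ID.
        return bean.getThreadInfo(new long[] { Long.MAX_VALUE });
    }

    /** Starts a thread, waits for it to die, then asks for its ThreadInfo. */
    static ThreadInfo lookupDead() throws InterruptedException {
        Thread t = new Thread(() -> { });
        t.start();
        long id = t.getId();
        t.join(); // after join() the thread is no longer alive
        return ManagementFactory.getThreadMXBean().getThreadInfo(id);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("missing id  -> " + lookupMissing()[0]);
        System.out.println("dead thread -> " + lookupDead());
    }
}
```

In the metrics timer the same thing can happen without any deliberate setup: a thread can die between the call that collects the IDs and the call that fetches the ThreadInfo array.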

My stacktrace looks like this:
java.lang.NullPointerException
        at org.apache.hadoop.metrics.jvm.JvmMetrics.doThreadUpdates(JvmMetrics.java:129)
        at org.apache.hadoop.metrics.jvm.JvmMetrics.doUpdates(JvmMetrics.java:79)
        at org.apache.hadoop.metrics.spi.AbstractMetricsContext.timerEvent(AbstractMetricsContext.java:284)
        at org.apache.hadoop.metrics.spi.AbstractMetricsContext.access$000(AbstractMetricsContext.java:50)
        at org.apache.hadoop.metrics.spi.AbstractMetricsContext$1.run(AbstractMetricsContext.java:249)
        at java.util.TimerThread.mainLoop(Timer.java:512)
        at java.util.TimerThread.run(Timer.java:462)

On line 129, there's an attempt to dereference the potentially null 
threadInfo value to get its current state.

The naive solution here is to check for null and count null values as 
"terminated"... but it seems clear that a thread state of TERMINATED and a null 
ThreadInfo value are distinct cases and may need special treatment.
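
One way to keep the two cases apart, sketched under assumptions: skip null entries when tallying thread states and count them separately instead of folding them into TERMINATED. The names ThreadStateCounter, Counts and vanished are hypothetical, and this is not the actual JvmMetrics code (it also uses newer Java syntax than the JDK in this report).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper; a sketch, not the Hadoop implementation.
class ThreadStateCounter {

    /** Per-state counts plus a separate tally of null ThreadInfo entries. */
    static final class Counts {
        final Map<Thread.State, Integer> byState = new EnumMap<>(Thread.State.class);
        int vanished; // threads that were gone by the time they were queried
    }

    /** Tallies thread states, tolerating null elements in the array. */
    static Counts count(ThreadInfo[] infos) {
        Counts c = new Counts();
        for (ThreadInfo info : infos) {
            if (info == null) {
                c.vanished++; // not alive or never existed: no state to read
                continue;
            }
            c.byState.merge(info.getThreadState(), 1, Integer::sum);
        }
        return c;
    }

    /** Takes a snapshot of all threads currently known to the JVM. */
    static Counts snapshot() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long[] ids = bean.getAllThreadIds();
        return count(bean.getThreadInfo(ids, 0)); // depth 0: no stack traces
    }
}
```

Whether the vanished count should be reported as its own metric, merged into "terminated", or simply dropped is a design decision this sketch leaves open.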

I'm guessing that this is a "minor" issue because it seems more cosmetic than 
mission-critical.  I'm not sure what the upstream effects are of this method 
throwing the NPE, so I didn't set it to "trivial".

