[ 
https://issues.apache.org/jira/browse/HADOOP-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645299#action_12645299
 ] 

Aaron Kimball commented on HADOOP-4595:
---------------------------------------

Note that both existing JVMs have their 'busy' status as true. But numFreeSlots 
was > 0; the preceeding log entry was:

2008-11-05 08:57:55,296 INFO org.apache.hadoop.mapred.TaskTracker: In 
TaskLauncher, current free slots : 1 and trying to launch 
attempt_200811040110_0230_r_000011_1


At the top of TaskTracker.run() , it waits for numFreeSlots to be > 0, then 
calls startNewTask(tip) which eventually calls TaskRunner.run() -> 
JvmManager.launchJvm() -> reapJvm(). numFreeSlots seems to be properly 
synchronized, as does JvmManager.JvmRunner.busy. So I think the error  may be 
that JvmRunner.busy isn't being set to false (by JvmRunner.taskRan()) before 
addFreeReduceSlot() is being called.  I can't figure out where these two 
methods (or other ones that work on the same state) are being called in 
relation to one another though.


> JVM Reuse triggers RuntimeException("Invalid state")
> ----------------------------------------------------
>
>                 Key: HADOOP-4595
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4595
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Aaron Kimball
>
> A Reducer triggers the following exception:
> 08/11/05 08:58:50 INFO mapred.JobClient: Task Id : 
> attempt_200811040110_0230_r_000008_1, Status : FAILED
> java.lang.RuntimeException: Inconsistent state!!! JVM Manager reached an 
> unstable state while reaping a JVM for task: 
> attempt_200811040110_0230_r_000008_1 Number of active JVMs:2
>  JVMId jvm_200811040110_0230_r_-735233075 #Tasks ran: 0 Currently busy? true 
> Currently running: attempt_200811040110_0230_r_000012_0
>  JVMId jvm_200811040110_0230_r_-1716942642 #Tasks ran: 0 Currently busy? true 
> Currently running: attempt_200811040110_0230_r_000040_0
>    at java.lang.Throwable.<init>(Throwable.java:67)
>    at 
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType.reapJvm(JvmManager.java:245)
>    at 
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType.access$000(JvmManager.java:113)
>    at org.apache.hadoop.mapred.JvmManager.launchJvm(JvmManager.java:78)
>    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:410) 
> Other clues:
> In the three reduce task attempts where this was observed, this was attempt 
> _1. Attempt _0 had started and eventually switches to "SUCCEEDED." So I think 
> this is happening only on speculatively-executed reduce task attempts. The 
> reduce output (part-XXXXX) gets lost when this attempt fails, even though the 
> other (earlier) attempt succeeded.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to