Re: [jira] Commented: (HADOOP-600) Race condition in JobTracker updating the task tracker's status while declaring it lost

Nigel Daley Mon, 08 Jan 2007 11:09:53 -0800

The patch build failed because 2 tests, TestReplication and TestRestartDFS, failed on RHEL 4. I see that both test logs contain these exceptions:


TestReplication:
    [junit] Data node crashed:
    [junit] java.lang.NullPointerException
    [junit]     at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:304)
    [junit]     at org.apache.hadoop.ipc.Client.call(Client.java:455)
    [junit]     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:164)
    [junit]     at org.apache.hadoop.dfs.$Proxy0.getProtocolVersion(Unknown Source)
    [junit]     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:248)
    [junit]     at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:227)
    [junit]     at org.apache.hadoop.dfs.DataNode.<init>(DataNode.java:225)
    [junit]     at org.apache.hadoop.dfs.DataNode.<init>(DataNode.java:171)
    [junit]     at org.apache.hadoop.dfs.MiniDFSCluster$DataNodeRunner.run(MiniDFSCluster.java:118)
    [junit]     at java.lang.Thread.run(Thread.java:595)
    [junit] 2007-01-08 18:33:13,436 INFO ipc.Client (Client.java:run(279)) - java.lang.NullPointerException
    [junit]     at org.apache.hadoop.ipc.Client$Connection.run(Client.java:247)


and TestRestartDFS:
    [junit] Data node crashed:
    [junit] java.lang.NullPointerException
    [junit]     at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:304)
    [junit]     at org.apache.hadoop.ipc.Client.call(Client.java:455)
    [junit]     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:164)
    [junit]     at org.apache.hadoop.dfs.$Proxy0.register(Unknown Source)
    [junit]     at org.apache.hadoop.dfs.DataNode.register(DataNode.java:295)
    [junit]     at org.apache.hadoop.dfs.DataNode.<init>(DataNode.java:183)
    [junit]     at org.apache.hadoop.dfs.MiniDFSCluster$DataNodeRunner.run(MiniDFSCluster.java:118)
    [junit]     at java.lang.Thread.run(Thread.java:595)
    [junit] 2007-01-08 18:35:21,223 INFO util.ThreadedServer (ThreadedServer.java:run(656)) - Stopping Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=50092]
    [junit] 2007-01-08 18:35:21,223 INFO ipc.Client (Client.java:run(279)) - java.lang.NullPointerException

(Yes, the "0 attempts" in the QA message is a script error. There was 1 build attempt.)


I'm unsure how reproducible these are.

Nige

On Jan 8, 2007, at 10:49 AM, Hadoop QA (JIRA) wrote:

[ https://issues.apache.org/jira/browse/HADOOP-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463097 ]
Hadoop QA commented on HADOOP-600:
----------------------------------
-1, because 0 attempts failed to build and test the latest attachment (http://issues.apache.org/jira/secure/attachment/12348510/HADOOP-600_20070108_1.patch) against trunk revision r494137. Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.
Race condition in JobTracker updating the task tracker's status while declaring it lost
---------------------------------------------------------------------------------------
                Key: HADOOP-600
                URL: https://issues.apache.org/jira/browse/HADOOP-600
            Project: Hadoop
         Issue Type: Bug
         Components: mapred
   Affects Versions: 0.7.1
           Reporter: Owen O'Malley
        Assigned To: Arun C Murthy
            Fix For: 0.10.1

        Attachments: HADOOP-600_20070108_1.patch
There was a case where the JobTracker lost track of a set of tasks that were on a task tracker. It appears to be a race condition because the ExpireTrackers thread doesn't lock the JobTracker while updating the state. The fix would be to build a list of dead task trackers and then lock the job tracker while updating their status.
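The proposed fix can be illustrated with a minimal sketch. This is not the attached patch and not the real JobTracker code: the class TrackerExpirySketch, its fields, and its methods are all hypothetical names made up for illustration. It only shows the two-step shape described above: first collect the trackers that look dead, then hold the same lock that heartbeats use while their status is actually updated. Re-checking each candidate under the lock is an added assumption, so that a tracker whose heartbeat arrived in between is not declared lost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the JobTracker; none of these names come from Hadoop.
class TrackerExpirySketch {
    // Last heartbeat time (ms) per tracker. Concurrent so step 1 can scan it
    // without holding the main lock.
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();
    // Trackers declared lost; guarded by this object's monitor.
    private final List<String> lostTrackers = new ArrayList<>();

    // Heartbeats take the same lock that the expiry pass takes in step 2.
    public synchronized void heartbeat(String tracker, long nowMs) {
        lastSeen.put(tracker, nowMs);
    }

    // One pass of the expiry thread. Returns the trackers declared lost in this pass.
    public List<String> expireTrackers(long nowMs, long timeoutMs) {
        // Step 1: build the list of apparently dead trackers (no main lock held).
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMs - e.getValue() > timeoutMs) {
                candidates.add(e.getKey());
            }
        }

        // Step 2: lock while updating status, so a heartbeat cannot interleave
        // with the update of a tracker's state.
        List<String> expired = new ArrayList<>();
        synchronized (this) {
            for (String tracker : candidates) {
                Long seen = lastSeen.get(tracker);
                // Re-check under the lock (an assumption of this sketch): the
                // tracker may have sent a heartbeat after step 1.
                if (seen != null && nowMs - seen > timeoutMs) {
                    lastSeen.remove(tracker);
                    lostTrackers.add(tracker);
                    expired.add(tracker);
                }
            }
        }
        return expired;
    }

    // Defensive copy so callers never see the list mid-update.
    public synchronized List<String> getLostTrackers() {
        return new ArrayList<>(lostTrackers);
    }
}
```

With a 5000 ms timeout, a tracker last heard from at time 0 is declared lost by a pass at time 10000, while one that sent a heartbeat at time 9000 is kept.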
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
