Race condition on superstep 1 with RPC servers not started by the time that 
requests are sent

                 Key: GIRAPH-46
                 URL: https://issues.apache.org/jira/browse/GIRAPH-46
             Project: Giraph
          Issue Type: Bug
    Affects Versions: 0.70.0
            Reporter: Avery Ching
            Assignee: Avery Ching
            Priority: Minor
             Fix For: 0.70.0
         Attachments: diff.txt


Occasionally (maybe one time in four), my Giraph run fails with the exception
below. According to the code, it should never happen:

if (msgMap == null) { // should never happen after constructor
    throw new RuntimeException(
        "sendMessage: msgMap did not exist for " + addr +
        " for vertex " + destVertex);
}

This happens during superstep 1 (the second superstep). My application actually
*adds* edges on superstep 1 (to make every out-edge also an in-edge of the
destination), but since I am running on only 3 workers, I would be surprised if
not every worker had been registered in the RPC layer.

One hypothesis is that Hadoop does something funny, because one of my servers
was under heavy load. Maybe Hadoop launched another worker to replace a slow
worker? Can it 

java.lang.RuntimeException: sendMessage: msgMap did not exist for 
[hostname].ml.cmu.edu:30003 for vertex 875713
        at org.apache.giraph.graph.BasicVertex.sendMsg(BasicVertex.java:179)
        at edu.cmu.selectlab.BP.BinaryBPVertex.compute(BinaryBPVertex.java:94)
        at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:624)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.mapred.Child.main(Child.java:253)
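
The race named in the issue title can be illustrated in miniature. The sketch
below is not Giraph code: the class PeerRegistry, its methods register, lookup
and awaitAll, and the address "worker-2:30003" are all hypothetical names chosen
for illustration. It only shows how a per-address map lookup can return null
when a sender runs before the peer's entry exists, and one possible guard
(blocking on a latch until every expected peer has registered). Whether the
actual fix takes this form is not stated here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a per-address outgoing message map.
public class PeerRegistry {
    // address -> (destination vertex id -> pending message)
    private final Map<String, Map<Long, String>> outboxByAddr =
        new ConcurrentHashMap<>();
    // Counts down once per distinct peer that registers.
    private final CountDownLatch allRegistered;

    public PeerRegistry(int expectedPeers) {
        this.allRegistered = new CountDownLatch(expectedPeers);
    }

    // Called when a peer's server side is up; idempotent per address.
    public void register(String addr) {
        if (outboxByAddr.putIfAbsent(addr, new ConcurrentHashMap<>()) == null) {
            allRegistered.countDown();
        }
    }

    // Returns null if the peer has not registered yet (the failing case).
    public Map<Long, String> lookup(String addr) {
        return outboxByAddr.get(addr);
    }

    // Guard: block until all expected peers registered, or time out.
    public boolean awaitAll(long timeout, TimeUnit unit)
            throws InterruptedException {
        return allRegistered.await(timeout, unit);
    }

    public static void main(String[] args) throws Exception {
        PeerRegistry registry = new PeerRegistry(1);
        String addr = "worker-2:30003"; // hypothetical address

        // A sender that runs before the server has started sees null.
        System.out.println("registered before start: "
            + (registry.lookup(addr) != null));

        // The server registers on another thread, as it would at startup.
        Thread server = new Thread(() -> registry.register(addr));
        server.start();

        // Waiting on the latch closes the window before any send happens.
        boolean ready = registry.awaitAll(5, TimeUnit.SECONDS);
        server.join();
        System.out.println("ready: " + ready + ", registered after wait: "
            + (registry.lookup(addr) != null));
    }
}
```

A timeout is used in awaitAll so that a peer which never starts surfaces as a
false return value rather than a hang.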
