[ 
https://issues.apache.org/jira/browse/GIRAPH-46?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching updated GIRAPH-46:
------------------------------

    Attachment: diff.txt

Aapo reported success, and I was able to run the unit tests against both
LocalJobRunner and my local Hadoop instance.
                
> Race condition on superstep 1 with RPC servers not started by the time that 
> requests are sent
> ---------------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-46
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-46
>             Project: Giraph
>          Issue Type: Bug
>    Affects Versions: 0.70.0
>            Reporter: Avery Ching
>            Assignee: Avery Ching
>            Priority: Minor
>             Fix For: 0.70.0
>
>         Attachments: diff.txt
>
>
> Hi,
> occasionally (maybe one time in four), my Giraph run fails because of the
> RuntimeException below.
> According to the code, it should never happen:
> if (msgMap == null) { // should never happen after constructor
>     throw new RuntimeException(
>         "sendMessage: msgMap did not exist for " + addr +
>         " for vertex " + destVertex);
> }
> This happens during superstep 1 (second superstep). My application actually 
> *adds* edges on superstep 1
> (to make every out-edge also an in-edge of the destination), but since I am 
> running only on 3 workers,
> I would be surprised if every worker had not been registered in the RPC
> layer initially.
> One hypothesis is that Hadoop does something funny, because one of my servers
> was under heavy load. Maybe Hadoop launched another worker to replace a slow
> worker? Can that happen?
> java.lang.RuntimeException: sendMessage: msgMap did not exist for [hostname].ml.cmu.edu:30003 for vertex 875713
>         at org.apache.giraph.comm.BasicRPCCommunications.sendMessageReq(BasicRPCCommunications.java:825)
>         at org.apache.giraph.graph.BasicVertex.sendMsg(BasicVertex.java:179)
>         at edu.cmu.selectlab.BP.BinaryBPVertex.compute(BinaryBPVertex.java:94)
>         at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:624)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>         at org.apache.hadoop.mapred.Child.main(Child.java:253)
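
For readers following the report above, here is a minimal sketch of the
failure mode and one possible guard. It is not the contents of diff.txt and
it is not Giraph's BasicRPCCommunications. The class RpcReadinessSketch and
its methods registerPeer, awaitPeers, and pendingFor are hypothetical names
made up for illustration. The assumption is that a per-peer message map only
exists once that peer's RPC server has been registered, so a send issued
before registration hits the "should never happen" branch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch only; names and structure are assumptions. */
class RpcReadinessSketch {
    // peer address -> (destination vertex id -> number of queued messages)
    private final Map<String, Map<Long, Integer>> outgoing =
        new ConcurrentHashMap<>();
    // Reaches zero once every expected peer has registered.
    private final CountDownLatch allPeersRegistered;

    RpcReadinessSketch(int expectedPeers) {
        this.allPeersRegistered = new CountDownLatch(expectedPeers);
    }

    /** Called once a peer's RPC server is known to be up. */
    void registerPeer(String addr) {
        // Count each peer only once, even if it registers repeatedly.
        if (outgoing.putIfAbsent(addr, new ConcurrentHashMap<>()) == null) {
            allPeersRegistered.countDown();
        }
    }

    /** Waits up to timeoutMillis; returns true if all peers registered. */
    boolean awaitPeers(long timeoutMillis) {
        try {
            return allPeersRegistered.await(timeoutMillis,
                                            TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
    }

    /** Mirrors the check quoted in the report: fails for unknown peers. */
    void sendMessage(String addr, long destVertex) {
        Map<Long, Integer> msgMap = outgoing.get(addr);
        if (msgMap == null) {
            throw new RuntimeException(
                "sendMessage: msgMap did not exist for " + addr +
                " for vertex " + destVertex);
        }
        msgMap.merge(destVertex, 1, Integer::sum);
    }

    /** Number of messages queued for a vertex on a peer (0 if none). */
    int pendingFor(String addr, long destVertex) {
        Map<Long, Integer> msgMap = outgoing.get(addr);
        return msgMap == null ? 0 : msgMap.getOrDefault(destVertex, 0);
    }
}
```

In this sketch a caller would invoke awaitPeers before the first superstep
that sends messages, and treat a false return as a startup failure rather
than proceeding to sendMessage.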

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
