[ 
https://issues.apache.org/jira/browse/GIRAPH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617938#comment-13617938
 ] 

Eli Reisman commented on GIRAPH-601:
------------------------------------

So masterCount is part of the problem, forcing us to have a "task 0" that is
below the masterCount value of 1? What's up with masterCount?

Did you possibly ask for more workers than your YARN cluster has resources
for? Check your YARN web UI. It could be that MRv2 is waiting until the
cluster has enough memory to launch all of your PR tasks, and that moment
never comes in time. I'm not sure how (or how well) MRv2 handles this case.
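
A quick back-of-the-envelope check for that, as a minimal sketch: the class
name, method names, and memory figures below are all hypothetical and not
taken from Giraph, YARN, or your cluster. Only the worker count (6, from
your -w 6) comes from your command line. Whether extra containers are needed
on top of the workers is something to confirm in the web UI.

```java
public class YarnCapacityCheck {

    /** How many containers of the given size fit into the free cluster memory. */
    public static int maxContainers(long freeClusterMemMb, long containerMemMb) {
        if (containerMemMb <= 0) {
            throw new IllegalArgumentException("containerMemMb must be positive");
        }
        return (int) (freeClusterMemMb / containerMemMb);
    }

    /** True if every requested task could be given a container at the same time. */
    public static boolean canLaunchAll(int tasksRequested,
                                       long freeClusterMemMb,
                                       long containerMemMb) {
        return tasksRequested <= maxContainers(freeClusterMemMb, containerMemMb);
    }

    public static void main(String[] args) {
        int tasksRequested = 6;        // from "-w 6"; extra containers not counted here
        long freeClusterMemMb = 4096;  // hypothetical: read the real value off the YARN web UI
        long containerMemMb = 1024;    // hypothetical per-container allocation

        System.out.println("containers that fit: "
                + maxContainers(freeClusterMemMb, containerMemMb));
        System.out.println("can launch all tasks at once: "
                + canLaunchAll(tasksRequested, freeClusterMemMb, containerMemMb));
    }
}
```

If the second line prints false with your real numbers, the job may simply be
waiting on containers that never become available.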

Also, did you see in one of the earlier dumps that YarnClientImpl is
hitting an IOException on security tokens? Is that normal? I thought you had
auth set to SIMPLE, so that should work as-is?

> Exception when running pagerank benchmark: SendVertexRequest cannot be cast 
> to MasterRequest
> --------------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-601
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-601
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Eugene Koontz
>         Attachments: instrumentation.patch
>
>
> Building Giraph with:
> {code}
> mvn -DskipTests  -Phadoop_2.0.3 clean compile
> {code}
> Running pagerank like this:
> {code}
>  $HADOOP_RUNTIME/bin/hadoop jar $JAR \
>          org.apache.giraph.benchmark.PageRankBenchmark \
>         -e 10 -s 10 -v -V 10 -w 6
> {code}
> I see this in
> /tmp/userlogs/application_1364578380737_0003/container_1364578380737_0003_01_000002/:
> {code}
> 2013-03-29 10:58:06,371 DEBUG [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Got finished worker list = [Eugenes-MacBook-Pro.local_1, Eugenes-MacBook-Pro.local_3], size = 2, worker list = [Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=2, port=30002), Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=1, port=30001), Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=4, port=30004), Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=3, port=30003), Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=5, port=30005), Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=0, port=30010)], size = 6 from /_hadoopBsp/job_1364578380737_0003/_vertexInputSplitDoneDir
> 2013-03-29 10:58:06,373 WARN [netty-server-exec-3] org.apache.giraph.comm.netty.handler.RequestServerHandler: exceptionCaught: Channel failed with remote address /172.16.175.1:56236
> java.lang.ClassCastException: org.apache.giraph.comm.requests.SendVertexRequest cannot be cast to org.apache.giraph.comm.requests.MasterRequest
>       at org.apache.giraph.comm.netty.handler.MasterRequestServerHandler.processRequest(MasterRequestServerHandler.java:27)
>       at org.apache.giraph.comm.netty.handler.RequestServerHandler.messageReceived(RequestServerHandler.java:106)
>       at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
>       at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:71)
>       at org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:45)
>       at org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:69)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>       at java.lang.Thread.run(Thread.java:680)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
