Alessio Arleo created GIRAPH-970:
------------------------------------

             Summary: Missing chosen workers on superstep -1
                 Key: GIRAPH-970
                 URL: https://issues.apache.org/jira/browse/GIRAPH-970
             Project: Giraph
          Issue Type: Bug
          Components: bsp
    Affects Versions: 1.1.0
         Environment: Linux version 3.13.0-37-generic (buildd@kapok) (gcc 
version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) 64 bit
Hadoop 1.2.1
            Reporter: Alessio Arleo


I found a problem with Giraph 1.1.0 while trying to run the 
SimpleShortestPathsComputation example. 

This is the command given:
$HADOOP_HOME/bin/hadoop jar \
  ~/git/giraph_patched/giraph-examples/target/giraph-examples-1.1.0-for-hadoop-1.2.1-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  org.apache.giraph.examples.SimpleShortestPathsComputation \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /users/hadoop/input/tiny_graph.txt \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /users/hadoop/output/shortestpath \
  -w 1

And this is the output:
#################################

Warning: $HADOOP_HOME is deprecated.

14/12/15 12:07:36 INFO utils.ConfigurationUtils: No edge input format 
specified. Ensure your InputFormat does not require one.
14/12/15 12:07:36 INFO utils.ConfigurationUtils: No edge output format 
specified. Ensure your OutputFormat does not require one.
14/12/15 12:07:36 INFO job.GiraphJob: run: Since checkpointing is disabled 
(default), do not allow any task retries (setting mapred.map.max.attempts = 0, 
old value = 4)
14/12/15 12:07:38 INFO job.GiraphJob: Tracking URL: 
http://VirtualMINT-H023:50030/jobdetails.jsp?jobid=job_201412151205_0001
14/12/15 12:07:38 INFO job.GiraphJob: Waiting for resources... Job will start 
only when it gets all 2 mappers
14/12/15 12:08:51 INFO job.HaltApplicationUtils$DefaultHaltInstructionsWriter: 
writeHaltInstructions: To halt after next superstep execute: 
'bin/halt-application --zkServer virtualmint-h023:22181 --zkNode 
/_hadoopBsp/job_201412151205_0001/_haltComputation'
14/12/15 12:08:51 INFO mapred.JobClient: Running job: job_201412151205_0001
14/12/15 12:08:52 INFO mapred.JobClient:  map 100% reduce 0%

################################

The computation hangs here until the timeout is reached. Here is what I found 
while reading the first worker log.

2014-12-15 12:12:16,303 INFO org.apache.giraph.master.BspServiceMaster: 
createVertexInputSplits: Starting to write input split data to zookeeper with 1 
threads
2014-12-15 12:12:16,314 INFO org.apache.giraph.master.BspServiceMaster: 
createVertexInputSplits: Done writing input split data to zookeeper
2014-12-15 12:12:16,332 INFO org.apache.giraph.comm.netty.NettyClient: Using 
Netty without authentication.
2014-12-15 12:12:16,341 INFO org.apache.giraph.comm.netty.NettyClient: 
connectAllAddresses: Successfully added 1 connections, (1 total connected) 0 
failed, 0 failures total.
2014-12-15 12:12:16,344 INFO org.apache.giraph.partition.PartitionUtils: 
computePartitionCount: Creating 1, default would have been 1 partitions.
2014-12-15 12:12:16,373 INFO org.apache.giraph.master.BspServiceMaster: 
barrierOnWorkerList: 0 out of 1 workers finished on superstep -1 on path 
/_hadoopBsp/job_201412151211_0001/_vertexInputSplitDoneDir
2014-12-15 12:12:16,375 INFO org.apache.giraph.master.BspServiceMaster: 
barrierOnWorkerList: Waiting on [virtualmint-h023_1]
2014-12-15 12:12:16,393 INFO org.apache.giraph.comm.netty.NettyServer: start: 
Using Netty without authentication.
2014-12-15 12:12:16,464 ERROR org.apache.giraph.master.BspServiceMaster: 
barrierOnWorkerList: Missing chosen workers [Worker(hostname=virtualmint-h023, 
MRtaskID=1, port=30001)] on superstep -1
2014-12-15 12:12:16,464 ERROR org.apache.giraph.master.MasterThread: 
masterThread: Master algorithm failed with IllegalStateException
java.lang.IllegalStateException: coordinateVertexInputSplits: Worker failed 
during input split (currently not supported)
        at 
org.apache.giraph.master.BspServiceMaster.coordinateInputSplits(BspServiceMaster.java:1489)
        at 
org.apache.giraph.master.BspServiceMaster.coordinateSuperstep(BspServiceMaster.java:1656)
        at org.apache.giraph.master.MasterThread.run(MasterThread.java:124)
2014-12-15 12:12:16,464 FATAL org.apache.giraph.graph.GraphTaskManager: 
uncaughtException: OverrideExceptionHandler on thread 
org.apache.giraph.master.MasterThread, msg = java.lang.IllegalStateException: 
coordinateVertexInputSplits: Worker failed during input split (currently not 
supported), exiting...
java.lang.IllegalStateException: java.lang.IllegalStateException: 
coordinateVertexInputSplits: Worker failed during input split (currently not 
supported)
        at org.apache.giraph.master.MasterThread.run(MasterThread.java:194)
Caused by: java.lang.IllegalStateException: coordinateVertexInputSplits: Worker 
failed during input split (currently not supported)
        at 
org.apache.giraph.master.BspServiceMaster.coordinateInputSplits(BspServiceMaster.java:1489)
        at 
org.apache.giraph.master.BspServiceMaster.coordinateSuperstep(BspServiceMaster.java:1656)
        at org.apache.giraph.master.MasterThread.run(MasterThread.java:124)
2014-12-15 12:12:16,464 WARN org.apache.giraph.zk.ZooKeeperManager: 
logZooKeeperOutput: Dumping up to last 100 lines of the ZooKeeper process 
STDOUT and STDERR.

################################

The computation does not even reach the first superstep: Giraph cannot find 
the worker. The GIRAPH-904 patch has been applied to BspServiceMaster.
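
To make it easier to spot this failure in other task logs, here is a minimal 
sketch that extracts the workers reported as missing. It assumes the log is 
plain text in the format shown above; the function name find_missing_workers 
and the sample constant are hypothetical helpers for illustration, not part of 
Giraph.

```python
import re
from typing import Dict, List

# Worker description as printed in the log above, e.g.
#   Worker(hostname=virtualmint-h023, MRtaskID=1, port=30001)
WORKER_RE = re.compile(
    r"Worker\(hostname=(?P<hostname>[^,\s]+),\s*"
    r"MRtaskID=(?P<task_id>\d+),\s*"
    r"port=(?P<port>\d+)\)"
)

# The master's error entry that lists the missing workers.
MISSING_RE = re.compile(
    r"barrierOnWorkerList: Missing chosen workers \[(?P<workers>.*?)\] "
    r"on superstep (?P<superstep>-?\d+)"
)


def find_missing_workers(log_text: str) -> List[Dict[str, object]]:
    """Return one record per worker reported missing in the log text.

    Hypothetical helper: it only understands the message format quoted
    in this report.
    """
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    records: List[Dict[str, object]] = []
    for entry in MISSING_RE.finditer(flat):
        superstep = int(entry.group("superstep"))
        for worker in WORKER_RE.finditer(entry.group("workers")):
            records.append(
                {
                    "superstep": superstep,
                    "hostname": worker.group("hostname"),
                    "task_id": int(worker.group("task_id")),
                    "port": int(worker.group("port")),
                }
            )
    return records


# Sample copied from the log above, including its line wrapping.
SAMPLE_LOG = """\
2014-12-15 12:12:16,464 ERROR org.apache.giraph.master.BspServiceMaster:
barrierOnWorkerList: Missing chosen workers [Worker(hostname=virtualmint-h023,
MRtaskID=1, port=30001)] on superstep -1
"""

if __name__ == "__main__":
    for record in find_missing_workers(SAMPLE_LOG):
        print(record)
```

Run against the log above, this should report a single missing worker for 
superstep -1, which matches the one worker requested with -w 1.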

I am running Hadoop 1.2.1 on a single machine with the configuration 
suggested in the Giraph Quick Start guide. Hadoop itself works fine (tested 
with the wordcount example). 

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
