Hello,

I'm having some trouble debugging GIRAPH-45. The code passes the local
tests but currently fails:

  testBspCheckpoint(org.apache.giraph.TestManualCheckpoint)
  testPartitioners(org.apache.giraph.TestGraphPartitioner)

The first one is particularly tricky because the auto-checkpointing test
passes, and because this is the only error I get from stderr:

  <testcase time="111.559" classname="org.apache.giraph.TestManualCheckpoint" name="testBspCheckpoint">
    <failure type="junit.framework.AssertionFailedError">junit.framework.AssertionFailedError
        at junit.framework.Assert.fail(Assert.java:47)
        at junit.framework.Assert.assertTrue(Assert.java:20)
        at junit.framework.Assert.assertTrue(Assert.java:27)
        at org.apache.giraph.TestManualCheckpoint.testBspCheckpoint(TestManualCheckpoint.java:108)
</failure>
    <system-out>Setting tasks to 3 for testBspCheckpoint since JobTracker exists...
setup: Sending job to job tracker localhost:9001 with jar path target/giraph-0.70-jar-with-dependencies.jar for testBspCheckpoint
testBspCheckpoint: Restarting from superstep 2 with checkpoint path = /tmp/testBspCheckpoints
setup: Sending job to job tracker localhost:9001 with jar path target/giraph-0.70-jar-with-dependencies.jar for testBspCheckpoint
</system-out>
    <system-err>java.lang.Throwable: Child Error
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)

attempt_201201092336_0002_m_000000_0: 2012-01-09 23:38:27.816 java[12460:1903] Unable to load realm info from SCDynamicStore
</system-err>
  </testcase>

So I checked the Hadoop logs, and this is what I found for the failed task:

2012-01-10 20:01:45,760 INFO org.apache.giraph.graph.BspServiceMaster: barrierOnWorkerList: 0 out of 3 workers finished on superstep 2 on path /_hadoopBsp/job_201201101959_0002/_applicationAttemptsDir/0/_superstepDir/2/_workerFinishedDir
2012-01-10 20:01:45,925 ERROR org.apache.giraph.graph.BspServiceMaster: superstepChosenWorkerAlive: Missing chosen worker Worker(hostname=tyler.local, MRpartition=2, port=30002) on superstep 2
2012-01-10 20:01:45,925 INFO org.apache.giraph.graph.MasterThread: masterThread: Coordination of superstep 2 took 9.113 seconds ended with state WORKER_FAILURE and is now on superstep 2
2012-01-10 20:01:45,957 INFO org.apache.giraph.graph.BspServiceMaster: getLastGoodCheckpoint: Found last good checkpoint 6 from file:/tmp/testBspCheckpoints/6.finalized
2012-01-10 20:01:46,006 ERROR org.apache.giraph.graph.MasterThread: masterThread: Master algorithm failed:
java.lang.RuntimeException: retartFromCheckpoint: KeeperException
        at org.apache.giraph.graph.BspServiceMaster.restartFromCheckpoint(BspServiceMaster.java:1219)
        at org.apache.giraph.graph.MasterThread.run(MasterThread.java:133)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /_hadoopBsp/job_201201101959_0002/_inputSplitDir
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
        at org.apache.giraph.zk.ZooKeeperExt.deleteExt(ZooKeeperExt.java:238)
        at org.apache.giraph.graph.BspServiceMaster.restartFromCheckpoint(BspServiceMaster.java:1214)
        ... 1 more
2012-01-10 20:01:46,006 FATAL org.apache.giraph.graph.GraphMapper: uncaughtException: OverrideExceptionHandler on thread org.apache.giraph.graph.MasterThread, msg = java.lang.RuntimeException: retartFromCheckpoint: KeeperException, exiting...
java.lang.RuntimeException: java.lang.RuntimeException: retartFromCheckpoint: KeeperException
        at org.apache.giraph.graph.MasterThread.run(MasterThread.java:177)
Caused by: java.lang.RuntimeException: retartFromCheckpoint: KeeperException
        at org.apache.giraph.graph.BspServiceMaster.restartFromCheckpoint(BspServiceMaster.java:1219)
        at org.apache.giraph.graph.MasterThread.run(MasterThread.java:133)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /_hadoopBsp/job_201201101959_0002/_inputSplitDir
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
        at org.apache.giraph.zk.ZooKeeperExt.deleteExt(ZooKeeperExt.java:238)
        at org.apache.giraph.graph.BspServiceMaster.restartFromCheckpoint(BspServiceMaster.java:1214)
        ... 1 more
2012-01-10 20:01:46,008 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper process.

which is unlikely to be caused by my code. Any ideas?

Maybe a Hadoop issue? I just freshly installed it in pseudo-distributed
mode on my machine.
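
For what it's worth, the trace reads as though the delete of
_inputSplitDir inside restartFromCheckpoint fails only because that node
is already gone. Below is a minimal sketch of the difference between a
strict delete and one that tolerates a missing node. Everything in it is
hypothetical: FakeZnodeTree is an in-memory stand-in written for
illustration, not the ZooKeeper or Giraph API, and I am not claiming this
is how ZooKeeperExt.deleteExt works or should be fixed.

```java
import java.util.HashSet;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical in-memory stand-in for a znode tree (NOT the ZooKeeper API).
class FakeZnodeTree {
    private final Set<String> paths = new HashSet<>();

    void create(String path) {
        paths.add(path);
    }

    boolean exists(String path) {
        return paths.contains(path);
    }

    // Strict delete: throws when the node is missing, similar in spirit
    // to the NoNode failure seen in the log.
    void delete(String path) {
        if (!paths.remove(path)) {
            throw new NoSuchElementException("NoNode for " + path);
        }
    }

    // Tolerant recursive delete: removes the node and its descendants,
    // and returns false instead of throwing when the node is absent.
    boolean deleteIfExists(String path) {
        if (!paths.contains(path)) {
            return false;
        }
        paths.removeIf(p -> p.equals(path) || p.startsWith(path + "/"));
        return true;
    }
}
```

With the tolerant variant, a restart that finds the input split directory
already removed would simply carry on. Whether that is the right behaviour
here depends on why the node is missing in the first place.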

-- 
   Claudio Martella
   claudio.marte...@gmail.com
