Yeah, it's annoying that it's hard to figure out where this happens. Unfortunately, the error surfaces in Hadoop RPC, and we don't control that code (it's from Apache Hadoop). I suppose we could add some generic vertex-checking utilities to the unit tests that could be easily extended. Maybe we should also add this to the FAQ, since it seems like a common error?

Avery

On 1/5/12 2:14 PM, "Christoph Böhm" wrote:
Correct. I had an issue in readFields(DataInput in) for my vertex value type.
Unfortunately, I never got to see the real exception until I wrote local tests
(which one should of course do first).
The error returned was java.io.EOFException at
java.io.DataInputStream.readInt(DataInputStream.java:375), which was confusing
since I don't call readInt() anywhere ...

Where is a good spot to fix this? -- i.e., add vertex value errors to user logs.

Chr

-------- Original Message --------
Date: Tue, 03 Jan 2012 11:07:56 -0800
From: Avery Ching <ach...@apache.org>
To: giraph-user@incubator.apache.org
Subject: Re: java.io.EOFException
It appears that you had a problem with the serialization/deserialization
of your vertex and/or its types (I, E, V, M).  You might want to try to
test that out separately.
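
A separate round-trip check along these lines can be sketched with plain java.io, without a cluster. This is only a minimal illustration under stated assumptions: SimpleWritable is a stand-in for Hadoop's Writable interface (so the sketch compiles without Hadoop on the classpath), and GoodValue, BadValue, and WritableRoundTrip are hypothetical names, not Giraph classes. BadValue shows how a readFields() that reads more than write() produced ends in a java.io.EOFException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Stand-in for org.apache.hadoop.io.Writable (assumption: same two methods).
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical vertex value whose write() and readFields() agree on the layout.
class GoodValue implements SimpleWritable {
    int id;
    double weight;

    public void write(DataOutput out) throws IOException {
        out.writeInt(id);
        out.writeDouble(weight);
    }

    public void readFields(DataInput in) throws IOException {
        id = in.readInt();
        weight = in.readDouble();
    }
}

// Hypothetical buggy value: readFields() expects one more int than write() emits.
class BadValue extends GoodValue {
    @Override
    public void readFields(DataInput in) throws IOException {
        super.readFields(in);
        in.readInt(); // no bytes left, so this throws java.io.EOFException
    }
}

class WritableRoundTrip {
    // Serializes src, deserializes into dst, and returns the number of unread bytes.
    static int roundTrip(SimpleWritable src, SimpleWritable dst) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        src.write(out);
        out.flush();
        ByteArrayInputStream raw = new ByteArrayInputStream(bytes.toByteArray());
        dst.readFields(new DataInputStream(raw));
        return raw.available();
    }

    // True only if readFields() consumed exactly the bytes that write() produced.
    static boolean roundTripsCleanly(SimpleWritable src, SimpleWritable dst) {
        try {
            return roundTrip(src, dst) == 0;
        } catch (IOException e) {
            return false; // includes EOFException from reading too much
        }
    }

    public static void main(String[] args) {
        GoodValue src = new GoodValue();
        src.id = 2095029;
        src.weight = 0.9829082;
        System.out.println("good value ok: " + roundTripsCleanly(src, new GoodValue()));
        System.out.println("bad value ok:  " + roundTripsCleanly(src, new BadValue()));
    }
}
```

Comparing the fields of the source and the deserialized copy after the round trip catches the remaining case, where the byte counts match but the fields are read in the wrong order.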

Avery

On 1/3/12 3:54 AM, "Christoph Böhm" wrote:
Thanks!
The next exception, which I cannot explain, is the following.
I have one input file of the form:

[2095029,[[1100046950,-1],[952771928,-1]],[[1276522248,0.9829082],[322609086,0.013525307]]]
[5146036,[[947366954,-1],[34019593,-1]],[[1199061143,0.573876],[1024309140,0.98412496]]]
[5270429,[[800028028,-1],[1362541830,-1]],[[164325925,0.92203426],[148512084,0.65505975]]]
... and want to use, say, 5 workers.
Then worker tenem05 reports what is below.

Cheers.
Christoph

--------------
java.lang.RuntimeException: java.io.IOException: Call to
tenem02//172.16.23.151:30003 failed on local exception: java.io.EOFException
        at
org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:780)
        at
org.apache.giraph.graph.BspServiceWorker.loadVertices(BspServiceWorker.java:304)
        at
org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:569)
        at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
        at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.io.IOException: Call to tenem02/172.16.23.151:30003
failed on local exception: java.io.EOFException
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1065)
        at org.apache.hadoop.ipc.Client.call(Client.java:1033)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
        at $Proxy3.putVertexList(Unknown Source)
        at
org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:777)
        ... 11 more
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:767)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:712)
2012-01-03 12:35:46,259 ERROR org.apache.giraph.graph.GraphMapper:
setup: Caught exception just before end of setup
java.lang.IllegalStateException: setup: loadVertices failed
        at
org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:576)
        at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
        at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.lang.RuntimeException: java.io.IOException: Call to
tenem02/172.16.23.151:30003 failed on local exception: java.io.EOFException
        at
org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:780)
        at
org.apache.giraph.graph.BspServiceWorker.loadVertices(BspServiceWorker.java:304)
        at
org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:569)
        ... 9 more
Caused by: java.io.IOException: Call to tenem02/172.16.23.151:30003
failed on local exception: java.io.EOFException
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1065)
        at org.apache.hadoop.ipc.Client.call(Client.java:1033)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
        at $Proxy3.putVertexList(Unknown Source)
        at
org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:777)
        ... 11 more
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:767)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:712)
2012-01-03 12:35:46,260 ERROR org.apache.giraph.graph.BspServiceWorker:
unregisterHealth: Got failure, unregistering health on
/_hadoopBsp/job_201112231316_4347/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/tenem05_1
on superstep -1
2012-01-03 12:35:46,270 INFO org.apache.hadoop.mapred.TaskLogsTruncater:
Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2012-01-03 12:35:46,320 INFO org.apache.hadoop.io.nativeio.NativeIO:
Initialized cache for UID to User mapping with a cache timeout of 14400
seconds.
2012-01-03 12:35:46,320 INFO org.apache.hadoop.io.nativeio.NativeIO: Got
UserName hadoop00 for UID 503 from the native implementation
2012-01-03 12:35:46,322 WARN org.apache.hadoop.mapred.Child: Error
running child
java.lang.IllegalStateException: run: Caught an unrecoverable exception
setup: Offlining servers due to exception...
        at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:641)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.lang.RuntimeException: setup: Offlining servers due to
exception...
        at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:466)
        at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
        ... 7 more
Caused by: java.lang.IllegalStateException: setup: loadVertices failed
        at
org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:576)
        at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
        ... 8 more
Caused by: java.lang.RuntimeException: java.io.IOException: Call to
tenem02/172.16.23.151:30003 failed on local exception: java.io.EOFException
        at
org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:780)
        at
org.apache.giraph.graph.BspServiceWorker.loadVertices(BspServiceWorker.java:304)
        at
org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:569)
        ... 9 more
Caused by: java.io.IOException: Call to tenem02/172.16.23.151:30003
failed on local exception: java.io.EOFException
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1065)
        at org.apache.hadoop.ipc.Client.call(Client.java:1033)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
        at $Proxy3.putVertexList(Unknown Source)
        at
org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:777)
        ... 11 more
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:767)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:712)
2012-01-03 12:35:46,337 INFO org.apache.hadoop.mapred.Task: Runnning
cleanup for the task




-------- Original Message --------
Date: Fri, 23 Dec 2011 09:25:24 -0800
From: Avery Ching <ach...@apache.org>
To: giraph-user@incubator.apache.org
Subject: Re: zookeeper connection issue
Yeah, those errors can seem a little scary.  But I think they are
mostly harmless.  Let's go over each one inline.

On 12/23/11 7:10 AM, "Christoph Böhm" wrote:
Hi List,

I'm about to get started with Giraph and have a few questions.
When running the PageRank example with
      hadoop jar giraph-0.70-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 500000 -w 10
it finishes, but I find the following in one worker's logs:

*** Worker:
2011-12-23 15:36:09,468 ERROR org.apache.zookeeper.ClientCnxn: Error
while calling watcher
java.lang.RuntimeException:
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for
/_hadoopBsp/job_201112231316_0010/_masterJobState
        at
org.apache.giraph.graph.BspService.getJobState(BspService.java:564)
        at
org.apache.giraph.graph.BspServiceWorker.processEvent(BspServiceWorker.java:1414)
        at org.apache.giraph.graph.BspService.process(BspService.java:1017)
        at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
        at
org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
Caused by:
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for
/_hadoopBsp/job_201112231316_0010/_masterJobState
        at
org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
        at
org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
        at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:99)
        at
org.apache.giraph.graph.BspService.getJobState(BspService.java:555)
        ... 4 more
It depends on when this happens.  If it's after the worker has let the
master know that it has finished everything, this is fine.

*** The Master says:
2011-12-23 15:45:40,564 WARN org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Got ConnectException
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at
java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
        at java.net.Socket.connect(Socket.java:525)
        at
org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:624)
        at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:408)
        at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.Child.main(Child.java:253)



Also, when I try to run my own job I see the following. All
firewalls etc. should be shut down.
*** Master (node09.de):
2011-12-23 15:57:47,140 INFO org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect
to
node09.de:22181 with poll msecs = 3000
2011-12-23 15:57:47,143 WARN org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Got ConnectException
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at
java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
        at java.net.Socket.connect(Socket.java:525)
        at
org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:624)
        at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:409)
        at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.Child.main(Child.java:253)



Thanks again.
Christoph
These two exceptions on the master are also fine.  It takes some time
for the master to start the zk service (hence the multiple connection
attempts).
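
The retry behavior visible in the log ("Connect attempt 0 of 10 max ... with poll msecs = 3000") can be illustrated with a generic poll-and-retry loop. This is a rough sketch only, not the code in ZooKeeperManager: ConnectRetry, waitFor, and canConnect are hypothetical names, and the 1000 ms connect timeout in main is an arbitrary assumption. A refused connection on an early attempt is expected while the server is still starting; it only becomes an error if every attempt fails.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.function.BooleanSupplier;

class ConnectRetry {
    // Returns true if a TCP connection to host:port can be opened within the timeout.
    static boolean canConnect(String host, int port, int timeoutMsecs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMsecs);
            return true;
        } catch (IOException e) {
            // java.net.ConnectException ("Connection refused") lands here.
            return false;
        }
    }

    // Calls probe up to maxAttempts times, sleeping pollMsecs between failures.
    // Returns the zero-based attempt that succeeded, or -1 if none did.
    static int waitFor(BooleanSupplier probe, int maxAttempts, long pollMsecs) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (probe.getAsBoolean()) {
                return attempt;
            }
            if (attempt + 1 < maxAttempts) {
                try {
                    Thread.sleep(pollMsecs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Host and port taken from the log above; adjust for a real setup.
        int attempt = waitFor(() -> canConnect("node09.de", 22181, 1000), 10, 3000);
        System.out.println(attempt >= 0
                ? "connected on attempt " + attempt
                : "server never came up");
    }
}
```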

