Correct. I had an issue in readFields(DataInput in) for my vertex value type. Unfortunately, I never got to see the real exception until I wrote local tests (which, of course, one should do beforehand). The error returned was java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375), which was confusing since I don't call readInt() anywhere...
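Avery's advice further down the thread (test the serialization of the vertex types separately) can be turned into a small local round-trip test. This is a minimal sketch under stated assumptions: `RankValue` is a hypothetical vertex value class with the same `write(DataOutput)` / `readFields(DataInput)` pair that Hadoop's `org.apache.hadoop.io.Writable` declares, but to stay self-contained it uses only `java.io` and does not implement `Writable`. `BuggyRankValue` (also hypothetical) deliberately reads one `int` more than `write()` produced, so `DataInputStream.readInt` throws `EOFException` locally, where the cause is obvious. In the cluster trace, by contrast, the `readInt` frame belongs to Hadoop's `Client$Connection.receiveResponse`, not to user code, which is presumably why the real error stayed hidden. Wrapping the exception with the class name, as `roundTrip` does, is just one illustrative way to get a more telling message into a log; it is not how Giraph itself reports such errors.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

public class VertexValueRoundTrip {

  /** Hypothetical vertex value: a rank plus a variable-length array of neighbor ids. */
  static class RankValue {
    double rank;
    long[] neighbors = new long[0];

    void write(DataOutput out) throws IOException {
      out.writeDouble(rank);
      out.writeInt(neighbors.length); // length prefix tells readFields how many longs follow
      for (long n : neighbors) {
        out.writeLong(n);
      }
    }

    void readFields(DataInput in) throws IOException {
      rank = in.readDouble();
      int count = in.readInt();
      neighbors = new long[count];
      for (int i = 0; i < count; i++) {
        neighbors[i] = in.readLong();
      }
    }
  }

  /** Deliberately broken: reads an int that write() never wrote. */
  static class BuggyRankValue extends RankValue {
    @Override
    void readFields(DataInput in) throws IOException {
      super.readFields(in);
      in.readInt(); // stream is exhausted here, so DataInputStream.readInt throws EOFException
    }
  }

  /** Serializes original, deserializes into target, and names the class that misbehaved. */
  static <T extends RankValue> T roundTrip(RankValue original, T target) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      original.write(out);
    }
    String name = target.getClass().getSimpleName();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      target.readFields(in);
      if (in.available() > 0) {
        // readFields consumed less than write produced; a following record would be misaligned
        throw new IOException(name + ".readFields() left " + in.available() + " bytes unread");
      }
    } catch (EOFException e) {
      throw new IOException(name + ".readFields() read past the end of the serialized data", e);
    }
    return target;
  }

  public static void main(String[] args) throws IOException {
    RankValue value = new RankValue();
    value.rank = 0.5;
    value.neighbors = new long[] {1L, 2L};

    RankValue copy = roundTrip(value, new RankValue());
    System.out.println("ok: rank=" + copy.rank + ", neighbors=" + copy.neighbors.length);

    try {
      roundTrip(value, new BuggyRankValue());
    } catch (IOException e) {
      System.out.println("caught: " + e.getMessage());
    }
  }
}
```

Running `main` prints the successful round trip and then the wrapped error for `BuggyRankValue`, whose message names the class with the faulty `readFields()`. The `available()` check also catches the opposite bug, where `readFields()` reads too little.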
Where would be a good spot to fix this, i.e., to add vertex value errors to the user logs?

Chr

-------- Original Message --------
> Date: Tue, 03 Jan 2012 11:07:56 -0800
> From: Avery Ching <[email protected]>
> To: [email protected]
> Subject: Re: java.io.EOFException
>
> It appears that you had a problem with the serialization/deserialization
> of your vertex and/or its types (I, E, V, M). You might want to try to
> test that out separately.
>
> Avery
>
> On 1/3/12 3:54 AM, "Christoph Böhm" wrote:
> > Thanks!
> > The next exception, which I cannot explain, is the following.
> > I have one input file of the form:
> > > [2095029,[[1100046950,-1],[952771928,-1]],[[1276522248,0.9829082],[322609086,0.013525307]]]
> > > [5146036,[[947366954,-1],[34019593,-1]],[[1199061143,0.573876],[1024309140,0.98412496]]]
> > > [5270429,[[800028028,-1],[1362541830,-1]],[[164325925,0.92203426],[148512084,0.65505975]]]
> > ... and want to use, say, 5 workers.
> > Then worker tenem05 reports what is below.
> >
> > Cheers.
> > Christoph
> >
> > --------------
> > java.lang.RuntimeException: java.io.IOException: Call to tenem02/172.16.23.151:30003 failed on local exception: java.io.EOFException
> > at org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:780)
> > at org.apache.giraph.graph.BspServiceWorker.loadVertices(BspServiceWorker.java:304)
> > at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:569)
> > at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
> > at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
> > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
> > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> > at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at javax.security.auth.Subject.doAs(Subject.java:396)
> > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> > at org.apache.hadoop.mapred.Child.main(Child.java:253)
> > Caused by: java.io.IOException: Call to tenem02/172.16.23.151:30003 failed on local exception: java.io.EOFException
> > at org.apache.hadoop.ipc.Client.wrapException(Client.java:1065)
> > at org.apache.hadoop.ipc.Client.call(Client.java:1033)
> > at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
> > at $Proxy3.putVertexList(Unknown Source)
> > at org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:777)
> > ... 11 more
> > Caused by: java.io.EOFException
> > at java.io.DataInputStream.readInt(DataInputStream.java:375)
> > at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:767)
> > at org.apache.hadoop.ipc.Client$Connection.run(Client.java:712)
> > 2012-01-03 12:35:46,259 ERROR org.apache.giraph.graph.GraphMapper: setup: Caught exception just before end of setup
> > java.lang.IllegalStateException: setup: loadVertices failed
> > at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:576)
> > at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
> > at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
> > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
> > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> > at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at javax.security.auth.Subject.doAs(Subject.java:396)
> > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> > at org.apache.hadoop.mapred.Child.main(Child.java:253)
> > Caused by: java.lang.RuntimeException: java.io.IOException: Call to tenem02/172.16.23.151:30003 failed on local exception: java.io.EOFException
> > at org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:780)
> > at org.apache.giraph.graph.BspServiceWorker.loadVertices(BspServiceWorker.java:304)
> > at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:569)
> > ... 9 more
> > Caused by: java.io.IOException: Call to tenem02/172.16.23.151:30003 failed on local exception: java.io.EOFException
> > at org.apache.hadoop.ipc.Client.wrapException(Client.java:1065)
> > at org.apache.hadoop.ipc.Client.call(Client.java:1033)
> > at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
> > at $Proxy3.putVertexList(Unknown Source)
> > at org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:777)
> > ... 11 more
> > Caused by: java.io.EOFException
> > at java.io.DataInputStream.readInt(DataInputStream.java:375)
> > at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:767)
> > at org.apache.hadoop.ipc.Client$Connection.run(Client.java:712)
> > 2012-01-03 12:35:46,260 ERROR org.apache.giraph.graph.BspServiceWorker: unregisterHealth: Got failure, unregistering health on /_hadoopBsp/job_201112231316_4347/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/tenem05_1 on superstep -1
> > 2012-01-03 12:35:46,270 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> > 2012-01-03 12:35:46,320 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
> > 2012-01-03 12:35:46,320 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName hadoop00 for UID 503 from the native implementation
> > 2012-01-03 12:35:46,322 WARN org.apache.hadoop.mapred.Child: Error running child
> > java.lang.IllegalStateException: run: Caught an unrecoverable exception setup: Offlining servers due to exception...
> > at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:641)
> > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
> > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> > at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at javax.security.auth.Subject.doAs(Subject.java:396)
> > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> > at org.apache.hadoop.mapred.Child.main(Child.java:253)
> > Caused by: java.lang.RuntimeException: setup: Offlining servers due to exception...
> > at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:466)
> > at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
> > ... 7 more
> > Caused by: java.lang.IllegalStateException: setup: loadVertices failed
> > at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:576)
> > at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
> > ... 8 more
> > Caused by: java.lang.RuntimeException: java.io.IOException: Call to tenem02/172.16.23.151:30003 failed on local exception: java.io.EOFException
> > at org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:780)
> > at org.apache.giraph.graph.BspServiceWorker.loadVertices(BspServiceWorker.java:304)
> > at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:569)
> > ... 9 more
> > Caused by: java.io.IOException: Call to tenem02/172.16.23.151:30003 failed on local exception: java.io.EOFException
> > at org.apache.hadoop.ipc.Client.wrapException(Client.java:1065)
> > at org.apache.hadoop.ipc.Client.call(Client.java:1033)
> > at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
> > at $Proxy3.putVertexList(Unknown Source)
> > at org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:777)
> > ... 11 more
> > Caused by: java.io.EOFException
> > at java.io.DataInputStream.readInt(DataInputStream.java:375)
> > at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:767)
> > at org.apache.hadoop.ipc.Client$Connection.run(Client.java:712)
> > 2012-01-03 12:35:46,337 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
> >
> > -------- Original Message --------
> >> Date: Fri, 23 Dec 2011 09:25:24 -0800
> >> From: Avery Ching <[email protected]>
> >> To: [email protected]
> >> Subject: Re: zookeeper connection issue
> >> Yeah, some of those errors can seem a little scary. But I think they are
> >> mostly harmless. Let's go over each one inline.
> >>
> >> On 12/23/11 7:10 AM, "Christoph Böhm" wrote:
> >>> Hi List,
> >>>
> >>> I'm about to get started with Giraph and have a few questions:
> >>> when running the PageRank example with
> >>> hadoop jar giraph-0.70-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 500000 -w 10
> >>> this finishes, but I find the following in one worker's logs:
> >>>
> >>> *** Worker:
> >>> 2011-12-23 15:36:09,468 ERROR org.apache.zookeeper.ClientCnxn: Error while calling watcher
> >>> java.lang.RuntimeException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201112231316_0010/_masterJobState
> >>> at org.apache.giraph.graph.BspService.getJobState(BspService.java:564)
> >>> at org.apache.giraph.graph.BspServiceWorker.processEvent(BspServiceWorker.java:1414)
> >>> at org.apache.giraph.graph.BspService.process(BspService.java:1017)
> >>> at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
> >>> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
> >>> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201112231316_0010/_masterJobState
> >>> at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
> >>> at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> >>> at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
> >>> at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:99)
> >>> at org.apache.giraph.graph.BspService.getJobState(BspService.java:555)
> >>> ... 4 more
> >> It depends on when this happens. If it's after the worker has let the master
> >> know that it was finished with everything, this is fine.
> >>
> >>> *** The Master says:
> >>> 2011-12-23 15:45:40,564 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got ConnectException
> >>> java.net.ConnectException: Connection refused
> >>> at java.net.PlainSocketImpl.socketConnect(Native Method)
> >>> at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
> >>> at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
> >>> at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
> >>> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
> >>> at java.net.Socket.connect(Socket.java:525)
> >>> at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:624)
> >>> at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:408)
> >>> at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
> >>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
> >>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> >>> at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
> >>> at java.security.AccessController.doPrivileged(Native Method)
> >>> at javax.security.auth.Subject.doAs(Subject.java:396)
> >>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> >>> at org.apache.hadoop.mapred.Child.main(Child.java:253)
> >>>
> >>> Also, when I'm trying to run my own job, I see the following. All firewalls etc. should be shut down.
> >>> *** Master (node09.de):
> >>> 2011-12-23 15:57:47,140 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect to node09.de:22181 with poll msecs = 3000
> >>> 2011-12-23 15:57:47,143 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got ConnectException
> >>> java.net.ConnectException: Connection refused
> >>> at java.net.PlainSocketImpl.socketConnect(Native Method)
> >>> at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
> >>> at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
> >>> at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
> >>> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
> >>> at java.net.Socket.connect(Socket.java:525)
> >>> at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:624)
> >>> at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:409)
> >>> at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
> >>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
> >>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> >>> at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
> >>> at java.security.AccessController.doPrivileged(Native Method)
> >>> at javax.security.auth.Subject.doAs(Subject.java:396)
> >>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> >>> at org.apache.hadoop.mapred.Child.main(Child.java:253)
> >>>
> >>> Thanks again.
> >>> Christoph
> >> These two exceptions on the master are also fine. It takes some time
> >> for the master to start the zk service (hence the multiple connection
> >> attempts).
>
