Can you add GC flags?  Perhaps you are running into full GCs...

On 2/19/13 1:45 PM, Claudio Martella wrote:
I have not investigated further yet, but this does not look connected to
ZK, but mostly to some concurrency on the queues. The number of requests
stays to 29...


On Tue, Feb 19, 2013 at 9:58 PM, Sebastian Schelter <[email protected]> wrote:

For me, Giraph on Hadoop 1.0.4 runs into some kind of deadlock after the
input was loaded when using it with 200 workers on a billion-scale
graph....

I tried several versions of zookeeper, but nothing helped.

Best,
Sebastian


On 19.02.2013 21:55, Claudio Martella wrote:
I'm running PR with trunk on yahoo webgraph across 528 workers and I get
the workers hanging at the flushing:

2013-02-19 21:40:07,187 INFO org.apache.giraph.graph.ComputeCallable:
call:
Computation took 28.386961 secs for 95 partitions on superstep 0.
  Flushing
started
2013-02-19 21:40:07,604 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 0 Memory
(free/total/max) = 5993.42M / 8661.44M / 14346.69M
2013-02-19 21:40:07,605 INFO org.apache.giraph.comm.netty.NettyClient:
waitSomeRequests: Waiting interval of 15000 msecs, 558 open requests,
waiting for it to be <= 0, MBytes/sec sent = 6.4758, MBytes/sec received
=
0, MBytesSent = 175.3834, MBytesReceived = 0.0003, ave sent req MBytes =
0.2998, ave received req MBytes = 0, secs waited = 27.083
2013-02-19 21:40:26,717 INFO org.apache.giraph.comm.netty.NettyClient:
waitSomeRequests: Waiting interval of 15000 msecs, 39 open requests,
waiting for it to be <= 0, MBytes/sec sent = 3.7965, MBytes/sec received
=
0.0001, MBytesSent = 175.3834, MBytesReceived = 0.0068, ave sent req
MBytes
= 0.2998, ave received req MBytes = 0, secs waited = 46.195
2013-02-19 21:40:32,248 INFO
org.apache.giraph.comm.netty.handler.RequestDecoder: decode: Server
window
metrics MBytes/sec sent = 0.0002, MBytes/sec received = 3.9824,
MBytesSent
= 0.0066, MBytesReceived = 155.4787, ave sent req MBytes = 0, ave
received
req MBytes = 0.0155, secs waited = 39.04
2013-02-19 21:40:42,041 INFO org.apache.giraph.comm.netty.NettyClient:
waitSomeRequests: Waiting interval of 15000 msecs, 36 open requests,
waiting for it to be <= 0, MBytes/sec sent = 2.8509, MBytes/sec received
=
0.0001, MBytesSent = 175.3834, MBytesReceived = 0.0068, ave sent req
MBytes
= 0.2998, ave received req MBytes = 0, secs waited = 61.519
2013-02-19 21:41:07,456 INFO org.apache.giraph.comm.netty.NettyClient:
waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests,
waiting for it to be <= 0, MBytes/sec sent = 2.0174, MBytes/sec received
=
0.0001, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req
MBytes
= 0.2998, ave received req MBytes = 0, secs waited = 86.933
2013-02-19 21:41:22,456 INFO org.apache.giraph.comm.netty.NettyClient:
waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests,
waiting for it to be <= 0, MBytes/sec sent = 1.7205, MBytes/sec received
=
0.0001, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req
MBytes
= 0.2998, ave received req MBytes = 0, secs waited = 101.934
2013-02-19 21:41:37,457 INFO org.apache.giraph.comm.netty.NettyClient:
waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests,
waiting for it to be <= 0, MBytes/sec sent = 1.4998, MBytes/sec received
=
0.0001, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req
MBytes
= 0.2998, ave received req MBytes = 0, secs waited = 116.935
2013-02-19 21:41:52,458 INFO org.apache.giraph.comm.netty.NettyClient:
waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests,
waiting for it to be <= 0, MBytes/sec sent = 1.3293, MBytes/sec received
=
0.0001, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req
MBytes
= 0.2998, ave received req MBytes = 0, secs waited = 131.936
2013-02-19 21:42:22,459 INFO org.apache.giraph.comm.netty.NettyClient:
waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests,
waiting for it to be <= 0, MBytes/sec sent = 1.083, MBytes/sec received =
0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes =
0.2998, ave received req MBytes = 0, secs waited = 161.937
2013-02-19 21:42:37,460 INFO org.apache.giraph.comm.netty.NettyClient:
waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests,
waiting for it to be <= 0, MBytes/sec sent = 0.9912, MBytes/sec received
=
0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes =
0.2998, ave received req MBytes = 0, secs waited = 176.938
2013-02-19 21:43:07,461 INFO org.apache.giraph.comm.netty.NettyClient:
waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests,
waiting for it to be <= 0, MBytes/sec sent = 0.8475, MBytes/sec received
=
0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes =
0.2998, ave received req MBytes = 0, secs waited = 206.939
2013-02-19 21:43:37,462 INFO org.apache.giraph.comm.netty.NettyClient:
waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests,
waiting for it to be <= 0, MBytes/sec sent = 0.7402, MBytes/sec received
=
0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes =
0.2998, ave received req MBytes = 0, secs waited = 236.94
2013-02-19 21:43:52,463 INFO org.apache.giraph.comm.netty.NettyClient:
waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests,
waiting for it to be <= 0, MBytes/sec sent = 0.6961, MBytes/sec received
=
0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes =
0.2998, ave received req MBytes = 0, secs waited = 251.941
2013-02-19 21:44:07,464 INFO org.apache.giraph.comm.netty.NettyClient:
waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests,
waiting for it to be <= 0, MBytes/sec sent = 0.657, MBytes/sec received =
0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes =
0.2998, ave received req MBytes = 0, secs waited = 266.942
2013-02-19 21:44:37,465 INFO org.apache.giraph.comm.netty.NettyClient:
waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests,
waiting for it to be <= 0, MBytes/sec sent = 0.5906, MBytes/sec received
=
0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes =
0.2998, ave received req MBytes = 0, secs waited = 296.943
2013-02-19 21:45:07,466 INFO org.apache.giraph.comm.netty.NettyClient:
waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests,
waiting for it to be <= 0, MBytes/sec sent = 0.5364, MBytes/sec received
=
0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes =
0.2998, ave received req MBytes = 0, secs waited = 326.944
2013-02-19 21:45:22,467 INFO org.apache.giraph.comm.netty.NettyClient:
waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests,
waiting for it to be <= 0, MBytes/sec sent = 0.5129, MBytes/sec received
=
0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes =
0.2998, ave received req MBytes = 0, secs waited = 341.945
2013-02-19 21:45:37,468 INFO org.apache.giraph.comm.netty.NettyClient:
waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests,
waiting for it to be <= 0, MBytes/sec sent = 0.4913, MBytes/sec received
=
0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes =
0.2998, ave received req MBytes = 0, secs waited = 356.946
2013-02-19 21:46:07,469 INFO org.apache.giraph.comm.netty.NettyClient:
waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests,
waiting for it to be <= 0, MBytes/sec sent = 0.4532, MBytes/sec received
=
0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes =
0.2998, ave received req MBytes = 0, secs waited = 386.947
2013-02-19 21:46:22,470 INFO org.apache.giraph.comm.netty.NettyClient:
waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests,
waiting for it to be <= 0, MBytes/sec sent = 0.4363, MBytes/sec received
=
0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes =
0.2998, ave received req MBytes = 0, secs waited = 401.948
2013-02-19 21:46:52,471 INFO org.apache.giraph.comm.netty.NettyClient:
waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests,
waiting for it to be <= 0, MBytes/sec sent = 0.406, MBytes/sec received =
0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes =
0.2998, ave received req MBytes = 0, secs waited = 431.949
2013-02-19 21:47:07,472 INFO org.apache.giraph.comm.netty.NettyClient:
waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests,
waiting for it to be <= 0, MBytes/sec sent = 0.3924, MBytes/sec received
=
0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes =
0.2998, ave received req MBytes = 0, secs waited = 446.95
2013-02-19 21:47:37,473 INFO org.apache.giraph.comm.netty.NettyClient:
waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests,
waiting for it to be <= 0, MBytes/sec sent = 0.3677, MBytes/sec received
=
0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes =
0.2998, ave received req MBytes = 0, secs waited = 476.951
2013-02-19 21:48:07,474 INFO org.apache.giraph.comm.netty.NettyClient:
waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests,
waiting for it to be <= 0, MBytes/sec sent = 0.346, MBytes/sec received =
0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes =
0.2998, ave received req MBytes = 0, secs waited = 506.952
2013-02-19 21:48:22,475 INFO org.apache.giraph.comm.netty.NettyClient:
waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests,
waiting for it to be <= 0, MBytes/sec sent = 0.336, MBytes/sec received =
0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes =
0.2998, ave received req MBytes = 0, secs waited = 521.953
2013-02-19 21:48:52,476 INFO org.apache.giraph.comm.netty.NettyClient:
waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests,
waiting for it to be <= 0, MBytes/sec sent = 0.3177, MBytes/sec received
=
0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes =
0.2998, ave received req MBytes = 0, secs waited = 551.954
2013-02-19 21:49:07,477 INFO org.apache.giraph.comm.netty.NettyClient:
waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests,
waiting for it to be <= 0, MBytes/sec sent = 0.3093, MBytes/sec received
=
0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes =
0.2998, ave received req MBytes = 0, secs waited = 566.955

It looks similar to the problem we had earlier with concurrency.

Are you guys experiencing something similar at large-scale?




Reply via email to