I have a three-node cluster running 1.0.2. Today I hit a very strange
problem: two of the Cassandra nodes (let's say B and C) suddenly started
consuming a lot of CPU. It turned out that for some reason the "java"
binary just wouldn't run. I was using OpenJDK 1.6.0_18, so I switched to
the Sun JDK, which works okay.

After that, node A stopped working with the same problem. I installed the
Sun JDK there too, and then it was okay. But minutes later B stopped working
again: about 5-10 minutes after Cassandra started, it stopped responding to
connections. I can't access port 9160, and nodetool doesn't return either.
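
For anyone trying to reproduce the "can't access 9160" symptom from another
machine, here is a minimal sketch of a TCP probe with a timeout. The function
name port_open and the host names in the comment are hypothetical, not
anything from Cassandra itself. Note that a successful connect only shows the
listening socket accepts connections; a hung JVM may still accept and then
never answer.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and connects, honoring the timeout
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures
        return False


# Example usage (hypothetical host names, replace with the real node addresses):
#   for node in ("node-a", "node-b", "node-c"):
#       print(node, port_open(node, 9160))
```

Running this in a loop against each node would at least show which node
stops accepting connections first and when.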

I have turned on DEBUG logging and don't see much useful information. The
last lines on node B are as follows:
DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java (line 65) resolving 2 responses
DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java (line 106) digests verified
DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java (line 110) resolve: 0 ms.
DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,831 StorageProxy.java (line 694) Read: 5 ms.
DEBUG [Thread-8] 2012-07-01 07:45:42,831 IncomingTcpConnection.java (line 116) Version is now 3
DEBUG [Thread-8] 2012-07-01 07:45:42,831 IncomingTcpConnection.java (line 116) Version is now 3


This problem is really driving me crazy, since I just don't know what
happened or how to debug it. I tried killing node A and restarting it, and
then node B halted; after I restarted B, node C went down...


One thing that may be related: the log time on node B does not match the
system time (A and C are okay).

Running date on node B shows:
Sun Jul  1 23:10:57 CST 2012 (system time)

But you may have noticed that the time is "2012-07-01 07:45:XX" in the log
messages above. The system time is right; I'm just not sure why Cassandra's
log file shows the wrong time. I don't recall Cassandra having a timezone
setting.
