[Neo4j] HA Cluster sometimes becoming unavailable and slaves leaving cluster

Smit Sanghavi Mon, 07 Jul 2014 00:09:08 -0700

I have a Neo4j 1.9.7 HA setup running locally on 2 machines. One is used for web 
services and the other for a daily batch. They work fine, but occasionally 
they start misbehaving. Recently I got them working again. They worked fine 
for a few hours, but then I was greeted with these messages in the *BATCH* 
setup. Can anybody explain whether I should worry about them?


2014-07-05 00:00:08.648+0000 INFO  [o.n.k.h.HighlyAvailableGraphDatabase]: Transaction 1461 for nioneodb couldn't commit on enough slaves, desired 2, but could only commit at 1
2014-07-05 00:00:09.315+0000 INFO  [o.n.k.h.HighlyAvailableGraphDatabase]: Transaction 664 for lucene-index couldn't commit on enough slaves, desired 2, but could only commit at 1
2014-07-05 00:00:09.340+0000 INFO  [o.n.k.h.HighlyAvailableGraphDatabase]: Transaction 665 for lucene-index couldn't commit on enough slaves, desired 2, but could only commit at 1
2014-07-05 00:00:09.416+0000 INFO  [o.n.k.h.HighlyAvailableGraphDatabase]: Transaction 666 for lucene-index couldn't commit on enough slaves, desired 2, but could only commit at 1
2014-07-05 00:00:09.918+0000 INFO  [o.n.k.h.HighlyAvailableGraphDatabase]: GC Monitor: Application threads blocked for an additional 324ms [total block time: 0.324s]
.....
.....
.....
2014-07-05 00:40:01.209+0000 INFO  [o.n.k.h.HighlyAvailableGraphDatabase]: GC Monitor: Application threads blocked for an additional 1116ms [total block time: 2.89s]
2014-07-05 02:12:02.783+0000 INFO  [o.n.k.h.HighlyAvailableGraphDatabase]: Transaction 1468 for nioneodb couldn't commit on enough slaves, desired 2, but could only commit at 1
2014-07-05 02:24:07.441+0000 INFO  [o.n.k.h.HighlyAvailableGraphDatabase]: Transaction 1469 for nioneodb couldn't commit on enough slaves, desired 2, but could only commit at 1
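
The INFO lines above appear to be the master reporting that it could not push a committed transaction to as many slaves as requested. If `desired` counts slaves, a two-machine cluster has only one slave, so a desired count of 2 can never be met. The Java sketch below is a hypothetical model, not Neo4j source: the class and method names are invented, and it assumes that `desired` corresponds to the configured push factor (believed to be `ha.tx_push_factor` in 1.9, an assumption about how the logged value is derived) and that a shortfall is only logged rather than treated as a failed commit.

```java
import java.util.List;

// Hypothetical model, not Neo4j source: after a commit, the master pushes the
// transaction to up to `desired` slaves and logs a shortfall if fewer accept it.
public class PushFactorSketch {

    /** Counts acknowledging slaves, stopping once `desired` copies exist. */
    static int pushToSlaves(int desired, List<Boolean> slaveReachable) {
        int successful = 0;
        for (boolean reachable : slaveReachable) {
            if (successful == desired) {
                break; // enough copies, stop pushing
            }
            if (reachable) {
                successful++;
            }
        }
        return successful;
    }

    /** Returns the message this model would log for one committed transaction. */
    static String reportReplication(long txId, String dataSource, int desired, List<Boolean> slaves) {
        int successful = pushToSlaves(desired, slaves);
        if (successful < desired) {
            // In this model the transaction stays committed on the master;
            // it is only replicated to fewer slaves than requested.
            return "Transaction " + txId + " for " + dataSource
                    + " couldn't commit on enough slaves, desired " + desired
                    + ", but could only commit at " + successful;
        }
        return "Transaction " + txId + " for " + dataSource + " pushed to " + successful + " slave(s)";
    }

    public static void main(String[] args) {
        // Two-machine cluster: the master has exactly one slave to push to.
        List<Boolean> oneSlave = List.of(true);
        System.out.println(reportReplication(1461, "nioneodb", 2, oneSlave)); // shortfall logged
        System.out.println(reportReplication(1462, "nioneodb", 1, oneSlave)); // requirement met
    }
}
```

In this model, requesting no more copies than there are slaves removes the message; whether that matches the real setting in this 1.9.7 setup would need checking against its configuration.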

The next set of messages is more worrying:
2014-07-05 02:32:31.994+0000 INFO  [o.n.c.p.h.HeartbeatContext]: 2(me) is now suspecting 1
2014-07-05 02:32:31.995+0000 INFO  [o.n.c.p.h.HeartbeatContext]: Notifying listeners that instance 1 is failed
2014-07-05 02:32:31.996+0000 WARN  [o.n.c.p.e.ClusterLeaveReelectionListener]:  instance 1 is being demoted since it failed
2014-07-05 02:32:31.997+0000 INFO  [o.n.k.h.HighAvailabilityConsoleLogger]: Instance 1 has failed
2014-07-05 02:32:31.998+0000 INFO  [o.n.k.h.HighlyAvailableGraphDatabase]: Database is no longer ready
2014-07-05 02:32:31.998+0000 INFO  [o.n.k.h.HighAvailabilityConsoleLogger]: Write transactions to database disabled
2014-07-05 02:32:32.008+0000 WARN  [o.n.c.p.e.ElectionState]: Context says election is not OK to proceed. Failed instances are: [1], cluster members are: {1=cluster://10.210.230.53:5000, 2=cluster://10.193.205.142:5000}
2014-07-05 02:33:22.069+0000 WARN  [o.n.c.p.a.m.ProposerState]: Propose failed due to phase 1 timeout
2014-07-05 02:33:27.077+0000 WARN  [o.n.c.p.a.m.ProposerState]: Propose failed due to phase 1 timeout
.....
.....
.....
2014-07-05 02:35:57.334+0000 WARN  [o.n.c.p.a.m.ProposerState]: Propose failed due to phase 1 timeout
2014-07-05 02:35:57.883+0000 INFO  [o.n.c.p.h.HeartbeatContext]: Notifying listeners that instance 1 is alive
2014-07-05 02:35:57.884+0000 INFO  [o.n.k.h.HighAvailabilityConsoleLogger]: Instance 1 is alive
2014-07-05 02:35:57.968+0000 INFO  [o.n.k.h.HighAvailabilityConsoleLogger]: Instance 2 (this server) is unavailable as master
2014-07-05 02:35:57.968+0000 INFO  [o.n.k.h.HighAvailabilityConsoleLogger]: Instance 1 is unavailable as slave
2014-07-05 02:35:57.972+0000 INFO  [o.n.k.h.HighAvailabilityConsoleLogger]: Instance 2 (this server) is unavailable as backup
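
The sequence above reads like a heartbeat failure detector firing, followed by an election that cannot proceed. A minimal sketch, assuming a simple heartbeat timeout and a strict-majority rule for elections (the 11-second timeout is an illustrative value, not taken from this setup, and the class and method names are hypothetical): once instance 2 suspects instance 1, it is the only live member of a two-member cluster, and 1 of 2 is not a majority. That would be consistent with "election is not OK to proceed" and the repeated phase 1 timeouts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model, not Neo4j's implementation: a member suspects a peer
// when no heartbeat has arrived within a timeout, and an election may only
// proceed when the members still considered alive form a strict majority.
public class QuorumSketch {

    static boolean isSuspected(long nowMillis, long lastHeartbeatMillis, long timeoutMillis) {
        return nowMillis - lastHeartbeatMillis > timeoutMillis;
    }

    /** A strict majority of the full membership must be alive. */
    static boolean electionMayProceed(int clusterSize, int aliveMembers) {
        return aliveMembers > clusterSize / 2;
    }

    public static void main(String[] args) {
        long timeout = 11_000; // assumed heartbeat timeout in ms, illustration only
        long now = 100_000;

        // Instance 2 last heard from instance 1 15 seconds ago
        // (for example after a network hiccup or a long pause on the peer).
        Map<Integer, Long> lastHeard = new HashMap<>();
        lastHeard.put(1, now - 15_000);

        int alive = 1; // instance 2 counts itself
        for (Map.Entry<Integer, Long> e : lastHeard.entrySet()) {
            if (isSuspected(now, e.getValue(), timeout)) {
                System.out.println("2(me) is now suspecting " + e.getKey());
            } else {
                alive++;
            }
        }

        int clusterSize = 2;
        // false: 1 live member out of 2 is not a majority
        System.out.println("election may proceed: " + electionMayProceed(clusterSize, alive));
        // true: with 3 members, 2 live members still form a majority
        System.out.println("3 members, 2 alive: " + electionMayProceed(3, 2));
    }
}
```

Under this rule, a third cluster member would let the surviving pair keep a majority when any one instance drops out.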

After this I can no longer access the server, and every request on 
the *WEB* server fails with this exception:
org.neo4j.graphdb.TransactionFailureException: Timeout waiting for database to allow new transactions. Blocking components (1): [HighAvailabilityMemberStateMachine[PENDING]
    at org.neo4j.kernel.ha.HighlyAvailableGraphDatabase.beginTx(HighlyAvailableGraphDatabase.java:199)
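
The exception suggests that `beginTx` waits for the database to become available and gives up while the HA state machine is still `PENDING`. The sketch below is a hypothetical model of that behaviour, not Neo4j code (the class, method, and exception names are invented for illustration): a component registers a block, and `beginTx` polls until all blocks are lifted or the timeout expires. If no master can be elected, the `PENDING` block is never lifted, so every transaction times out until the instances are restarted.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model, not Neo4j code: beginTx() waits up to a timeout for
// every blocking component to clear, then fails and names what still blocks.
public class AvailabilitySketch {

    static class TxTimeoutException extends RuntimeException {
        TxTimeoutException(String msg) { super(msg); }
    }

    private final Set<String> blockers = ConcurrentHashMap.newKeySet();

    void block(String component) { blockers.add(component); }

    void unblock(String component) { blockers.remove(component); }

    void beginTx(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!blockers.isEmpty()) {
            if (System.currentTimeMillis() >= deadline) {
                throw new TxTimeoutException("Timeout waiting for database to allow new transactions. "
                        + "Blocking components (" + blockers.size() + "): " + blockers);
            }
            try {
                Thread.sleep(10); // simple polling keeps the sketch short
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        // No blockers left: in this model a transaction may now start.
    }

    public static void main(String[] args) {
        AvailabilitySketch db = new AvailabilitySketch();
        // With no master elected, the HA state machine stays PENDING,
        // so its block is never lifted and beginTx() times out.
        db.block("HighAvailabilityMemberStateMachine[PENDING]");
        try {
            db.beginTx(100);
        } catch (TxTimeoutException e) {
            System.out.println(e.getMessage());
        }
        db.unblock("HighAvailabilityMemberStateMachine[PENDING]");
        db.beginTx(100); // returns immediately once nothing blocks
    }
}
```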

Restarting Neo4j on both machines resolves it. But why does this happen 
in the first place?
I'm happy to provide any logs that might be needed and to test things out.

-- 
You received this message because you are subscribed to the Google Groups 
"Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.