[Neo4j] Re: HA slaves leaving cluster randomly

Sanjay Jain Tue, 15 Jul 2014 04:10:30 -0700

I too have the same problem. Does anybody have a solution to this?

On Monday, July 7, 2014 12:43:17 PM UTC+5:30, Smit Sanghavi wrote:
>
> I have a Neo4j 1.9.7 HA setup running locally on 2 machines. One is for web 
> services and the other is for a daily batch. They usually work fine, but 
> occasionally they start misbehaving. I recently got them working again and 
> they ran fine for a few hours, but then I was greeted with these messages in 
> the *BATCH* setup. Can anybody explain whether I should worry about them?
>
> 2014-07-05 00:00:08.648+0000 INFO  [o.n.k.h.HighlyAvailableGraphDatabase]: 
> Transaction 1461 for nioneodb couldn't commit on enough slaves, desired 2, 
> but could only commit at 1
> 2014-07-05 00:00:09.315+0000 INFO  [o.n.k.h.HighlyAvailableGraphDatabase]: 
> Transaction 664 for lucene-index couldn't commit on enough slaves, desired 
> 2, but could only commit at 1
> 2014-07-05 00:00:09.340+0000 INFO  [o.n.k.h.HighlyAvailableGraphDatabase]: 
> Transaction 665 for lucene-index couldn't commit on enough slaves, desired 
> 2, but could only commit at 1
> 2014-07-05 00:00:09.416+0000 INFO  [o.n.k.h.HighlyAvailableGraphDatabase]: 
> Transaction 666 for lucene-index couldn't commit on enough slaves, desired 
> 2, but could only commit at 1
> 2014-07-05 00:00:09.918+0000 INFO  [o.n.k.h.HighlyAvailableGraphDatabase]: 
> GC Monitor: Application threads blocked for an additional 324ms [total 
> block time: 0.324s]
> .....
> .....
> .....
> 2014-07-05 00:40:01.209+0000 INFO  [o.n.k.h.HighlyAvailableGraphDatabase]: 
> GC Monitor: Application threads blocked for an additional 1116ms [total 
> block time: 2.89s]
> 2014-07-05 02:12:02.783+0000 INFO  [o.n.k.h.HighlyAvailableGraphDatabase]: 
> Transaction 1468 for nioneodb couldn't commit on enough slaves, desired 2, 
> but could only commit at 1
> 2014-07-05 02:24:07.441+0000 INFO  [o.n.k.h.HighlyAvailableGraphDatabase]: 
> Transaction 1469 for nioneodb couldn't commit on enough slaves, desired 2, 
> but could only commit at 1
>
> The next set of messages is more worrying:
> 2014-07-05 02:32:31.994+0000 INFO  [o.n.c.p.h.HeartbeatContext]: 2(me) is 
> now suspecting 1
> 2014-07-05 02:32:31.995+0000 INFO  [o.n.c.p.h.HeartbeatContext]: Notifying 
> listeners that instance 1 is failed
> 2014-07-05 02:32:31.996+0000 WARN  
> [o.n.c.p.e.ClusterLeaveReelectionListener]:  instance 1 is being demoted 
> since it failed
> 2014-07-05 02:32:31.997+0000 INFO  
> [o.n.k.h.HighAvailabilityConsoleLogger]: Instance 1 has failed
> 2014-07-05 02:32:31.998+0000 INFO  [o.n.k.h.HighlyAvailableGraphDatabase]: 
> Database is no longer ready
> 2014-07-05 02:32:31.998+0000 INFO  
> [o.n.k.h.HighAvailabilityConsoleLogger]: Write transactions to database 
> disabled
> 2014-07-05 02:32:32.008+0000 WARN  [o.n.c.p.e.ElectionState]: Context says 
> election is not OK to proceed. Failed instances are: [1], cluster members 
> are: {1=cluster://10.210.230.53:5000, 2=cluster://10.193.205.142:5000}
> 2014-07-05 02:33:22.069+0000 WARN  [o.n.c.p.a.m.ProposerState]: Propose 
> failed due to phase 1 timeout
> 2014-07-05 02:33:27.077+0000 WARN  [o.n.c.p.a.m.ProposerState]: Propose 
> failed due to phase 1 timeout
> .....
> .....
> .....
> 2014-07-05 02:35:57.334+0000 WARN  [o.n.c.p.a.m.ProposerState]: Propose 
> failed due to phase 1 timeout
> 2014-07-05 02:35:57.883+0000 INFO  [o.n.c.p.h.HeartbeatContext]: Notifying 
> listeners that instance 1 is alive
> 2014-07-05 02:35:57.884+0000 INFO  
> [o.n.k.h.HighAvailabilityConsoleLogger]: Instance 1 is alive
> 2014-07-05 02:35:57.968+0000 INFO  
> [o.n.k.h.HighAvailabilityConsoleLogger]: Instance 2 (this server) is 
> unavailable as master
> 2014-07-05 02:35:57.968+0000 INFO  
> [o.n.k.h.HighAvailabilityConsoleLogger]: Instance 1 is unavailable as slave
> 2014-07-05 02:35:57.972+0000 INFO  
> [o.n.k.h.HighAvailabilityConsoleLogger]: Instance 2 (this server) is 
> unavailable as backup
>
> After this I can no longer access the server, and any request to 
> the *WEB* server gives me this exception:
> org.neo4j.graphdb.TransactionFailureException: Timeout waiting for 
> database to allow new transactions. Blocking components (1): 
> [HighAvailabilityMemberStateMachine[PENDING]
> at 
> org.neo4j.kernel.ha.HighlyAvailableGraphDatabase.beginTx(HighlyAvailableGraphDatabase.java:199)
>
> This gets resolved by restarting Neo4j on both machines. But 
> why is this occurring in the first place?
> I'm happy to provide any logs that might be needed and to test things out.
>
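
For anyone comparing their own logs against the excerpt above, here is a minimal Python sketch that counts the "couldn't commit on enough slaves" messages and measures how long an instance stayed marked as failed. It is not a Neo4j tool. The function name `summarize` and the regular expressions are assumptions made for illustration, based only on the message wording quoted in this thread, and it expects one log event per line (as in the original log file, not the email-wrapped form shown here).

```python
import re
from datetime import datetime

# Timestamp layout seen in the quoted excerpt, e.g. 2014-07-05 02:32:31.995+0000
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f%z"

# timestamp, level, [logger]: message
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4})"
    r"\s+(\w+)\s+\[([^\]]+)\]:\s*(.*)$"
)
# Message fragments copied from the HeartbeatContext and commit lines in the thread.
FAILED_RE = re.compile(r"instance (\d+) is failed")
ALIVE_RE = re.compile(r"instance (\d+) is alive")
SHORTFALL_RE = re.compile(r"couldn't commit on enough slaves")


def summarize(lines):
    """Return (shortfall_count, outages).

    outages is a list of (instance_id, seconds) pairs, one for each
    "is failed" notification that is later followed by "is alive"
    for the same instance.
    """
    shortfalls = 0
    failed_at = {}  # instance id -> time it was first reported failed
    outages = []
    for raw in lines:
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip blank lines, stack traces, wrapped fragments
        timestamp = datetime.strptime(match.group(1), TS_FORMAT)
        message = match.group(4)

        if SHORTFALL_RE.search(message):
            shortfalls += 1

        failed = FAILED_RE.search(message)
        if failed:
            failed_at.setdefault(failed.group(1), timestamp)
            continue

        alive = ALIVE_RE.search(message)
        if alive and alive.group(1) in failed_at:
            started = failed_at.pop(alive.group(1))
            outages.append((alive.group(1), (timestamp - started).total_seconds()))
    return shortfalls, outages
```

Calling `summarize(open(path))` on a log file returns the two values. Applied to the two HeartbeatContext notifications quoted above (02:32:31.995 failed, 02:35:57.883 alive), the gap for instance 1 works out to about 206 seconds, which can then be compared against the GC pause lines and any network events from the same window.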


