Hi Jacob,

Thank you very much for the in-depth explanation. I think the issue was
that we configured the standalone instance as slave-only, so it can never
become master. The cluster therefore goes into a state of endless
re-election until the slave (the standalone instance) becomes unavailable,
which happens once we restart it.
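
For reference, the setting we believe put the standalone instance into
slave-only mode is the HA flag below (a sketch of the relevant fragment of
conf/neo4j.properties; the setting name is as we understand it from the 1.9
HA docs, so please correct us if it differs):

```
# conf/neo4j.properties -- HA fragment on the standalone instance.
# true  = this instance is never eligible for master election
# false = the default; the instance can be elected master again
ha.slave_only = true
```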

We have looked at messages.log and did not find anything apparent that
would suggest why a re-election occurred in the first place; we will
continue to monitor the behavior.
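
As part of that monitoring, we also plan to sample GC activity from inside
the embedded JVM itself, independently of Neo4j's own GC monitor lines in
messages.log. A minimal sketch using only the standard JDK management beans
(the class name is ours; nothing here is Neo4j-specific):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcPauseCheck {
    public static void main(String[] args) {
        long totalMs = 0;
        // Sum cumulative collection time across all collectors in this JVM.
        // Both values are cumulative since JVM start; a collector may report
        // -1 if the metric is unsupported, which we treat as zero.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            totalMs += Math.max(0, gc.getCollectionTime());
        }
        // A single pause in the multi-second range is enough to trigger an
        // HA re-election if it exceeds the cluster's timeouts.
        System.out.println("Cumulative GC time: " + totalMs + " ms");
    }
}
```

Sampling this periodically and diffing successive readings would show us
whether long stop-the-world pauses line up with the re-elections.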

We recently moved from a purely REST-based setup to running embedded Neo4j
and have gained a significant performance improvement, which we have come to
love and cherish, so we want to stay on the embedded setup. Having said
that, we are also open to trying the Neo4j JDBC driver if it gives us a
similar performance advantage while offering more stability.
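
In case it moves that discussion along, this is the kind of minimal sketch
we would start from with the Neo4j JDBC driver (the connection URL, the
lack of credentials, and the Cypher statement are placeholders on our side,
not something we have verified against the driver):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class Neo4jJdbcSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder URL: point it at the standalone server's REST endpoint.
        String url = "jdbc:neo4j://localhost:7474/";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             // Cypher goes where SQL normally would; 1.9-era syntax assumed.
             ResultSet rs = stmt.executeQuery("START n=node(*) RETURN count(n) AS c")) {
            while (rs.next()) {
                System.out.println("node count: " + rs.getLong("c"));
            }
        }
    }
}
```

If that is roughly the right shape, the open question for us is whether the
per-query round-trip cost stays close to what we see embedded.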

If there is documentation already available on how to get
spring-data-neo4j to work with the neo4j-jdbc driver, kindly point us to
it.

Once again, thank you very much for the help.

Cheers!

Virat


On Thu, Mar 27, 2014 at 3:02 AM, Jacob Hansson <[email protected]>
 wrote:

> Invalid epoch means the server received a message from an older (no longer
> valid) logical time epoch in the cluster. Less cryptically: there has
> generally been an election, and the new master (possibly the same instance
> as the previous master) received a message that was addressed to the old
> master.
>
> That is, this is a protection mechanism that ensures messages meant for a
> logically different master are not acted on by the current master. We
> should probably not log these exceptions; they are expected in normal
> operation.
>
> The co-occurrence of these and the long pause you observed indicates that
> something is causing the master to stall long enough to trigger a
> re-election. I would advise looking for GC monitor messages in your
> messages.log to see if there are large GC pauses.
>
> Since you are running Neo4j in-process with another Java application, it
> becomes very important to ensure that application makes very little GC
> noise; otherwise it may disturb Neo4j in this way.
>
> Would you consider running two Neo4j servers and talking to Neo4j from
> JBoss via the Neo4j JDBC driver?
>
> Sent from my phone, please excuse typos and brevity.
> On Mar 27, 2014 1:20 AM, "Virat Gohil" <[email protected]> wrote:
>
>> Hi,
>>
>> We are using Neo4j version 1.9.6 Enterprise in a clustered formation;
>> the following are some stats for the DB.
>>
>> ~30 Million nodes
>> ~54 Million relationships.
>>
>> The cluster is formed by one instance of Neo4j embedded in a JBoss
>> server and another instance running in standalone mode exposing the REST
>> interface. We also have an arbiter running to avoid constant
>> re-election issues.
>>
>> Every now and then (ranging from 5 hours to a day), we observe that
>> writes to the embedded server either fail or take a very long time
>> (>5 seconds) to finish.
>>
>> Whenever we observe this behavior, the following gets printed in the
>> messages.log of the embedded server:
>>
>> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
>> 2014-03-27 00:11:15.625+0000 ERROR [o.n.k.h.c.m.MasterServer]: Could not
>> finish off dead channel
>> org.neo4j.kernel.ha.com.master.InvalidEpochException: Invalid epoch
>> 282870793924543, correct epoch is 282870850589260
>>         at
>> org.neo4j.kernel.ha.com.master.MasterImpl.assertCorrectEpoch(MasterImpl.java:218)
>> ~[neo4j-ha-1.9.6.jar:1.9.6]
>>         at
>> org.neo4j.kernel.ha.com.master.MasterImpl.finishTransaction(MasterImpl.java:419)
>> ~[neo4j-ha-1.9.6.jar:1.9.6]
>>         at
>> org.neo4j.kernel.ha.com.master.MasterServer.finishOffChannel(MasterServer.java:69)
>> ~[neo4j-ha-1.9.6.jar:1.9.6]
>>         at org.neo4j.com.Server.tryToFinishOffChannel(Server.java:408)
>> ~[neo4j-com-1.9.6.jar:1.9.6]
>>         at org.neo4j.com.Server$4.run(Server.java:586)
>> [neo4j-com-1.9.6.jar:1.9.6]
>>         at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> [na:1.7.0_51]
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> [na:1.7.0_51]
>>         at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> [na:1.7.0_51]
>>         at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> [na:1.7.0_51]
>>
>> The problem goes away if we restart the standalone Neo4j instance (not
>> the one where the logs are being printed).
>>
>> We would appreciate it if someone could point out what might be causing
>> the issue and how to avoid it.
>>
>> We have ensured that both servers have the correct time, synchronized
>> via NTP.
>>
>> Thanks,
>>
>> Virat
>>
>>
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
