Hi,

We are using Neo4j 1.9.6 Enterprise in a clustered setup. Some stats for
the DB:

~30 Million nodes
~54 Million relationships.

The cluster consists of one Neo4j instance embedded in a JBoss server and
another instance running in standalone mode, exposing the REST interface.
We also run an arbiter to avoid constant re-election issues.

Every now and then (at intervals ranging from 5 hours to a day), we observe
that writes to the embedded server either fail or take a very long time to
finish (>5 seconds).

Whenever we observe this behavior, the following is printed in
messages.log of the embedded server:

at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
2014-03-27 00:11:15.625+0000 ERROR [o.n.k.h.c.m.MasterServer]: Could not finish off dead channel
org.neo4j.kernel.ha.com.master.InvalidEpochException: Invalid epoch 282870793924543, correct epoch is 282870850589260
        at org.neo4j.kernel.ha.com.master.MasterImpl.assertCorrectEpoch(MasterImpl.java:218) ~[neo4j-ha-1.9.6.jar:1.9.6]
        at org.neo4j.kernel.ha.com.master.MasterImpl.finishTransaction(MasterImpl.java:419) ~[neo4j-ha-1.9.6.jar:1.9.6]
        at org.neo4j.kernel.ha.com.master.MasterServer.finishOffChannel(MasterServer.java:69) ~[neo4j-ha-1.9.6.jar:1.9.6]
        at org.neo4j.com.Server.tryToFinishOffChannel(Server.java:408) ~[neo4j-com-1.9.6.jar:1.9.6]
        at org.neo4j.com.Server$4.run(Server.java:586) [neo4j-com-1.9.6.jar:1.9.6]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_51]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_51]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_51]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]

The problem goes away if we restart the standalone Neo4j instance (that
is, not the instance whose log is shown above).

We would appreciate it if someone could point out what might cause this
issue and how to avoid it.

We have verified that both servers have the correct time and that their
clocks are synchronized using NTP.

Thanks,

Virat
