Hi,
Please advise on the required configuration for a two-instance HA setup (AWS,
Neo4j Enterprise 3.0.1).
Currently I have on both instances:
- dbms.mode=HA
- ha.initial_hosts=172.31.35.147:5001,172.31.33.173:5001
- ha.host.coordination is commented out
- ha.host.data is commented out
Ports 5001, 5002, 6001 and 7474 are open on both.
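
To double-check the port claim from each instance, a plain TCP connect test is enough. Below is a minimal Python sketch; the function names are my own and not part of any Neo4j tooling. Note that it only reports whether something accepts a connection, so a port that is open in the firewall but has no listener behind it will still show up as unreachable.

```python
import socket
from typing import Dict, List, Tuple


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


def check_peers(hosts: List[str], ports: List[int],
                timeout: float = 3.0) -> Dict[Tuple[str, int], bool]:
    """Probe every host/port combination and return the results."""
    return {
        (host, port): port_is_reachable(host, port, timeout)
        for host in hosts
        for port in ports
    }
```

Run on each instance, `check_peers(["172.31.35.147", "172.31.33.173"], [5001, 5002, 6001, 7474])` shows which of the ports listed above actually accept connections from that side.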
Differences:
1. One node has ha.server_id=1 (172.31.33.173), the other has ha.server_id=2 (172.31.35.147)
2. The node with id=1 runs Debian 8.4, the node with id=2 runs CentOS 7
With this setup, the node with id=1 starts without problems and is elected
master; the second one, however, fails.
Some log extracts:
2016-05-17 04:50:51.781+0000 INFO [o.n.k.h.MasterClient214]
MasterClient214 communication channel created towards /127.0.0.1:6001
2016-05-17 04:50:51.790+0000 INFO [o.n.k.h.c.SwitchToSlave] Copying store
from master
2016-05-17 04:50:51.791+0000 INFO [o.n.k.h.MasterClient214] Thread[31, HA
Mode switcher-1] Trying to open a new channel from /172.31.35.147:0 to
/127.0.0.1:6001
2016-05-17 04:50:51.791+0000 DEBUG [o.n.k.h.MasterClient214]
MasterClient214 could not connect from /172.31.35.147:0 to /127.0.0.1:6001
2016-05-17 04:50:51.796+0000 INFO [o.n.k.h.MasterClient214]
MasterClient214[/127.0.0.1:6001] shutdown
2016-05-17 04:50:51.796+0000 ERROR
[o.n.k.h.c.m.HighAvailabilityModeSwitcher] Error while trying to switch to
slave MasterClient214 could not connect from /172.31.35.147:0 to
/127.0.0.1:6001
org.neo4j.com.ComException: MasterClient214 could not connect from /172.31.35.147:0 to /127.0.0.1:6001
    at org.neo4j.com.Client$2.create(Client.java:225)
    at org.neo4j.com.Client$2.create(Client.java:202)
    at org.neo4j.com.ResourcePool.acquire(ResourcePool.java:177)
    at org.neo4j.com.Client.acquireChannelContext(Client.java:390)
    at org.neo4j.com.Client.sendRequest(Client.java:296)
    at org.neo4j.com.Client.sendRequest(Client.java:289)
    at org.neo4j.kernel.ha.MasterClient210.copyStore(MasterClient210.java:311)
    at org.neo4j.kernel.ha.cluster.SwitchToSlave$1.copyStore(SwitchToSlave.java:531)
    at org.neo4j.com.storecopy.StoreCopyClient.copyStore(StoreCopyClient.java:191)
    at org.neo4j.kernel.ha.cluster.SwitchToSlave.copyStoreFromMaster(SwitchToSlave.java:525)
    at org.neo4j.kernel.ha.cluster.SwitchToSlave.copyStoreFromMasterIfNeeded(SwitchToSlave.java:348)
    at org.neo4j.kernel.ha.cluster.SwitchToSlave.switchToSlave(SwitchToSlave.java:272)
    at org.neo4j.kernel.ha.cluster.modeswitch.HighAvailabilityModeSwitcher$1.run(HighAvailabilityModeSwitcher.java:348)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
    at org.neo4j.helpers.NamedThreadFactory$2.run(NamedThreadFactory.java:104)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:148)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:104)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:78)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:41)
    ... 4 more
2016-05-17 04:50:51.797+0000 INFO
[o.n.k.h.c.m.HighAvailabilityModeSwitcher] Attempting to switch to slave
in 7s
2016-05-17 04:50:58.799+0000 INFO [o.n.k.i.f.CommunityFacadeFactory] No
locking implementation specified, defaulting to 'forseti'
2016-05-17 04:50:58.799+0000 INFO [o.n.k.h.c.SwitchToSlave] ServerId 2,
moving to slave for master ha://0.0.0.0:6001?serverId=1
2016-05-17 04:30:57.535+0000 DEBUG [o.n.c.p.c.ClusterState$2] [AsyncLog @
2016-05-17 04:30:57.534+0000] ClusterState:
discovery-[configurationTimeout]->discovery conversation-id:2/13#
payload:ConfigurationTimeoutState{remainingPings=3}
2016-05-17 04:30:57.535+0000 DEBUG [o.n.c.p.h.HeartbeatState$1] [AsyncLog @
2016-05-17 04:30:57.535+0000] HeartbeatState:
start-[reset_send_heartbeat]->start conversation-id:2/13#
2016-05-17 04:30:57.538+0000 INFO [o.n.c.c.NetworkSender] [AsyncLog @
2016-05-17 04:30:57.537+0000] Attempting to connect from /172.31.35.147:0
to /172.31.33.173:5001
2016-05-17 04:30:57.540+0000 INFO [o.n.c.c.NetworkSender] [AsyncLog @
2016-05-17 04:30:57.540+0000] Failed to connect to /172.31.33.173:5001 due
to: java.net.ConnectException: Connection refused
2016-05-17 04:30:57.540+0000 DEBUG [o.n.c.p.c.ClusterState$2] [AsyncLog @
2016-05-17 04:30:57.540+0000] ClusterState:
discovery-[configurationRequest]->discovery
from:cluster://172.31.35.147:5001 conversation-id:2/13#
payload:ConfigurationRequestState{joiningId=2,
joiningUri=cluster://172.31.35.147:5001}
2016-05-17 04:30:58.420+0000 INFO [o.n.c.c.NetworkReceiver] [AsyncLog @
2016-05-17 04:30:58.420+0000] cluster://172.31.35.147:47188 disconnected
from me at cluster://172.31.35.147:5001
2016-05-17 04:30:58.420+0000 INFO [o.n.c.c.NetworkReceiver] [AsyncLog @
2016-05-17 04:30:58.420+0000] cluster://172.31.35.147:47188 disconnected
from me at cluster://172.31.35.147:5001
2016-05-17 04:30:58.434+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check
Pointing triggered by database shutdown [1]: Starting check pointing...
2016-05-17 04:30:58.438+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check
Pointing triggered by database shutdown [1]: Starting store flush...
2016-05-17 04:30:58.443+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check
Pointing triggered by database shutdown [1]: Store flush completed
2016-05-17 04:30:58.443+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check
Pointing triggered by database shutdown [1]: Starting appending check
point entry into the tx log...
2016-05-17 04:30:58.447+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check
Pointing triggered by database shutdown [1]: Appending check point entry
into the tx log completed
2016-05-17 04:30:58.447+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check
Pointing triggered by database shutdown [1]: Check pointing completed
2016-05-17 04:30:58.447+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] Log
Rotation [0]: Starting log pruning.
2016-05-17 04:30:58.447+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] Log
Rotation [0]: Log pruning complete.
2016-05-17 04:30:58.475+0000 INFO [o.n.k.i.DiagnosticsManager] ---
STOPPING diagnostics START ---
2016-05-17 04:30:58.475+0000 INFO [o.n.k.i.DiagnosticsManager] High
Availability diagnostics
Member state:PENDING
State machines:
AtomicBroadcastMessage:start
AcceptorMessage:start
ProposerMessage:start
LearnerMessage:start
HeartbeatMessage:start
ElectionMessage:start
SnapshotMessage:start
ClusterMessage:discovery
Current timeouts:
join:configurationTimeout{conversation-id=2/13#, timeout-count=29,
created-by=2}
2016-05-17 04:30:58.475+0000 INFO [o.n.k.i.DiagnosticsManager] ---
STOPPING diagnostics END ---
2016-05-17 04:30:58.475+0000 INFO [o.n.k.h.f.HighlyAvailableFacadeFactory]
Shutdown started
etc.
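
In the extracts above, the slave tries to reach the master at /127.0.0.1:6001, and the master is announced as ha://0.0.0.0:6001, rather than at the master's 172.31.33.173 address. To find every such address in the full log, here is a minimal Python sketch. It assumes IPv4 addresses only, and the function name is hypothetical, not something shipped with Neo4j.

```python
import ipaddress
import re
from typing import Set, Tuple

# Matches IPv4 "host:port" pairs such as "/127.0.0.1:6001" or "ha://0.0.0.0:6001".
ADDRESS = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def suspicious_targets(log_text: str) -> Set[Tuple[str, int]]:
    """Collect loopback or wildcard (0.0.0.0) addresses mentioned in the log."""
    found = set()
    for host, port in ADDRESS.findall(log_text):
        try:
            ip = ipaddress.ip_address(host)
        except ValueError:
            continue  # looks like an address but is not a valid one
        if ip.is_loopback or ip.is_unspecified:
            found.add((host, int(port)))
    return found
```

Feeding it the contents of the log file returns the set of (host, port) pairs that another machine could never connect to, which makes it easier to see which component advertises them.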
Any insights would be highly appreciated!
Thank you!
Dennis