Hi,

Please advise on the configuration required for a 2-instance "HA" setup (AWS, 
Neo4j Enterprise 3.0.1).

Currently I have the following on both instances (a sketch of the resulting neo4j.conf is included after the lists below):
- dbms.mode=HA
- ha.initial_hosts=172.31.35.147:5001,172.31.33.173:5001
- ha.host.coordination is commented out
- ha.host.data is commented out
Ports 5001, 5002, 7474, and 6001 are open on both.

Differences:
1. The node at 172.31.33.173 has ha.server_id=1; the other (172.31.35.147) has ha.server_id=2.
2. The node with id=1 runs Debian 8.4; the node with id=2 runs CentOS 7.
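
For concreteness, here is a sketch of how I believe the relevant part of neo4j.conf currently looks on the id=2 instance (the "..." on the commented lines are placeholders, not values I have set):

# neo4j.conf on 172.31.35.147 (ha.server_id=2)
dbms.mode=HA
ha.server_id=2
ha.initial_hosts=172.31.35.147:5001,172.31.33.173:5001
# ha.host.coordination=...   <- still commented out, so the default applies
# ha.host.data=...           <- still commented out, so the default applies

The id=1 instance (172.31.33.173) is identical except for ha.server_id=1.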


With this setup, the node with id=1 starts without problems and is elected 
master; the second node, however, fails to come up.

Some log extracts from the failing node (id=2):


2016-05-17 04:50:51.781+0000 INFO  [o.n.k.h.MasterClient214] MasterClient214 communication channel created towards /127.0.0.1:6001
2016-05-17 04:50:51.790+0000 INFO  [o.n.k.h.c.SwitchToSlave] Copying store from master
2016-05-17 04:50:51.791+0000 INFO  [o.n.k.h.MasterClient214] Thread[31, HA Mode switcher-1] Trying to open a new channel from /172.31.35.147:0 to /127.0.0.1:6001
2016-05-17 04:50:51.791+0000 DEBUG [o.n.k.h.MasterClient214] MasterClient214 could not connect from /172.31.35.147:0 to /127.0.0.1:6001
2016-05-17 04:50:51.796+0000 INFO  [o.n.k.h.MasterClient214] MasterClient214[/127.0.0.1:6001] shutdown
2016-05-17 04:50:51.796+0000 ERROR [o.n.k.h.c.m.HighAvailabilityModeSwitcher] Error while trying to switch to slave MasterClient214 could not connect from /172.31.35.147:0 to /127.0.0.1:6001
org.neo4j.com.ComException: MasterClient214 could not connect from /172.31.35.147:0 to /127.0.0.1:6001
    at org.neo4j.com.Client$2.create(Client.java:225)
    at org.neo4j.com.Client$2.create(Client.java:202)
    at org.neo4j.com.ResourcePool.acquire(ResourcePool.java:177)
    at org.neo4j.com.Client.acquireChannelContext(Client.java:390)
    at org.neo4j.com.Client.sendRequest(Client.java:296)
    at org.neo4j.com.Client.sendRequest(Client.java:289)
    at org.neo4j.kernel.ha.MasterClient210.copyStore(MasterClient210.java:311)
    at org.neo4j.kernel.ha.cluster.SwitchToSlave$1.copyStore(SwitchToSlave.java:531)
    at org.neo4j.com.storecopy.StoreCopyClient.copyStore(StoreCopyClient.java:191)
    at org.neo4j.kernel.ha.cluster.SwitchToSlave.copyStoreFromMaster(SwitchToSlave.java:525)
    at org.neo4j.kernel.ha.cluster.SwitchToSlave.copyStoreFromMasterIfNeeded(SwitchToSlave.java:348)
    at org.neo4j.kernel.ha.cluster.SwitchToSlave.switchToSlave(SwitchToSlave.java:272)
    at org.neo4j.kernel.ha.cluster.modeswitch.HighAvailabilityModeSwitcher$1.run(HighAvailabilityModeSwitcher.java:348)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
    at org.neo4j.helpers.NamedThreadFactory$2.run(NamedThreadFactory.java:104)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:148)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:104)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:78)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:41)
    ... 4 more
2016-05-17 04:50:51.797+0000 INFO  [o.n.k.h.c.m.HighAvailabilityModeSwitcher] Attempting to switch to slave in 7s
2016-05-17 04:50:58.799+0000 INFO  [o.n.k.i.f.CommunityFacadeFactory] No locking implementation specified, defaulting to 'forseti'
2016-05-17 04:50:58.799+0000 INFO  [o.n.k.h.c.SwitchToSlave] ServerId 2, moving to slave for master ha://0.0.0.0:6001?serverId=1



2016-05-17 04:30:57.535+0000 DEBUG [o.n.c.p.c.ClusterState$2] [AsyncLog @ 2016-05-17 04:30:57.534+0000]  ClusterState: discovery-[configurationTimeout]->discovery conversation-id:2/13# payload:ConfigurationTimeoutState{remainingPings=3}
2016-05-17 04:30:57.535+0000 DEBUG [o.n.c.p.h.HeartbeatState$1] [AsyncLog @ 2016-05-17 04:30:57.535+0000]  HeartbeatState: start-[reset_send_heartbeat]->start conversation-id:2/13#
2016-05-17 04:30:57.538+0000 INFO  [o.n.c.c.NetworkSender] [AsyncLog @ 2016-05-17 04:30:57.537+0000]  Attempting to connect from /172.31.35.147:0 to /172.31.33.173:5001
2016-05-17 04:30:57.540+0000 INFO  [o.n.c.c.NetworkSender] [AsyncLog @ 2016-05-17 04:30:57.540+0000]  Failed to connect to /172.31.33.173:5001 due to: java.net.ConnectException: Connection refused
2016-05-17 04:30:57.540+0000 DEBUG [o.n.c.p.c.ClusterState$2] [AsyncLog @ 2016-05-17 04:30:57.540+0000]  ClusterState: discovery-[configurationRequest]->discovery from:cluster://172.31.35.147:5001 conversation-id:2/13# payload:ConfigurationRequestState{joiningId=2, joiningUri=cluster://172.31.35.147:5001}
2016-05-17 04:30:58.420+0000 INFO  [o.n.c.c.NetworkReceiver] [AsyncLog @ 2016-05-17 04:30:58.420+0000]  cluster://172.31.35.147:47188 disconnected from me at cluster://172.31.35.147:5001
2016-05-17 04:30:58.420+0000 INFO  [o.n.c.c.NetworkReceiver] [AsyncLog @ 2016-05-17 04:30:58.420+0000]  cluster://172.31.35.147:47188 disconnected from me at cluster://172.31.35.147:5001
2016-05-17 04:30:58.434+0000 INFO  [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by database shutdown [1]:  Starting check pointing...
2016-05-17 04:30:58.438+0000 INFO  [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by database shutdown [1]:  Starting store flush...
2016-05-17 04:30:58.443+0000 INFO  [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by database shutdown [1]:  Store flush completed
2016-05-17 04:30:58.443+0000 INFO  [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by database shutdown [1]:  Starting appending check point entry into the tx log...
2016-05-17 04:30:58.447+0000 INFO  [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by database shutdown [1]:  Appending check point entry into the tx log completed
2016-05-17 04:30:58.447+0000 INFO  [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by database shutdown [1]:  Check pointing completed
2016-05-17 04:30:58.447+0000 INFO  [o.n.k.i.t.l.p.LogPruningImpl] Log Rotation [0]:  Starting log pruning.
2016-05-17 04:30:58.447+0000 INFO  [o.n.k.i.t.l.p.LogPruningImpl] Log Rotation [0]:  Log pruning complete.
2016-05-17 04:30:58.475+0000 INFO  [o.n.k.i.DiagnosticsManager] --- STOPPING diagnostics START ---
2016-05-17 04:30:58.475+0000 INFO  [o.n.k.i.DiagnosticsManager] High Availability diagnostics
Member state:PENDING
State machines:
   AtomicBroadcastMessage:start
   AcceptorMessage:start
   ProposerMessage:start
   LearnerMessage:start
   HeartbeatMessage:start
   ElectionMessage:start
   SnapshotMessage:start
   ClusterMessage:discovery
Current timeouts:
join:configurationTimeout{conversation-id=2/13#, timeout-count=29, created-by=2}
2016-05-17 04:30:58.475+0000 INFO  [o.n.k.i.DiagnosticsManager] --- STOPPING diagnostics END ---
2016-05-17 04:30:58.475+0000 INFO  [o.n.k.h.f.HighlyAvailableFacadeFactory] Shutdown started

etc. 


Any insights are highly appreciated!!

Thank you!
Dennis
