Hello Slony-I community,
I’m hoping someone can advise on a strange and serious problem. We performed
a Slony service failover yesterday and, for the first time ever, the FAILOVER
operation errored out. We recently expanded our cluster to 7 consumers fed
from a single provider. There are no load issues during normal operations. As
the error output below shows, though, our node 4 and node 5 consumers never
got the events they needed. Here’s where it gets weird: closer inspection has
shown that the node 2->4 and node 2->5 path data went missing from the service
at some point. It seems clear that’s the main issue, but in spite of that,
both node 4 and node 5 continued to find and process node 2 SYNC events for a
full week! The logs show this continued across multiple restarts.
How can this happen? If missing path data stymies the failover, wouldn’t it
also prevent normal SYNC processing?
When a failover is begun with inadequate path data, what’s the best
resolution? Can path data be quickly applied to allow the failover to succeed?
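To make the "missing path data" check concrete, here is a minimal sketch of how it can be expressed. It assumes the sl_path rows have already been fetched as (pa_server, pa_client) pairs; the function name and the sample rows are hypothetical and only mirror the symptom described above, they are not a dump of our cluster.

```python
def missing_paths(node_ids, path_rows, server):
    """Return the client nodes that have no path row pointing at `server`."""
    # Clients that do have a (pa_server, pa_client) row for this server.
    have = {client for (srv, client) in path_rows if srv == server}
    return sorted(n for n in node_ids if n != server and n not in have)


# Hypothetical sl_path contents as (pa_server, pa_client) pairs.
nodes = [2, 3, 4, 5, 6, 7, 8]
rows = [(2, 3), (2, 6), (2, 7), (2, 8)]
print(missing_paths(nodes, rows, server=2))  # [4, 5]
```

Running a check like this against every surviving node before a failover would flag gaps such as 2->4 and 2->5 ahead of time.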
Thanks in advance for any insights.


---- failover error ----

/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: NOTICE:  
calling restart node 1
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:55: 2017-06-26 
18:33:02
executing preFailover(1,1) on 2
executing preFailover(1,1) on 3
executing preFailover(1,1) on 4
executing preFailover(1,1) on 5
executing preFailover(1,1) on 6
executing preFailover(1,1) on 7
executing preFailover(1,1) on 8
NOTICE: executing "_ams_cluster".failedNode2 on node 2
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for 
event (2,5000061664).  node 8 only on event 5000061654, node 4 only on event 
5000061654, node 5 only on event 5000061655, node 3 only on event 5000061662, 
node 6 only on event 5000061654, node 7 only on event 5000061656
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for 
event (2,5000061664).  node 4 only on event 5000061657, node 5 only on event 
5000061663, node 3 only on event 5000061663, node 6 only on event 5000061663
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for 
event (2,5000061664).  node 4 only on event 5000061663, node 5 only on event 
5000061663, node 6 only on event 5000061663
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for 
event (2,5000061664).  node 4 only on event 5000061663, node 5 only on event 
5000061663
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for 
event (2,5000061664).  node 4 only on event 5000061663, node 5 only on event 
5000061663
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for 
event (2,5000061664).  node 4 only on event 5000061663, node 5 only on event 
5000061663
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for 
event (2,5000061664).  node 4 only on event 5000061663, node 5 only on event 
5000061663
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for 
event (2,5000061664).  node 4 only on event 5000061663, node 5 only on event 
5000061663
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for 
event (2,5000061664).  node 4 only on event 5000061663, node 5 only on event 
5000061663
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for 
event (2,5000061664).  node 4 only on event 5000061663, node 5 only on event 
5000061663
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for 
event (2,5000061664).  node 4 only on event 5000061663, node 5 only on event 
5000061663
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for 
event (2,5000061664).  node 4 only on event 5000061663, node 5 only on event 
5000061663
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for 
event (2,5000061664).  node 4 only on event 5000061663, node 5 only on event 
5000061663
/tmp/ams-tool/ams-slony1-fastfailover-1-FR_80.67.75.105.slk:56: waiting for 
event (2,5000061664).  node 4 only on event 5000061663, node 5 only on event 
5000061663
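The wait messages above can be summarized per node. This is a rough sketch, assuming each message has first been joined back onto a single line (the mail client wrapped them); the regular expressions and the function name are my own and are not part of slonik.

```python
import re

# Patterns matching the slonik wait message shown in the output above.
WAIT_RE = re.compile(r"waiting for event \((\d+),(\d+)\)")
NODE_RE = re.compile(r"node (\d+) only on event (\d+)")


def lagging_nodes(line):
    """Return {node_id: events_behind} for one unwrapped wait message."""
    m = WAIT_RE.search(line)
    if m is None:
        return {}
    target = int(m.group(2))
    return {int(n): target - int(ev) for n, ev in NODE_RE.findall(line)}


line = ("waiting for event (2,5000061664).  node 4 only on event 5000061663, "
        "node 5 only on event 5000061663")
print(lagging_nodes(line))  # {4: 1, 5: 1}
```

Applied to the final messages, it shows nodes 4 and 5 both stuck one event short of (2,5000061664).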


---- node 4 log archive ----

bos-mpt5c:odin-9353 ttignor$ egrep 'disableNode: no_id=2|storePath: pa_server=2 pa_client=4|restart notification' prod4/node4-pathconfig.out
2017-06-15 15:14:00 UTC [5688] INFO   localListenThread: got restart 
notification
2017-06-15 15:14:10 UTC [8431] CONFIG storePath: pa_server=2 pa_client=4 
pa_conninfo="dbname=ams
2017-06-15 15:53:00 UTC [8431] INFO   localListenThread: got restart 
notification
2017-06-15 15:53:10 UTC [23701] CONFIG storePath: pa_server=2 pa_client=4 
pa_conninfo="dbname=ams
2017-06-16 17:29:13 UTC [10253] CONFIG storePath: pa_server=2 pa_client=4 
pa_conninfo="dbname=ams
2017-06-16 20:43:42 UTC [2707] CONFIG storePath: pa_server=2 pa_client=4 
pa_conninfo="dbname=ams
2017-06-19 15:11:45 UTC [2707] CONFIG disableNode: no_id=2
2017-06-19 15:11:45 UTC [2707] INFO   localListenThread: got restart 
notification
2017-06-20 18:40:15 UTC [31224] INFO   localListenThread: got restart 
notification
2017-06-21 14:31:42 UTC [6253] INFO   localListenThread: got restart 
notification
2017-06-21 14:35:26 UTC [32367] INFO   localListenThread: got restart 
notification
2017-06-26 18:21:25 UTC [9278] INFO   localListenThread: got restart 
notification
2017-06-26 18:33:04 UTC [28839] INFO   localListenThread: got restart 
notification
2017-06-26 18:33:30 UTC [1785] INFO   localListenThread: got restart 
notification
bos-mpt5c:odin-9353 ttignor$


---- node 5 log archive ----

bos-mpt5c:odin-9353 ttignor$ egrep 'disableNode: no_id=2|storePath: pa_server=2 pa_client=5|restart notification' prod5/node5-pathconfig.out
2017-06-15 15:13:56 UTC [20700] INFO   localListenThread: got restart 
notification
2017-06-15 15:14:06 UTC [20374] CONFIG storePath: pa_server=2 pa_client=5 
pa_conninfo="dbname=ams
2017-06-15 15:53:01 UTC [20374] INFO   localListenThread: got restart 
notification
2017-06-15 15:53:11 UTC [2859] CONFIG storePath: pa_server=2 pa_client=5 
pa_conninfo="dbname=ams
2017-06-16 17:28:19 UTC [2859] INFO   localListenThread: got restart 
notification
2017-06-16 17:28:29 UTC [10753] CONFIG storePath: pa_server=2 pa_client=5 
pa_conninfo="dbname=ams
2017-06-19 15:11:40 UTC [10753] CONFIG disableNode: no_id=2
2017-06-19 15:11:40 UTC [10753] INFO   localListenThread: got restart 
notification
2017-06-20 18:40:11 UTC [450] INFO   localListenThread: got restart notification
2017-06-21 14:31:41 UTC [22300] INFO   localListenThread: got restart 
notification
2017-06-21 14:35:28 UTC [26777] INFO   localListenThread: got restart 
notification
2017-06-26 18:21:27 UTC [28366] INFO   localListenThread: got restart 
notification
2017-06-26 18:33:04 UTC [29345] INFO   localListenThread: got restart 
notification
2017-06-26 18:33:27 UTC [1299] INFO   localListenThread: got restart 
notification
bos-mpt5c:odin-9353 ttignor$
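In both archives, the storePath entries for pa_server=2 stop appearing after the disableNode entry on 2017-06-19, even though restarts continue. A small sketch of that check follows, assuming the log lines have been joined back onto single lines; the function name is hypothetical and the sample lines are copied from the node 4 output above, with pa_conninfo left truncated as in the archive.

```python
def store_paths_after_disable(lines, server):
    """Return (saw_disable, count) where count is the number of storePath
    entries for `server` logged after the last disableNode for it."""
    saw_disable = False
    count = 0
    for line in lines:
        if line.rstrip().endswith(f"disableNode: no_id={server}"):
            saw_disable = True
            count = 0  # only entries after the most recent disableNode count
        elif saw_disable and f"storePath: pa_server={server} " in line:
            count += 1
    return saw_disable, count


lines = [
    '2017-06-16 20:43:42 UTC [2707] CONFIG storePath: pa_server=2 pa_client=4 pa_conninfo="dbname=ams',
    '2017-06-19 15:11:45 UTC [2707] CONFIG disableNode: no_id=2',
    '2017-06-20 18:40:15 UTC [31224] INFO   localListenThread: got restart notification',
]
print(store_paths_after_disable(lines, server=2))  # (True, 0)
```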


Tom ☺

