Hello slony1 community,

We have a head-scratcher here. It appears a DROP NODE command was not fully processed. The command was issued and confirmed on all our nodes at approximately 2018-02-21 19:19:50 UTC. When we went to restore the node over two hours later, all replication stopped on an sl_event constraint violation. Investigation showed a leftover SYNC event for the dropped node, timestamped just a few seconds before the drop.

I believe this is a first for us. The DROP NODE command is supposed to remove all state for the dropped node, is that right? Is there a potential race condition somewhere that could leave this state behind?

Thanks in advance,
---- master log replication freeze error ----
2018-02-21 21:38:52 UTC [5775] ERROR remoteWorkerThread_8: "insert into "_ams_cluster".sl_event (ev_origin, ev_seqno, ev_timestamp, ev_snapshot, ev_type) values ('8', '5002075962', '2018-02-21 19:19:41.958719+00', '87044110:87044110:', 'SYNC'); insert into "_ams_cluster".sl_confirm (con_origin, con_received, con_seqno, con_timestamp) values (8, 1, '5002075962', now()); select "_ams_cluster".logApplySaveStats('_ams_cluster', 8, '0.139 s'::interval); commit transaction;" PGRES_FATAL_ERROR ERROR: duplicate key value violates unique constraint "sl_event-pkey"
DETAIL: Key (ev_origin, ev_seqno)=(8, 5002075962) already exists.
2018-02-21 21:38:52 UTC [13649] CONFIG slon: child terminated signal: 9; pid: 5775, current worker pid: 5775
2018-02-21 21:38:52 UTC [13649] CONFIG slon: restart of worker in 10 seconds
---- master log replication freeze error ----

---- master DB leftover event ----
a...@ams6.cmb.netmgmt:~$ psql -U akamai -d ams
psql (9.1.24)
Type "help" for help.
ams=# select * from sl_event_bak;
 ev_origin |  ev_seqno  |         ev_timestamp          |    ev_snapshot     | ev_type | ev_data1 | ev_data2 | ev_data3 | ev_data4 | ev_data5 | ev_data6 | ev_data7 | ev_data8
-----------+------------+-------------------------------+--------------------+---------+----------+----------+----------+----------+----------+----------+----------+----------
         8 | 5002075962 | 2018-02-21 19:19:41.958719+00 | 87044110:87044110: | SYNC    |          |          |          |          |          |          |          |
(1 row)

ams=#
---- master DB leftover event ----

---- master log drop node record ----
2018-02-21 19:19:50 UTC [22582] CONFIG disableNode: no_id=8
2018-02-21 19:19:50 UTC [22582] CONFIG storeListen: li_origin=4 li_receiver=1 li_provider=4
2018-02-21 19:19:50 UTC [22582] CONFIG storeListen: li_origin=7 li_receiver=1 li_provider=7
2018-02-21 19:19:50 UTC [22582] CONFIG storeListen: li_origin=3 li_receiver=1 li_provider=3
2018-02-21 19:19:50 UTC [22582] CONFIG remoteWorkerThread_4: update provider configuration
2018-02-21 19:19:50 UTC [22582] CONFIG remoteWorkerThread_4: connection for provider 4 terminated
2018-02-21 19:19:50 UTC [22582] CONFIG remoteWorkerThread_4: disconnecting from data provider 4
2018-02-21 19:19:50 UTC [22582] CONFIG remoteWorkerThread_4: connection for provider 7 terminated
---- master log drop node record ----

---- replica log drop node record ----
2018-02-21 19:19:51 UTC [22650] WARN remoteWorkerThread_1: got DROP NODE for local node ID
NOTICE: Slony-I: Please drop schema "_ams_cluster"
2018-02-21 19:19:53 UTC [22650] INFO remoteWorkerThread_7: SYNC 5001868819 done in 2.153 seconds
NOTICE: drop cascades to 243 other objects
DETAIL: drop cascades to table _ams_cluster.sl_node
drop cascades to table _ams_cluster.sl_nodelock
drop cascades to table _ams_cluster.sl_set
drop cascades to table _ams_cluster.sl_setsync
drop cascades to table _ams_cluster.sl_table
drop cascades to table _ams_cluster.sl_sequence
---- replica log drop node record ----

Tom ☺
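In the meantime, to unfreeze replication we are considering purging the leftover rows for the dropped node by hand. This is only a sketch of my own devising, not anything from the Slony docs, and it assumes the stale state for node 8 is confined to sl_event and sl_confirm on the node where it lingers (run with the local slon stopped):

```sql
-- Sketch (assumption, not a documented procedure): remove leftover
-- state for dropped node 8 in cluster schema "_ams_cluster".
BEGIN;
-- the orphaned SYNC event(s) originating from the dropped node
DELETE FROM "_ams_cluster".sl_event
 WHERE ev_origin = 8;
-- any confirmations originated by or addressed to the dropped node
DELETE FROM "_ams_cluster".sl_confirm
 WHERE con_origin = 8 OR con_received = 8;
COMMIT;
```

If the duplicate can reappear from another node's queue, this would only be a stop-gap until the underlying race is understood.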
_______________________________________________
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general