Wow. Yes, this looks a lot like the tmblue 9/10/16 event. I’ll be very 
interested to take a look at that patch.
        Haven’t forgotten the 2.2.6 patches I have in flight. Need to finish 
some other work and pull those off the back burner…
        Thanks,

        Tom    ☺


On 2/22/18, 6:06 PM, "Steve Singer" <st...@ssinger.info> wrote:

    On Thu, 22 Feb 2018, Tignor, Tom wrote:
    
    Looks like?
    http://lists.slony.info/pipermail/slony1-general/2016-September/013331.html
    
    I can't remember if that was what prompted 
    http://lists.slony.info/pipermail/slony1-hackers/2016-December/000560.html
    
    
    https://github.com/ssinger/slony1-engine/tree/bug375
    
    I can't seem to find a reason why this wasn't committed.
    
    
    > 
    >  
    > 
    >                 Hello slony1 community,
    > 
    >                 We have a head scratcher here. It appears a DROP NODE command
    > was not fully processed. The command was issued and confirmed on all our
    > nodes at approximately 2018-02-21 19:19:50 UTC. When we went to restore it
    > over two hours later, all replication stopped on an sl_event constraint
    > violation. Investigation showed a SYNC event for the dropped node with a
    > timestamp of just a few seconds before the drop. I believe this is a first
    > for us. The DROP NODE command is supposed to remove all state for the
    > dropped node. Is that right? Is there a potential race condition somewhere
    > which could leave behind state?
    > 
    >                 Thanks in advance,
    > 
    >  
    > 
    > ---- master log replication freeze error ----
    > 
    > 2018-02-21 21:38:52 UTC [5775] ERROR  remoteWorkerThread_8: "insert into "_ams_cluster".sl_event     (ev_origin, ev_seqno, ev_timestamp,      ev_snapshot, ev_type     ) values ('8', '5002075962', '2018-02-21 19:19:41.958719+00', '87044110:87044110:', 'SYNC'); insert into "_ams_cluster".sl_confirm       (con_origin, con_received, con_seqno, con_timestamp)    values (8, 1, '5002075962', now()); select "_ams_cluster".logApplySaveStats('_ams_cluster', 8, '0.139 s'::interval); commit transaction;" PGRES_FATAL_ERROR ERROR:  duplicate key value violates unique constraint "sl_event-pkey"
    > 
    > DETAIL:  Key (ev_origin, ev_seqno)=(8, 5002075962) already exists.
    > 
    > 2018-02-21 21:38:52 UTC [13649] CONFIG slon: child terminated signal: 9; pid: 5775, current worker pid: 5775
    > 
    > 2018-02-21 21:38:52 UTC [13649] CONFIG slon: restart of worker in 10 seconds
    > 
    > ---- master log replication freeze error ----
    > 
    >  
    > 
    >  
    > 
    > ---- master DB leftover event ----
    > 
    > a...@ams6.cmb.netmgmt:~$ psql -U akamai -d ams
    > 
    > psql (9.1.24)
    > 
    > Type "help" for help.
    > 
    >  
    > 
    > ams=# select * from sl_event_bak;
    > 
    >  ev_origin |  ev_seqno  |         ev_timestamp          |    ev_snapshot     | ev_type | ev_data1 | ev_data2 | ev_data3 | ev_data4 | ev_data5 | ev_data6 | ev_data7 | ev_data8 
    > -----------+------------+-------------------------------+--------------------+---------+----------+----------+----------+----------+----------+----------+----------+----------
    >          8 | 5002075962 | 2018-02-21 19:19:41.958719+00 | 87044110:87044110: | SYNC    |          |          |          |          |          |          |          | 
    > 
    > (1 row)
    > 
    >  
    > 
    > ams=# 
    > 
    > ---- master DB leftover event ----
    > 
    >  
    > 
    > ---- master log drop node record ----
    > 
    > 2018-02-21 19:19:50 UTC [22582] CONFIG disableNode: no_id=8
    > 
    > 2018-02-21 19:19:50 UTC [22582] CONFIG storeListen: li_origin=4 li_receiver=1 li_provider=4
    > 
    > 2018-02-21 19:19:50 UTC [22582] CONFIG storeListen: li_origin=7 li_receiver=1 li_provider=7
    > 
    > 2018-02-21 19:19:50 UTC [22582] CONFIG storeListen: li_origin=3 li_receiver=1 li_provider=3
    > 
    > 2018-02-21 19:19:50 UTC [22582] CONFIG remoteWorkerThread_4: update provider configuration
    > 
    > 2018-02-21 19:19:50 UTC [22582] CONFIG remoteWorkerThread_4: connection for provider 4 terminated
    > 
    > 2018-02-21 19:19:50 UTC [22582] CONFIG remoteWorkerThread_4: disconnecting from data provider 4
    > 
    > 2018-02-21 19:19:50 UTC [22582] CONFIG remoteWorkerThread_4: connection for provider 7 terminated
    > 
    > ---- master log drop node record ----
    > 
    >  
    > 
    > ---- replica log drop node record ----
    > 
    > 2018-02-21 19:19:51 UTC [22650] WARN   remoteWorkerThread_1: got DROP NODE for local node ID
    > 
    > NOTICE:  Slony-I: Please drop schema "_ams_cluster"
    > 
    > 2018-02-21 19:19:53 UTC [22650] INFO   remoteWorkerThread_7: SYNC 5001868819 done in 2.153 seconds
    > 
    > NOTICE:  drop cascades to 243 other objects
    > 
    > DETAIL:  drop cascades to table _ams_cluster.sl_node
    > 
    > drop cascades to table _ams_cluster.sl_nodelock
    > 
    > drop cascades to table _ams_cluster.sl_set
    > 
    > drop cascades to table _ams_cluster.sl_setsync
    > 
    > drop cascades to table _ams_cluster.sl_table
    > 
    > drop cascades to table _ams_cluster.sl_sequence
    > 
    > ---- replica log drop node record ----
    > 
    >  
    > 
    >  
    > 
    >                 Tom    ☺
    > 
    >  
    > 
    >  
    > 
    > 
    >
    

_______________________________________________
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general
