I think I have discovered more details regarding what happened. After issuing DROP NODE, the master node (Node 5) was asked to restart, but it couldn't fully restart because of lock contention:
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6960-1] 2010-08-23 > 10:52:31 PDT INFO remoteWorkerThread_3: thread done > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6961-1] 2010-08-23 > 10:52:31 PDT INFO localListenThread: got restart notification > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[5541]: [5-1] 2010-08-23 > 10:52:31 PDT INFO slon: restart requested > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[5541]: [6-1] 2010-08-23 > 10:52:31 PDT INFO slon: notify worker process to shutdown > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6962-1] 2010-08-23 > 10:52:31 PDT INFO remoteListenThread_2: disconnecting from 'dbname=tii > host=shipper_db.xxx.xxxxxxx.com port=5432 > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6962-2] user=slony > password=xxxxxxx' > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6963-1] 2010-08-23 > 10:52:31 PDT INFO main: scheduler mainloop returned > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6964-1] 2010-08-23 > 10:52:31 PDT CONFIG main: wait for remote threads > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6965-1] 2010-08-23 > 10:52:31 PDT INFO syncThread: thread done > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6966-1] 2010-08-23 > 10:52:31 PDT DEBUG1 cleanupThread: thread done > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6967-1] 2010-08-23 > 10:52:31 PDT DEBUG1 remoteListenThread_2: thread done > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6968-1] 2010-08-23 > 10:52:31 PDT INFO remoteListenThread_4: disconnecting from 'dbname=tii > host=db5.xxx.xxxxxxx.com port=5432 > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6968-2] user=slony > password=xxxxxxx' > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6969-1] 2010-08-23 > 10:52:31 PDT DEBUG1 remoteListenThread_4: thread done > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6970-1] 2010-08-23 > 10:52:31 PDT CONFIG remoteWorkerThread_4: update provider configuration > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6971-1] 2010-08-23 > 10:52:31 PDT INFO remoteWorkerThread_4: thread done > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6972-1] 2010-08-23 > 10:52:31 PDT INFO remoteListenThread_3: disconnecting from 'dbname=tii > host=db4.xxx.xxxxxxx.com port=5432 > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6972-2] user=slony > password=xxxxxxx' > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6973-1] 2010-08-23 > 10:52:31 PDT DEBUG1 remoteListenThread_3: thread done > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6974-1] 2010-08-23 > 10:52:31 PDT CONFIG remoteWorkerThread_2: update provider configuration > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6975-1] 2010-08-23 > 10:52:31 PDT INFO remoteWorkerThread_2: thread done > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6976-1] 2010-08-23 > 10:52:31 PDT CONFIG main: done > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[5541]: [7-1] 2010-08-23 > 10:52:31 PDT CONFIG slon: child terminated status: 0; pid: 11953, current > worker pid: 11953 > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[5541]: [1-1] 2010-08-23 > 10:52:31 PDT CONFIG main: slon version 2.0.3 starting up > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[5541]: [2-1] 2010-08-23 > 10:52:31 PDT INFO slon: watchdog process started > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[5541]: [3-1] 2010-08-23 > 10:52:31 PDT CONFIG slon: watchdog ready - pid = 5541 > <snip (loading of config params)> > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[5541]: [4-1] 2010-08-23 > 10:52:31 PDT CONFIG slon: worker process created - pid = 19013 > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [31-1] 2010-08-23 > 10:52:31 PDT CONFIG main: local node id = 5 > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [32-1] 2010-08-23 > 10:52:31 PDT INFO main: main process started > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [33-1] 2010-08-23 > 10:52:31 PDT CONFIG main: launching sched_start_mainloop > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [34-1] 2010-08-23 > 10:52:31 PDT CONFIG main: loading current cluster configuration > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [35-1] 2010-08-23 > 10:52:31 PDT CONFIG storeNode: no_id=2 no_comment='Node 2 - Base tii - Host > shipper_db.xxx.xxxxxxx.com - Cluster > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [35-2] slony_schema' > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [36-1] 2010-08-23 > 10:52:31 PDT CONFIG storeNode: no_id=4 no_comment='Node 4 - Base tii - Host > db5.xxx.xxxxxxx.com - Cluster > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [36-2] slony_schema' > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [37-1] 2010-08-23 > 10:52:31 PDT CONFIG storeNode: no_id=6 no_comment='<event pending>' > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [38-1] 2010-08-23 > 10:52:31 PDT CONFIG storePath: pa_server=2 pa_client=5 > pa_conninfo="dbname=tii host=shipper_db.xxx.xxxxxxx.com > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [38-2] port=5432 > user=slony password=xxxxxxx" pa_connretry=10 > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [39-1] 2010-08-23 > 10:52:31 PDT CONFIG storePath: pa_server=4 pa_client=5 > pa_conninfo="dbname=tii host=db5.xxx.xxxxxxx.com > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [39-2] port=5432 > user=slony password=xxxxxxx" pa_connretry=10 > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [40-1] 2010-08-23 > 10:52:31 PDT CONFIG storeListen: li_origin=2 li_receiver=5 li_provider=2 > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [41-1] 2010-08-23 > 10:52:31 PDT CONFIG storeListen: li_origin=4 li_receiver=5 li_provider=4 > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [42-1] 2010-08-23 > 10:52:31 PDT CONFIG storeListen: li_origin=4 li_receiver=5 li_provider=2 > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [43-1] 2010-08-23 > 10:52:31 PDT CONFIG storeListen: li_origin=2 li_receiver=5 li_provider=4 > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [44-1] 2010-08-23 > 10:52:31 PDT CONFIG storeSet: set_id=1 set_origin=5 set_comment='Tables in > tii' > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [45-1] 2010-08-23 > 10:52:31 PDT CONFIG main: last local event sequence = 5037905941 > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [46-1] 2010-08-23 > 10:52:31 PDT CONFIG main: configuration complete - starting threads > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [47-1] 2010-08-23 > 10:52:31 PDT INFO localListenThread: thread starts > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11946]: [3034324-1] 2010-08-23 > 10:52:31 PDT DEBUG1 calc sync size - last time: 1 last length: 1000 ideal: 60 > proposed size: 3 > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11946]: [3034325-1] 2010-08-23 > 10:52:31 PDT DEBUG1 about to monitor_subscriber_query - pulling big actionid > list for 5 > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11946]: [3034326-1] 2010-08-23 > 10:52:31 PDT INFO remoteWorkerThread_5: syncing set 1 with 199 table(s) > from provider 5 > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [48-1] 2010-08-23 > 10:52:31 PDT CONFIG version for "dbname=tii host=db6.xxx.xxxxxxx.com > port=5432 user=slony password=xxxxxxx" > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [48-2] is 80402 > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [49-1] 2010-08-23 > 10:52:31 PDT DEBUG1 local_listen "dbname=tii host=db6.xxx.xxxxxxx.com > port=5432 user=slony > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [49-2] > password=xxxxxxx": backend pid = 14119 > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [50-1] 2010-08-23 > 10:52:31 PDT FATAL localListenThread: "select > "_slony_schema".cleanupNodelock(); insert into "_slony_schema".sl_nodelock > Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [50-2] values ( 5, > 0, "pg_catalog".pg_backend_pid()); " - ERROR: duplicate key value violates > unique constraint "sl_nodelock-pkey" Any ideas why we would reach this state? Regards, --Richard _______________________________________________ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general