I think I have discovered more details regarding what happened.

After issuing DROP NODE, the master node (Node 5) was asked to restart, but it 
couldn't fully restart because of lock contention:

> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6960-1] 2010-08-23 
> 10:52:31 PDT INFO   remoteWorkerThread_3: thread done
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6961-1] 2010-08-23 
> 10:52:31 PDT INFO   localListenThread: got restart notification
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[5541]: [5-1] 2010-08-23 
> 10:52:31 PDT INFO   slon: restart requested
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[5541]: [6-1] 2010-08-23 
> 10:52:31 PDT INFO   slon: notify worker process to shutdown
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6962-1] 2010-08-23 
> 10:52:31 PDT INFO   remoteListenThread_2: disconnecting from 'dbname=tii 
> host=shipper_db.xxx.xxxxxxx.com port=5432
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6962-2]  user=slony 
> password=xxxxxxx'
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6963-1] 2010-08-23 
> 10:52:31 PDT INFO   main: scheduler mainloop returned
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6964-1] 2010-08-23 
> 10:52:31 PDT CONFIG main: wait for remote threads
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6965-1] 2010-08-23 
> 10:52:31 PDT INFO   syncThread: thread done
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6966-1] 2010-08-23 
> 10:52:31 PDT DEBUG1 cleanupThread: thread done
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6967-1] 2010-08-23 
> 10:52:31 PDT DEBUG1 remoteListenThread_2: thread done
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6968-1] 2010-08-23 
> 10:52:31 PDT INFO   remoteListenThread_4: disconnecting from 'dbname=tii 
> host=db5.xxx.xxxxxxx.com port=5432
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6968-2]  user=slony 
> password=xxxxxxx'
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6969-1] 2010-08-23 
> 10:52:31 PDT DEBUG1 remoteListenThread_4: thread done
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6970-1] 2010-08-23 
> 10:52:31 PDT CONFIG remoteWorkerThread_4: update provider configuration
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6971-1] 2010-08-23 
> 10:52:31 PDT INFO   remoteWorkerThread_4: thread done
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6972-1] 2010-08-23 
> 10:52:31 PDT INFO   remoteListenThread_3: disconnecting from 'dbname=tii 
> host=db4.xxx.xxxxxxx.com port=5432
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6972-2]  user=slony 
> password=xxxxxxx'
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6973-1] 2010-08-23 
> 10:52:31 PDT DEBUG1 remoteListenThread_3: thread done
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6974-1] 2010-08-23 
> 10:52:31 PDT CONFIG remoteWorkerThread_2: update provider configuration
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6975-1] 2010-08-23 
> 10:52:31 PDT INFO   remoteWorkerThread_2: thread done
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11953]: [6976-1] 2010-08-23 
> 10:52:31 PDT CONFIG main: done
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[5541]: [7-1] 2010-08-23 
> 10:52:31 PDT CONFIG slon: child terminated status: 0; pid: 11953, current 
> worker pid: 11953
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[5541]: [1-1] 2010-08-23 
> 10:52:31 PDT CONFIG main: slon version 2.0.3 starting up
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[5541]: [2-1] 2010-08-23 
> 10:52:31 PDT INFO   slon: watchdog process started
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[5541]: [3-1] 2010-08-23 
> 10:52:31 PDT CONFIG slon: watchdog ready - pid = 5541
> <snip (loading of config params)>
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[5541]: [4-1] 2010-08-23 
> 10:52:31 PDT CONFIG slon: worker process created - pid = 19013
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [31-1] 2010-08-23 
> 10:52:31 PDT CONFIG main: local node id = 5
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [32-1] 2010-08-23 
> 10:52:31 PDT INFO   main: main process started
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [33-1] 2010-08-23 
> 10:52:31 PDT CONFIG main: launching sched_start_mainloop
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [34-1] 2010-08-23 
> 10:52:31 PDT CONFIG main: loading current cluster configuration
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [35-1] 2010-08-23 
> 10:52:31 PDT CONFIG storeNode: no_id=2 no_comment='Node 2 - Base tii - Host 
> shipper_db.xxx.xxxxxxx.com - Cluster
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [35-2]  slony_schema'
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [36-1] 2010-08-23 
> 10:52:31 PDT CONFIG storeNode: no_id=4 no_comment='Node 4 - Base tii - Host 
> db5.xxx.xxxxxxx.com - Cluster
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [36-2]  slony_schema'
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [37-1] 2010-08-23 
> 10:52:31 PDT CONFIG storeNode: no_id=6 no_comment='<event pending>'
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [38-1] 2010-08-23 
> 10:52:31 PDT CONFIG storePath: pa_server=2 pa_client=5 
> pa_conninfo="dbname=tii host=shipper_db.xxx.xxxxxxx.com
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [38-2]  port=5432 
> user=slony password=xxxxxxx" pa_connretry=10
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [39-1] 2010-08-23 
> 10:52:31 PDT CONFIG storePath: pa_server=4 pa_client=5 
> pa_conninfo="dbname=tii host=db5.xxx.xxxxxxx.com
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [39-2]  port=5432 
> user=slony password=xxxxxxx" pa_connretry=10
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [40-1] 2010-08-23 
> 10:52:31 PDT CONFIG storeListen: li_origin=2 li_receiver=5 li_provider=2
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [41-1] 2010-08-23 
> 10:52:31 PDT CONFIG storeListen: li_origin=4 li_receiver=5 li_provider=4
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [42-1] 2010-08-23 
> 10:52:31 PDT CONFIG storeListen: li_origin=4 li_receiver=5 li_provider=2
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [43-1] 2010-08-23 
> 10:52:31 PDT CONFIG storeListen: li_origin=2 li_receiver=5 li_provider=4
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [44-1] 2010-08-23 
> 10:52:31 PDT CONFIG storeSet: set_id=1 set_origin=5 set_comment='Tables in 
> tii'
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [45-1] 2010-08-23 
> 10:52:31 PDT CONFIG main: last local event sequence = 5037905941
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [46-1] 2010-08-23 
> 10:52:31 PDT CONFIG main: configuration complete - starting threads
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [47-1] 2010-08-23 
> 10:52:31 PDT INFO   localListenThread: thread starts
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11946]: [3034324-1] 2010-08-23 
> 10:52:31 PDT DEBUG1 calc sync size - last time: 1 last length: 1000 ideal: 60 
> proposed size: 3
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11946]: [3034325-1] 2010-08-23 
> 10:52:31 PDT DEBUG1 about to monitor_subscriber_query - pulling big actionid 
> list for 5
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[11946]: [3034326-1] 2010-08-23 
> 10:52:31 PDT INFO   remoteWorkerThread_5: syncing set 1 with 199 table(s) 
> from provider 5
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [48-1] 2010-08-23 
> 10:52:31 PDT CONFIG version for "dbname=tii host=db6.xxx.xxxxxxx.com 
> port=5432 user=slony password=xxxxxxx"
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [48-2]  is 80402
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [49-1] 2010-08-23 
> 10:52:31 PDT DEBUG1 local_listen "dbname=tii host=db6.xxx.xxxxxxx.com 
> port=5432 user=slony
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [49-2]  
> password=xxxxxxx": backend pid = 14119
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [50-1] 2010-08-23 
> 10:52:31 PDT FATAL  localListenThread: "select 
> "_slony_schema".cleanupNodelock(); insert into "_slony_schema".sl_nodelock
> Aug 23 10:52:31 master-db.xxx.xxxxxxx.com slon[19013]: [50-2]  values (    5, 
> 0, "pg_catalog".pg_backend_pid()); " - ERROR:  duplicate key value violates 
> unique constraint "sl_nodelock-pkey"

Any ideas why we would reach this state?

Regards,
--Richard
_______________________________________________
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general

Reply via email to