Re: [Slony1-general] Slony website is down
On 5/13/24 03:08, Devrim Gündüz via Slony1-general wrote:
Hi,
On Sun, 2024-05-12 at 18:09 +, Soni M via Slony1-general wrote:
I found that slony website is down. https://www.slony.info/pipermail/slony1-general/2024-March/013559.html
It responds (sluggish) to me.
That said, the Slony-I project is ending. We will not provide any new versions. Regards, Jan
Regards,
___ Slony1-general mailing list Slony1-general@lists.slony.info https://lists.slony.info/cgi-bin/mailman/listinfo/slony1-general
Re: [Slony1-general] Replication interruption
On 10/22/21 3:33 PM, Sung Hsin Lei via Slony1-general wrote:
> This question may have been asked already. I have noticed that if it takes 10 days for the initial data transfer to complete, if the transfer is interrupted after 9 days, the transfer resumes from day 1. In other words, I need 10 days of uninterrupted connection between the master and slave machine for replication to be established properly. Is this the case or did I mess up somewhere?

This is correct. One "set" (of tables) must copy over in one transaction to have a consistent snapshot for replication to start catching up. In other systems this is often called "CDC" or "Change Data Capture". Without the initial copy being done in one transaction, the system has no way of knowing what incremental changes have to be replicated.

You may be able to break up the whole process into multiple steps by creating multiple sets. All the small (usually referenced PK) tables first in one set. Then create a set for one or a few tables at a time and replicate them, after which you would do a MERGE SET. Rinse, repeat.

> Another question. On a new replication (empty tables on the slave), I see the initial transfer stuck on TRUNCATE for a long time for big tables. Sometimes, it truncates the same table several times. I remember getting an explanation for why it is so. Is there a way to make the initial transfer faster (avoiding the truncate)?

A TRUNCATE taking a long time on an empty table sounds wrong. Are you sure this table is empty, or do you assume it is logically empty because a previous COPY has failed, while the system is actually making sure that none of its foreign key references are violated by the TRUNCATE operation?

-- Jan Wieck
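[Editor's note] Jan's closing question about foreign key references can be checked directly against the system catalogs. A sketch, assuming the slow-to-truncate table is named public.big_table (a placeholder):

```sql
-- List every foreign key constraint in other tables that references
-- "big_table"; any such constraint can make a TRUNCATE expensive or
-- block it, even when the table itself holds no visible rows.
SELECT conrelid::regclass AS referencing_table,
       conname            AS constraint_name
  FROM pg_constraint
 WHERE contype = 'f'
   AND confrelid = 'public.big_table'::regclass;
```

If this returns rows, the TRUNCATE is not operating on an isolated table, which would explain the behavior Jan describes.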
[Slony1-general] Test (01) moving
Please ignore -- Jan Wieck Principal Database Engineer
Re: [Slony1-general] Looks like someone's spam tool is buggy
I am right now seeing up to 6 bugs opened per minute. I suggest we shut bugzilla down until we can figure this out. On Wed, Jan 4, 2017 at 1:04 PM, Christopher Browne <cbbro...@gmail.com> wrote: > Oh dear. I closed several hundred of them yesterday, with the faint > hope that it was a one-off situation; apparently something is running > on an ongoing basis, generating bugs consisting of "material of spammy > consistency" on the Slony Bugzilla instance. > > Joshua Drake has been enquiring as to whether or not we should see > about updating some things (he's been thinking of the OS and > PostgreSQL) on slony.info; I suspect we need to do something regarding > the version of Bugzilla, with the dose of wishful thinking that > perhaps a later version would be more resistant to this spamulous > activity. > > I suggest weighing in on Joshua's comments; we should muse a bit on > what to do about all this. > -- Jan Wieck Senior Postgres Architect http://pgblog.wi3ck.info
[Slony1-general] Looks like someone's spam tool is buggy
Not sure how we can help to fix it though. Jan -- Jan Wieck Senior Postgres Architect http://pgblog.wi3ck.info
Re: [Slony1-general] Slony Tuning, heavy slon backup every AM
Getting a 404 on those images. On Tue, Aug 9, 2016 at 12:36 PM, Tory M Blue <tmb...@gmail.com> wrote: > > > On Sat, Aug 6, 2016 at 6:16 AM, Jan Wieck <j...@wi3ck.info> wrote: > >> How are you monitoring the number of rows in the sl_log_* tables? >> >> >> Jan >> >> > I've got a couple of updates but first let me answer the question. > > max => "SHOW max_connections", > > cur => "SELECT COUNT(*) FROM pg_stat_activity", > > log1=> "SELECT COUNT(*) FROM _cls.sl_log_1", > > log2=> "SELECT COUNT(*) FROM _cls.sl_log_2", > > siz1=> "SELECT (relpages*8) FROM pg_class where relname='sl_log_1'", > siz2=> "SELECT (relpages*8) FROM pg_class where relname='sl_log_2'" > > We have a script that runs and monitors a few things in the DB, one of which is the > count of sl_log_1 and 2. This is added as an RRD > and graphed. > > Now the update: the graph above is what we have been seeing, logs grow > until 10:50am or so when the truncate is allowed (now backups are complete > at 4am and all heavy lifting is finished at 5am) > > > [image: idb01.gc - slonyTables] > > > I disabled the full backup on the standby unit last night. > [image: idb01.gc - slonyTables] > > > I still get some peaks and valleys, but the system is not backed up for 10 > hours. So this tells me that the backup on the standby is causing a > situation where slon is blocked because tables are locked on the standby > unit. This is still not ideal but it's a big improvement from before, where > the sl_log_? would grow to 14 million rows from about 2am to 10:45am. > > So I need to figure out how to reduce the overhead of the backup. I > figured it was better to run it on the standby but at this point that is > not looking like a great option. > > Thanks for working on this with me! > > Tory > -- Jan Wieck Senior Postgres Architect http://pgblog.wi3ck.info
Re: [Slony1-general] Slony Tuning, heavy slon backup every AM
How are you monitoring the number of rows in the sl_log_* tables? Jan On Thu, Aug 4, 2016 at 3:04 PM, Tory M Blue <tmb...@gmail.com> wrote: > > > On Thu, Aug 4, 2016 at 12:02 PM, Tory M Blue <tmb...@gmail.com> wrote: > >> >> >> On Thu, Aug 4, 2016 at 10:29 AM, Tory M Blue <tmb...@gmail.com> wrote: >> > On Thu, Aug 4, 2016 at 8:39 AM, Jan Wieck <j...@wi3ck.info> wrote: >> >> Another problem we recently ran into is that after bulk operations, the >> >> queries selecting the log data on the data provider can degrade. A >> >> workaround >> >> in one case was to ALTER the slony user and set work_mem to 512MB. The >> >> work_mem of 100MB, used by the application, was causing the scans on >> >> the sl_log table to become sequential scans. >> >> >> >> The problem is amplified if the tables are spread across many sets. >> Merging >> >> the sets into few larger ones causes fewer scans. >> >> >> >> >> >> Regards, Jan >> >> >> > >> > Unfortunately, I've never been able to migrate to a slony user, so >> > slony runs as postgres and my setting are >> > >> > work_mem = 2GB >> > >> > maintenance_work_mem = 2GB >> > >> > So don't think I can test this theory. >> > >> > Also of note, we have 3 sets >> > >> > Same today, just don't see any long running queries where slony has no >> > time to truncate the table, so I'm not clear what is preventing the >> > slony table from being truncated. ( it really sounds like something >> > that Jan has run into before, just a huge table and for some reason >> > the truncate can't complete before another job or request comes in? >> > Even now the slon table is over 12Million and I expect it to be able >> > to truncate/clear in the next 30 minutes or so, but why.. the big bulk >> > operation finishes at 5am, not sure why it takes another almost 6 >> > hours to truncate that table.. >> > >> > Thanks again! 
>> > Tory >> >> Also interesting today, is that it's getting worse :) but the same >> pattern, other then it's taking longer and today I saw something really >> weird. >> >> slon ran a truncate but only cleared 50% of the sl_log1, How is that >> possible, I mean a truncate is a truncate, I have no idea how this table >> went from 14M to 7M >> >> 2016-08-04 10:43:03 PDT clsdb postgres 10.13.200.231(37864) 58874 >> 2016-08-04 10:43:03.237 PDTNOTICE: Slony-I: could not lock sl_log_2 - >> sl_log_2 not truncated >> >> 2016-08-04 10:53:43 PDT clsdb postgres 10.13.200.231(37864) 58874 >> 2016-08-04 10:53:43.573 PDTNOTICE: Slony-I: log switch to sl_log_1 >> complete - truncate sl_log_2 >> >> 2016-08-04 11:05:41 PDT clsdb postgres 10.13.200.231(37864) 58874 >> 2016-08-04 11:05:41.750 PDTNOTICE: Slony-I: Logswitch to sl_log_2 initiated >> >> *2016-08-04 11:16:54 PDT clsdb postgres 10.13.200.231(37864) 58874 >> 2016-08-04 11:16:54.783 PDTNOTICE: Slony-I: log switch to sl_log_2 >> complete - truncate sl_log_1 <--- didn't actually "truncate" the table, >> there is still 7.5M rows and yet it's moving ahead with logswitch to >> sl_log_1 (which again was just supposedly truncated, had 14 million rows >> but has 7.5 Million rows still)..* >> >> 2016-08-04 11:29:06 PDT clsdb postgres 10.13.200.231(37864) 58874 >> 2016-08-04 11:29:06.332 PDTNOTICE: Slony-I: Logswitch to sl_log_1 initiated >> >> 2016-08-04 11:40:43 PDT clsdb postgres 10.13.200.231(37864) 58874 >> 2016-08-04 11:40:43.048 PDTNOTICE: Slony-I: log switch to sl_log_1 >> complete - truncate sl_log_2 >> >> 2016-08-04 11:51:44 PDT clsdb postgres 10.13.200.231(37864) 58874 >> 2016-08-04 11:51:44.660 PDTNOTICE: Slony-I: Logswitch to sl_log_2 initiated >> >> So what I'm also noticing and maybe not related, but my standby node has >> lots of locks and long running reports, I'm wondering if it's causing some >> of the backup seen on the master. 
So many questions still :)) >> >> Thanks for listening and being my wall to bounce stuff off of >> >> Tory >> >> Sent too soon > > 2016-08-04 12:02:50 PDT clsdb postgres 10.13.200.231(37864) 58874 > 2016-08-04 12:02:50.556 PDTNOTICE: Slony-I: log switch to sl_log_2 > complete - truncate sl_log_1 > *(This truncate actually removed the 7.5 million remaining rows)* > > *I'm confused: how can a truncate leave rows?!* > > > Tory > > ___ > Slony1-general mailing list > Slony1-general@lists.slony.info > http://lists.slony.info/mailman/listinfo/slony1-general > > -- Jan Wieck Senior Postgres Architect http://pgblog.wi3ck.info
Re: [Slony1-general] Slony Tuning, heavy slon backup every AM
Another problem we recently ran into is that after bulk operations, the queries selecting the log data on the data provider can degrade. A workaround in one case was to ALTER the slony user and set work_mem to 512MB. The work_mem of 100MB, used by the application, was causing the scans on the sl_log table to become sequential scans. The problem is amplified if the tables are spread across many sets. Merging the sets into few larger ones causes fewer scans. Regards, Jan On Thu, Aug 4, 2016 at 7:52 AM, Glyn Astill <glynast...@yahoo.co.uk> wrote: > > > From: Tory M Blue <tmb...@gmail.com> > >To: Jan Wieck <j...@wi3ck.info> > >Cc: slony <slony1-general@lists.slony.info> > >Sent: Wednesday, 3 August 2016, 21:45 > >Subject: Re: [Slony1-general] Slony Tuning, heavy slon backup every AM > > > > > > > > > > > > > > > >On Wed, Aug 3, 2016 at 12:30 PM, Jan Wieck <j...@wi3ck.info> wrote: > > > > > >> > >> > >> > >>On Wed, Aug 3, 2016 at 11:14 AM, Tory M Blue <tmb...@gmail.com> wrote: > >> > >>Hey folks > >>> > >>>Running into some more issues and i'm not finding a definitive tuning > guide nor can I tell why i'm backing up and why it takes so long to catch > up (so long is relative). > >>> > >>> > >>>Running Slony 2.2.3 (but this has been an issue for a long time). > >>> > >>> > >>>Image to show you where I am this AM , but this is a constant every > AM. I have large updates about 2.5 million between 12am and 4am, the job > is done but it will take Slon a couple of hours to catch up. > >> > >> > >>Are there long running transactions that start around the time, the > sl_log starts > >>growing? Any long running transaction prevents a log switch from > finishing. > >> > >> > >> > >> > >>Jan > > > > > >Hi Jan, > > > > > >There are really not many long queries during the big import, but things > start getting longer when the slon logs have 10m+ rows in them. 
> > > > > >I actually log any transaction that takes over a second and there is > nothing that stands out, nothing that runs all that long. During this > period, I move my reporting to point to the master, because of the delay in > the standby, so there are some longer report queries, but the longest is 16 > minutes > > > > > Do you see any long running copy statements from the slon on the > subscriber? If so one avenue to investigate is if there are any replicated > tables with sub-optimal indexes on them. > > I mention it purely because I've been bitten by this myself and seen very > similar symptoms to what you're reporting here. There's every chance this > is way off the mark, but I think it's worth mentioning. > > The issue I saw was down to a bunch of pretty ugly indexes created by an > application, and the result was inaccurate selectivity estimates causing > the planner to choose one of the ugly indexes instead of the primary key > when applying the changes on the subscriber. Multivariate planner > statistics that PostgreSQL doesn't yet have might have mitigated this, but > that's besides the point. > > > For example I'd see something like 200,000 records getting updated by a > single update statement on the origin taking about 20 seconds, but when > applied as 200,000 separate update statements on the subscriber it took > more like 200 minutes. An update using an index scan on the ugly index > would take 62ms vs 0.1ms on the primary key, but outwardly all I saw until > I enabled autoexplain was a long running copy. > > Glyn > -- Jan Wieck Senior Postgres Architect http://pgblog.wi3ck.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
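[Editor's note] The per-role work_mem workaround Jan describes above does not require changing the global configuration. A sketch, assuming the slon daemons connect as a role named slony (substitute your own replication role):

```sql
-- Raise work_mem only for the role slon connects as, per the
-- observation above that a low work_mem degraded the sl_log scans.
-- The application's own sessions keep their existing setting.
-- Takes effect on new connections by that role.
ALTER ROLE slony SET work_mem = '512MB';

-- Verify the per-role setting was recorded:
SELECT rolname, rolconfig FROM pg_roles WHERE rolname = 'slony';
```

This is why Tory's setup, where slon runs as the postgres superuser with work_mem already at 2GB, could not test the theory without a dedicated role.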
Re: [Slony1-general] Slony Tuning, heavy slon backup every AM
On Wed, Aug 3, 2016 at 11:14 AM, Tory M Blue <tmb...@gmail.com> wrote: > Hey folks > > Running into some more issues and i'm not finding a definitive tuning > guide nor can I tell why i'm backing up and why it takes so long to catch > up (so long is relative). > > Running Slony 2.2.3 (but this has been an issue for a long time). > > Image to show you where I am this AM , but this is a constant every AM. I > have large updates about 2.5 million between 12am and 4am, the job is done > but it will take Slon a couple of hours to catch up. > Are there long running transactions that start around the time, the sl_log starts growing? Any long running transaction prevents a log switch from finishing. Jan > > Hardware: big 256GB 32 Proc machines (proc's don't matter here). I see > very little iowait 2% on the standby, 0% on the primary and I'm seeing a > whopping 150tps coming to the secondary (I would expect it to be writing > like crazy!) > > idb01 is the primary idb02 is the secondary > > > > MY configuration settings, or those that I think matter > > cleanup_interval="5 minutes" > > sync_interval=1000 > > sync_group_maxsize=5000 > > #sync_max_rowsize=8192 > > > I don't see why my primary backs up so much, I don't see the same backup > on idb02 (which is replicating to 3 other servers. Those 3 servers have > forward=no, so they never have anything stored in sl_log. (meaning the > graphs for those are flat lined). > > I think the only thing to really tune is sync_group_maxsize, or > sync_max_rowsize maybe. I have lots of RAM, but would really like to figure > out why I seem to backup, the hardware, network, disk is good, Slony should > really be performing better, it has lots of resources.. > > > Thanks for any ideas or where to look. Logs don't seem to tell me there is > an issue. 
> > -Tory > > -- Jan Wieck Senior Postgres Architect http://pgblog.wi3ck.info
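[Editor's note] Jan's question about long-running transactions around the time sl_log starts growing can be answered from pg_stat_activity. A sketch (column names as of PostgreSQL 9.2 and later):

```sql
-- Show the longest-open transactions. Any of these, including sessions
-- that are merely idle in transaction, will prevent the Slony log
-- switch (and thus the sl_log truncate) from finishing.
SELECT pid,
       usename,
       state,
       now() - xact_start AS xact_age,
       left(query, 60)    AS current_query
  FROM pg_stat_activity
 WHERE xact_start IS NOT NULL
 ORDER BY xact_start
 LIMIT 10;
```

Run this on the origin while sl_log is growing; a transaction older than the growth window is the usual culprit.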
Re: [Slony1-general] Replication Lag?
On 02/29/2016 10:49 AM, Rob Brucks wrote: > 1. Yes, the sl_confirm data is showing up on the subscriber. > > 2. No, the origin node is not getting back the sl_confirm data from the > active subscriber. Does the origin node log any errors that it cannot connect to that subscriber node? Jan > > > Thanks, > Rob > > > > On 2/26/16, 8:38 PM, "Steve Singer" <st...@ssinger.info> wrote: > >>On Fri, 26 Feb 2016, Rob Brucks wrote: >> >>> >>> But, if I insert some test data into the master DB, I see the data show up >>> on the remaining active slave. So replication to the remaining slave DB >>> is obviously working. >> >>Replication with slony has two parts >> >>1. Does the data replicate from the origin to the subscriber. When this >>happens a row is added to the subscriber's sl_event table with >>ev_origin=$origin_node and a confirm is added to the subscribers sl_confirm >>table with con_origin=$origin_node and con_received=$subscriber_node >> >>2. The sl_confirm row mentioned above needs to then get picked up by the >>slon for the origin node and brought back from the subscriber to the origin. >> >>Are your confirms making it back? >> >> >>> >>> We use sl_status to monitor replication so we need it to accurately report >>> lag if there's an issue. The Slony 1.2 version we used before did not >>> behave this way, it accurately reported which slave was not replicating. >>> >>> Why does sl_status report lag on the active slave even though replication >>> appears to be working fine? >>> >>> Do I have a misconfiguration somewhere? 
>>> >>> Thanks, >>> Rob >>> >>> >>> Here's my slony config: >>> >>> >>> CLUSTER NAME = slony; >>> NODE 1 ADMIN CONNINFO = 'dbname=test_db host=/tmp port=5432 >>> user=slony'; >>> NODE 2 ADMIN CONNINFO = 'dbname=test_db host=/tmp port=5433 >>> user=slony'; >>> NODE 3 ADMIN CONNINFO = 'dbname=test_db host=/tmp port=5434 >>> user=slony'; >>> >>> CLUSTERS >>> >>> INIT CLUSTER (ID = 1, COMMENT = 'Master'); >>> >>> >>> NODES >>> >>> STORE NODE (ID = 2, COMMENT = 'Slave1', EVENT NODE = 1); >>> STORE NODE (ID = 3, COMMENT = 'Slave2', EVENT NODE = 1); >>> >>> >>> PATHS >>> >>> STORE PATH (SERVER = 1, CLIENT = 2, CONNINFO = 'dbname=test_db >>> host=/tmp port=5432 user=slony'); >>> STORE PATH (SERVER = 1, CLIENT = 3, CONNINFO = 'dbname=test_db >>> host=/tmp port=5432 user=slony'); >>> STORE PATH (SERVER = 2, CLIENT = 1, CONNINFO = 'dbname=test_db >>> host=/tmp port=5433 user=slony'); >>> STORE PATH (SERVER = 2, CLIENT = 3, CONNINFO = 'dbname=test_db >>> host=/tmp port=5433 user=slony'); >>> STORE PATH (SERVER = 3, CLIENT = 1, CONNINFO = 'dbname=test_db >>> host=/tmp port=5434 user=slony'); >>> STORE PATH (SERVER = 3, CLIENT = 2, CONNINFO = 'dbname=test_db >>> host=/tmp port=5434 user=slony'); >>> >>> >>> SETS >>> >>> CREATE SET (ID = 1, ORIGIN = 1, COMMENT = 'TEST Set 1'); >>> >>> SEQUENCES >>> >>> SET ADD SEQUENCE (SET ID = 1, ORIGIN = 1, ID = 1, FULLY QUALIFIED NAME >>> = '"public"."test_seq"'); >>> >>> TABLES >>> >>> SET ADD TABLE (SET ID = 1, ORIGIN = 1, ID = 2, FULLY QUALIFIED NAME = >>> '"public"."test"'); >>> >>> SUBSCRIPTIONS >>> >>> SUBSCRIBE SET (ID = 1, PROVIDER = 1, RECEIVER = 2, FORWARD = YES); >>> SUBSCRIBE SET (ID = 1, PROVIDER = 1, RECEIVER = 3, FORWARD = YES); >>> >>> >>> ___ >>> Slony1-general mailing list >>> Slony1-general@lists.slony.info >>> http://lists.slony.info/mailman/listinfo/slony1-general >>> >> > ___ > Slony1-general mailing list > Slony1-general@lists.slony.info > http://lists.slony.info/mailman/listinfo/slony1-general > -- Jan Wieck Database 
Architect
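[Editor's note] The two-part check Steve describes, whether confirmations make it back to the origin, can be done with a direct query on sl_confirm. A sketch, assuming the cluster schema is named _slony (substitute your own cluster name):

```sql
-- Run on the origin node. For each (origin, receiver) pair, show the
-- latest confirmed event and its age. A subscriber whose confirmations
-- stop arriving shows an ever-growing confirm_age here, which is
-- exactly what makes sl_status report lag.
SELECT con_origin,
       con_received,
       max(con_seqno)             AS last_confirmed_seqno,
       now() - max(con_timestamp) AS confirm_age
  FROM _slony.sl_confirm
 GROUP BY con_origin, con_received
 ORDER BY con_origin, con_received;
```

Comparing this output on the origin with the same query on the subscriber shows whether the confirm rows exist but are simply not being carried back.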
Re: [Slony1-general] Slony-I: log switch to sl_log_2 still in progress - sl_log_1 not truncated
On 02/02/2016 08:06 AM, Tignor, Tom wrote: > > I did drop one of my replicas several weeks ago, though I didn’t > recreate > the node. I do have automation to do exactly that, however, if a replica > becomes defective somehow. Seems I’ll need to consider the point of the > two log switches. Is that important even if the dropped node isn’t a > provider for anybody? Yes, it is important because even a non-forwarding leaf node is still producing events that propagate to all other nodes. You might want to check sl_event on all nodes if there are remnants of dropped and re-created nodes there. This would be sl_event rows with an ev_seqno in the future of what that node is currently producing. Regards, Jan > > Tom:-) > > > On 2/2/16, 12:06 AM, "Jan Wieck" <j...@wi3ck.info> wrote: >>On 02/01/2016 01:24 PM, Tignor, Tom wrote: >>> >>> Quick update: a couple hours after deleting entries from both sl_log >>> tables with txids > 630M, it appears the cleanup thread has taken care >>>of >>> business. sl_log_1 is down from 54GB to 24KB. >> >>I still wonder how that happened in the first place. >> >>Was there a sequence of dropping and re-creating a node recently? I >>think I have seen cases like this where if a node is re-created with the >>same node ID before the cleanup of data, belonging to the dropped node, >>had happened everywhere. The "cleanup" I am talking about is basically 2 >>complete log switches on all nodes after the DROP NODE had replicated >>everywhere. That takes at least 20-30 minutes and can in some cases take >>hours. >> >> >>Regards, Jan >> >> >>> >>> Tom:-) >>> >>> >>> On 2/1/16, 9:06 AM, "Tignor, Tom" <ttig...@akamai.com> wrote: >>> >>>> >>>>Jan, >>>>Thanks much for all the help. I’ve been looking over logswitch finish >>>>and >>>>the event and changelog tables. 
Selecting logswitch_finish() on one of >>>>my >>>>replicas simply returned ""Slony-I: log switch to sl_log_2 still in >>>>progress - sl_log_1 not truncated” All three seem to be in that state >>>>with >>>>either sl_log_1 or sl_log_2. Looking closer at sl_event and sl_log_2 I’m >>>>seeing some strangeness. >>>> >>>>ams=# select min(pg_catalog.txid_snapshot_xmin(ev_snapshot)), >>>>max(pg_catalog.txid_snapshot_xmin(ev_snapshot)) from >>>>_ams_cluster.sl_event >>>>where ev_origin = 1; >>>>min|max >>>>---+--- >>>> 139136948 | 139204299 >>>>(1 row) >>>> >>>> >>>>ams=# select min(log_txid), max(log_txid) from _ams_cluster.sl_log_2 >>>>where >>>>log_origin = 1; >>>>min|max >>>>---+--- >>>> 631532717 | 631661386 >>>>(1 row) >>>> >>>> >>>>ams=# >>>> >>>> >>>>So I understand all the txids referenced in sl_event are in the 139M >>>>range while all those in sl_log_2 are in the 631M range. Normally, the >>>>sl_log txids should be older, shouldn’t they? Do you think I’ve hit a >>>>txid-wraparound problem? >>>> >>>>Tom:-) >>>> >>>> >>>> >>>>On 1/28/16, 1:09 PM, "Jan Wieck" <j...@wi3ck.info> wrote: >>>> >>>>>On 01/28/2016 11:11 AM, Tignor, Tom wrote: >>>>>> >>>>>> Output below. They seem to be replicating normally, except for the >>>>>>sl_log >>>>>> growth. >>>>> >>>>>Indeed. Is there anything in the slon logs for those nodes that says >>>>>why >>>>>it doesn't finish the log switch? >>>>> >>>>>Connect to the database as a the slony user. >>>>> >>>>>To check if a log switch is indeed in progress, do >>>>> >>>>> SELECT last_value FROM _ams_cluster.sl_log_status; >>>>> >>>>>It should be either 2 or 3. If it is 0 or 1, no log switch is in >>>>>progress and you can start one with >>>>>
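[Editor's note] Jan's suggestion to look on every node for sl_event remnants of dropped and re-created nodes can be done with a query like the following, shown against the cluster schema used in this thread (_ams_cluster):

```sql
-- Run on every node. If a node ID was dropped and re-created before
-- cleanup completed everywhere, leftover rows show up here with an
-- ev_seqno far ahead of what that origin is currently generating;
-- such rows keep the log switch from ever finishing.
SELECT ev_origin,
       min(ev_seqno) AS oldest_event,
       max(ev_seqno) AS newest_event,
       count(*)      AS pending_events
  FROM _ams_cluster.sl_event
 GROUP BY ev_origin
 ORDER BY ev_origin;
```

Comparing newest_event per origin across all nodes makes an out-of-sequence remnant stand out immediately.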
Re: [Slony1-general] Slony-I: log switch to sl_log_2 still in progress - sl_log_1 not truncated
On 02/01/2016 01:24 PM, Tignor, Tom wrote: > > Quick update: a couple hours after deleting entries from both sl_log > tables with txids > 630M, it appears the cleanup thread has taken care of > business. sl_log_1 is down from 54GB to 24KB. I still wonder how that happened in the first place. Was there a sequence of dropping and re-creating a node recently? I think I have seen cases like this where if a node is re-created with the same node ID before the cleanup of data, belonging to the dropped node, had happened everywhere. The "cleanup" I am talking about is basically 2 complete log switches on all nodes after the DROP NODE had replicated everywhere. That takes at least 20-30 minutes and can in some cases take hours. Regards, Jan > > Tom:-) > > > On 2/1/16, 9:06 AM, "Tignor, Tom" <ttig...@akamai.com> wrote: > >> >> Jan, >> Thanks much for all the help. I’ve been looking over logswitch finish >> and >>the event and changelog tables. Selecting logswitch_finish() on one of my >>replicas simply returned ""Slony-I: log switch to sl_log_2 still in >>progress - sl_log_1 not truncated” All three seem to be in that state with >>either sl_log_1 or sl_log_2. Looking closer at sl_event and sl_log_2 I’m >>seeing some strangeness. >> >>ams=# select min(pg_catalog.txid_snapshot_xmin(ev_snapshot)), >>max(pg_catalog.txid_snapshot_xmin(ev_snapshot)) from _ams_cluster.sl_event >>where ev_origin = 1; >>min|max >>---+--- >> 139136948 | 139204299 >>(1 row) >> >> >>ams=# select min(log_txid), max(log_txid) from _ams_cluster.sl_log_2 where >>log_origin = 1; >>min|max >>---+--- >> 631532717 | 631661386 >>(1 row) >> >> >>ams=# >> >> >> So I understand all the txids referenced in sl_event are in the 139M >>range while all those in sl_log_2 are in the 631M range. Normally, the >>sl_log txids should be older, shouldn’t they? Do you think I’ve hit a >>txid-wraparound problem? 
>> >> Tom:-) >> >> >> >>On 1/28/16, 1:09 PM, "Jan Wieck" <j...@wi3ck.info> wrote: >> >>>On 01/28/2016 11:11 AM, Tignor, Tom wrote: >>>> >>>>Output below. They seem to be replicating normally, except for the >>>>sl_log >>>> growth. >>> >>>Indeed. Is there anything in the slon logs for those nodes that says why >>>it doesn't finish the log switch? >>> >>>Connect to the database as a the slony user. >>> >>>To check if a log switch is indeed in progress, do >>> >>> SELECT last_value FROM _ams_cluster.sl_log_status; >>> >>>It should be either 2 or 3. If it is 0 or 1, no log switch is in >>>progress and you can start one with >>> >>> SELECCT _ams_cluster.logswitch_start(); >>> >>>If it is 2 or 3, then you can do >>> >>> SELECT _ams_cluster.logswitch_finish(); >>> >>>All these operations are harmless and will only do what is safely >>>possible. Look at the code of logswitch_finish() to find out how it >>>determines if the current log switch can be finished. In short, the >>>cleanup thread is removing events from sl_event that have been confirmed >>>by all nodes in the cluster. The function logswitch_finish() looks if >>>there is anything left in sl_event, that belonged to that old log. If so >>>it will not finish. Running those queries manually you can find out what >>>that event is that is preventing the switch to finish. >>> >>> >>> >>>> >>>> >>>> a...@ams-repl2.ams.netmgmt:~$ /a/third-party/postgresql/bin/psql -U >>>> ams_slony -d ams -c 'select * from _ams_cluster.sl_status' >>>> st_origin | st_received | st_last_event | st_last_event_ts >>>> | >>>> st_last_received | st_last_received_ts | >>>> st_last_received_event_ts | st_lag_num_events | st_lag_time >>>> >>>>---+-+---+-- >>>>- >>>>+- >>>> >>>>-+---+-- >>>>- >>>>-- >>>> --+---+- >>>> 2 | 1 |5000611610 | 2016-01-28 >>>>16:06:37.343826+00 | >>>>5000611610 | 201
Re: [Slony1-general] Replication inexplicably stops
'f'
> 2016-01-28 17:41:00 Amér. du Sud occid. WARN remoteWorker_wakeup: node 1 - no worker thread
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG enableSubscription: sub_set=1
> 2016-01-28 17:41:00 Amér. du Sud occid. WARN remoteWorker_wakeup: node 1 - no worker thread
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: last local event sequence = 5000462590
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG main: configuration complete - starting threads
> 2016-01-28 17:41:00 Amér. du Sud occid. INFO localListenThread: thread starts
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234" is 90310
> NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=5188
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG enableNode: no_id=1
> 2016-01-28 17:41:00 Amér. du Sud occid. INFO remoteWorkerThread_1: thread starts
> 2016-01-28 17:41:00 Amér. du Sud occid. INFO remoteListenThread_1: thread starts
> 2016-01-28 17:41:00 Amér. du Sud occid. INFO main: running scheduler mainloop
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG cleanupThread: thread starts
> 2016-01-28 17:41:00 Amér. du Sud occid. INFO syncThread: thread starts
> 2016-01-28 17:41:00 Amér. du Sud occid. INFO monitorThread: thread starts
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234" is 90310
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG remoteWorkerThread_1: update provider configuration
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG remoteWorkerThread_1: added active set 1 to provider 1
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname=Securithor2 host=192.168.1.50 user=slonyuser password = securiTHOR971 port = 6234" is 90306
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234" is 90310
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG cleanupThread: bias = 60
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234" is 90310
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname = Securithor2 user = slonyuser password = securiTHOR971 port = 6234" is 90310
> 2016-01-28 17:41:00 Amér. du Sud occid. CONFIG version for "dbname=Securithor2 host=192.168.1.50 user=slonyuser password = securiTHOR971 port = 6234" is 90306
> 2016-01-28 17:41:00 Amér. du Sud occid. INFO remoteWorkerThread_1: syncing set 1 with 59 table(s) from provider 1
>
> It gets stuck at "syncing set 1 with 59 table(s) from provider 1" (the last line) forever, with occasional messages that say something about cleaning (the cleanup thread, I think).
>
> Checking the postgres logs, I see lots of:
>
> 2016-01-28 17:33:07 AST LOG: n'a pas pu recevoir les données du client : unrecognized winsock error 10061
>
> Which translates to:
>
> 2016-01-28 17:33:07 AST LOG: was not able to receive the data from the client : unrecognized winsock error 10061
>
> I'm able to connect to the main db from the replicated machine no problem. I have no idea how this error 10061 is caused.

Winsock error 10061 is WSAECONNREFUSED: Connection refused. No connection could be made because the target computer actively refused it. This usually results from trying to connect to a service that is inactive on the foreign host, that is, one with no server application running. This might be a firewall issue. Can you use some network sniffer to find out what is happening on the TCP/IP level between the two machines? 
Regards, Jan -- Jan Wieck Senior Software Engineer http://slony.info
Re: [Slony1-general] Slony-I: log switch to sl_log_2 still in progress - sl_log_1 not truncated
> ...-28 13:18:34.487192+00 | f | <IDLE>
>  16393 | ams | 5041 | 213867 | ams_slony | slon.origin_2_provider_2  | 80.67.75.105   |  | 36402 | 2016-01-22 01:55:22.964462+00 |  | 2016-01-28 13:18:34.519066+00 | f | <IDLE>
>  16393 | ams | 6694 | 213867 | ams_slony | slon.node_2_listen        | 60.254.150.133 |  | 61795 | 2016-01-22 01:59:12.095052+00 |  | 2016-01-28 13:18:27.928384+00 | f | <IDLE>
>  16393 | ams | 4456 | 213867 | ams_slony | slon.node_2_listen        | 72.246.50.22   |  | 51238 | 2016-01-22 01:54:21.481355+00 |  | 2016-01-28 13:18:36.766973+00 | f | <IDLE>
>  16393 | ams | 4457 | 213867 | ams_slony | slon.node_2_listen        | 80.67.75.105   |  | 36333 | 2016-01-22 01:54:21.500456+00 |  | 2016-01-28 13:18:36.204482+00 | f | <IDLE>
>  16393 | ams | 4428 | 213867 | ams_slony | slon.local_monitor        |                |  |    -1 | 2016-01-22 01:54:18.977015+00 |  | 2016-01-28 13:18:36.652567+00 | f | <IDLE>
>  16393 | ams | 4427 | 213867 | ams_slony | slon.local_sync           |                |  |    -1 | 2016-01-22 01:54:18.976932+00 |  | 2016-01-28 13:18:36.151998+00 | f | <IDLE>
>  16393 | ams | 4426 | 213867 | ams_slony | slon.local_cleanup        |                |  |    -1 | 2016-01-22 01:54:18.976842+00 |  | 2016-01-28 13:12:12.582921+00 | f | <IDLE>
>  16393 | ams | 4425 | 213867 | ams_slony | slon.remoteWorkerThread_4 |                |  |    -1 | 2016-01-22 01:54:18.976783+00 |  | 2016-01-28 13:18:33.99715+00  | f | <IDLE>
>  16393 | ams | 4420 | 213867 | ams_slony | slon.remoteWorkerThread_1 |                |  |    -1 | 2016-01-22 01:54:18.976548+00 |  | 2016-01-28 13:18:33.561531+00 | f | <IDLE>
>  16393 | ams | 4419 | 213867 | ams_slony | slon.remoteWorkerThread_3 |                |  |    -1 | 2016-01-22 01:54:18.97647+00  |  | 2016-01-28 13:18:34.808907+00 | f | <IDLE>
>  16393 | ams | 4413 | 213867 | ams_slony | slon.local_listen         |                |  |    -1 | 2016-01-22 01:54:18.965568+00 |  | 2016-01-28 13:18:37.096159+00 | f | <IDLE>
> (21 rows)
>
> ams=#
>
> Tom:-)
>
> ___
> Slony1-general mailing list
> Slony1-general@lists.slony.info
> http://lists.slony.info/mailman/listinfo/slony1-general

-- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] Slony-I: log switch to sl_log_2 still in progress - sl_log_1 not truncated
On 01/28/2016 11:11 AM, Tignor, Tom wrote: > > Output below. They seem to be replicating normally, except for the > sl_log > growth. Indeed. Is there anything in the slon logs for those nodes that says why it doesn't finish the log switch? Connect to the database as the slony user. To check if a log switch is indeed in progress, do SELECT last_value FROM _ams_cluster.sl_log_status; It should be either 2 or 3. If it is 0 or 1, no log switch is in progress and you can start one with SELECT _ams_cluster.logswitch_start(); If it is 2 or 3, then you can do SELECT _ams_cluster.logswitch_finish(); All these operations are harmless and will only do what is safely possible. Look at the code of logswitch_finish() to find out how it determines whether the current log switch can be finished. In short, the cleanup thread is removing events from sl_event that have been confirmed by all nodes in the cluster. The function logswitch_finish() checks whether anything that belonged to the old log is still left in sl_event. If so, it will not finish. By running those queries manually you can find out which event is preventing the switch from finishing. 
> a...@ams-repl2.ams.netmgmt:~$ /a/third-party/postgresql/bin/psql -U ams_slony -d ams -c 'select * from _ams_cluster.sl_status'
>  st_origin | st_received | st_last_event | st_last_event_ts              | st_last_received | st_last_received_ts           | st_last_received_event_ts     | st_lag_num_events | st_lag_time
> -----------+-------------+---------------+-------------------------------+------------------+-------------------------------+-------------------------------+-------------------+----------------
>          2 |           1 |    5000611610 | 2016-01-28 16:06:37.343826+00 |       5000611610 | 2016-01-28 16:06:38.843562+00 | 2016-01-28 16:06:37.343826+00 |                 0 | 00:00:09.201996
>          2 |           3 |    5000611610 | 2016-01-28 16:06:37.343826+00 |       5000611609 | 2016-01-28 16:06:29.851545+00 | 2016-01-28 16:06:27.341894+00 |                 1 | 00:00:19.203928
>          2 |           4 |    5000611610 | 2016-01-28 16:06:37.343826+00 |       5000611610 | 2016-01-28 16:06:38.710974+00 | 2016-01-28 16:06:37.343826+00 |                 0 | 00:00:09.201996
> (3 rows)
>
> a...@ams-repl3.lga.netmgmt:~$ /a/third-party/postgresql/bin/psql -U ams_slony -d ams -c 'select * from _ams_cluster.sl_status'
>  st_origin | st_received | st_last_event | st_last_event_ts              | st_last_received | st_last_received_ts           | st_last_received_event_ts     | st_lag_num_events | st_lag_time
> -----------+-------------+---------------+-------------------------------+------------------+-------------------------------+-------------------------------+-------------------+----------------
>          3 |           4 |    5000654642 | 2016-01-28 16:07:05.493455+00 |       5000654642 | 2016-01-28 16:07:06.486539+00 | 2016-01-28 16:07:05.493455+00 |                 0 | 00:00:08.522529
>          3 |           1 |    5000654642 | 2016-01-28 16:07:05.493455+00 |       5000654642 | 2016-01-28 16:07:08.040292+00 | 2016-01-28 16:07:05.493455+00 |                 0 | 00:00:08.522529
>          3 |           2 |    5000654642 | 2016-01-28 16:07:05.493455+00 |       5000654642 | 2016-01-28 16:07:08.472049+00 | 2016-01-28 16:07:05.493455+00 |                 0 | 00:00:08.522529
> (3 rows)
>
> a...@ams-repl4.blr.netmgmt:~$ /a/third-party/postgresql/bin/psql -U ams_slony -d ams -c 'select * from _ams_cluster.sl_status'
>  st_origin | st_received | st_last_event | st_last_event_ts              | st_last_received | st_last_received_ts           | st_last_received_event_ts     | st_lag_num_events | st_lag_time
> -----------+-------------+---------------+-------------------------------+------------------+-------------------------------+-------------------------------+-------------------+----------------
>          4 |           3 |    5000637483 | 2016-01-28 16:07:32.698809+00 |       5000637482 | 2016-01-28 16:07:28.731404+00 | 2016-01-28 16:07:22.695826+00 |                 1 | 00:00:19.077657
>          4 |           1 |    5000637483 | 2016-01-28 16:07:32.698809+00 |       5000637482 | 2016-01-28 16:07:24.839978+00 | 2016-01-28 16:07:22.695826+00 |                 1 | 00:00:19.077657
>          4 |           2 |    5000637483 | 2016-01-28 16:07:32.698809+00 |       5000637482 | 2016-01-28 16:07:22.926411+00 | 2016-01-28 16:07:22.695826+00 |                 1 | 00:00:19.077657
> (3 rows)
>
> Tom:-)
>
> On 1/28/16, 10:38 AM, "Jan Wieck" <j...@wi3ck.info> wrote:
>
>>On 01/28/2016
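The log-switch checks described in this thread can be collected into one psql session (the cluster name _ams_cluster is the one used here; substitute your own):

```sql
-- 2 or 3 means a log switch is in progress; 0 or 1 means none is.
SELECT last_value FROM _ams_cluster.sl_log_status;

-- If no switch is in progress, start one:
SELECT _ams_cluster.logswitch_start();

-- If a switch is in progress, attempt to finish it:
SELECT _ams_cluster.logswitch_finish();
```

Per Jan's note above, both functions only do what is safely possible, so running them against a healthy cluster is harmless.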
Re: [Slony1-general] Slony with Amazon's RDS service?
On 12/21/2015 05:44 PM, David Fetter wrote: > On Fri, Dec 18, 2015 at 11:45:04AM -0300, "Gonzalo Vásquez @ Waypoint" wrote: >> Is it actually possible to somehow use Slony with Amazon’s RDS >> service? I’m aware that there’s no such thing as a host that you can >> log into to make Slony’s configurations, but perhaps it can be done >> remotely from a side server or something like that? > > You would need to get access to RDS's catalog that you don't have, so > my best guess is, "no." That's the technical side. > > On the business side, Amazon has very strong incentives to block > anything that might let you migrate off their services easily, so my > best guess is that any attempt to do this would be treated by Amazon > as an attack. On the technical side it would make a lot of sense to actually support the strongest replication system, one that has stood the test of time for cross-platform, cross-version upgrades for a decade now. But then again, that would mean investing in people, and that means becoming dependent on individuals. So on the business side it doesn't make that much sense. Hire-and-fire businesses like Amazon cannot depend on skilled individuals. Regards, Jan -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] remote listener serializability
On 11/20/2015 08:45 AM, Steve Singer wrote: > On 11/19/2015 01:09 PM, Tignor, Tom wrote: >> A general question for the group: if we would consider a change like >> this >> (as a runtime option or otherwise), what’s the correct way to move it >> forward? Should I file a bug? Are there specific tests or analysis which >> should be performed? > > I would use bug 336 for this. > My gut is to make this a runtime option since I am not confident that > downgrading the isolation level won't impact someone's workload, having > said that I am open to being convinced otherwise. I would also be > inclined to leave the default value unchanged for 2.2.x and then maybe > consider changing the default in 2.3. If we actually had any concrete > plans for a 2.3 version I would say just leave the change for 2.3 but I > don't see a 2.3 release happening soon and I don't want to change the > default behaviour in a minor release. > > > Generally when we make a change like this I try to do lots of runs > through the disorder/clustertest suite, but that doesn't cover all types of > workloads. Have you tried this change on your workload? Does it > actually help things? > > It would be great if Jan and Chris were to comment on this thread Without taking SYNC snapshots in a SERIALIZABLE transaction I believe that a Slony-I replica could suffer the same inconsistency dangers that a pg_dump without --serializable-deferrable can suffer. Namely a replica would not be usable as a source for reporting. From the 9.4 pg_dump docs: "This option is not beneficial for a dump which is intended only for disaster recovery. It could be useful for a dump used to load a copy of the database for reporting or other read-only load sharing while the original database continues to be updated. Without it the dump may reflect a state which is not consistent with any serial execution of the transactions eventually committed. 
For example, if batch processing techniques are used, a batch may show as closed in the dump without all of the items which are in the batch appearing." Changing the default isolation level(s) may therefore change what a replica can safely be used for, and I believe that creating reports is one of the major use cases. Using options with big, bold, red, flashing warnings in the documentation would be the only way to go. Regards, Jan > >> >> Tom:-) >> >> >> On 11/18/15, 10:35 AM, "Greg Sabino Mullane" <g...@endpoint.com> wrote: >> >>> On Wed, Nov 18, 2015 at 02:26:15PM +, Tom Tignor wrote: >>> ... >>>> Sorry for the delay getting back. Inspired by your questions, I've been >>>> reading up on SSI, the Cahill paper and slony1 and postgres code. >>> ... >>> >>> It should be pointed out that 9.1 goes EOL (End of Life) in less than >>> a year (Sep 2016), and transaction handling has changed a *lot* since >>> then, >>> so any changes that core Slony makes may not even work for you. >>> >>> (FWIW, I think dropping the isolation level in this particular >>> instance seems safe, however.) >>> >>> -- >>> Greg Sabino Mullane g...@endpoint.com >>> End Point Corporation >>> PGP Key: 0x14964AC8 >> >> _______ >> Slony1-general mailing list >> Slony1-general@lists.slony.info >> http://lists.slony.info/mailman/listinfo/slony1-general >> > > ___ > Slony1-general mailing list > Slony1-general@lists.slony.info > http://lists.slony.info/mailman/listinfo/slony1-general > -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] Network connection from slaves to the master
On 11/10/2015 08:03 AM, TOINEL, Ludovic wrote: > Thanks Andrew, > > We are not allowed to have network connection from the slaves to the master > (for security constraints). Only master can communicate with slaves. > We need databases on the slaves with a mix of replicated tables and read/write tables. I presume that this is a restriction on your routers or gateways and that the initiating SYN packet for a new TCP connection cannot come from what you envision to be the Slave. If that is correct, you can run all the slon daemons on what you consider the Master. All TCP connections are then initiated from the Master. This has little to no effect on how Slony functions. Regards, Jan > > Maybe the solution could be to use a Slony master that is a hot standby of a master protected somewhere? > > [slony slaves] <-> [slony master - Standby node] <(log > shipping)--|firewall|-- [master protected somewhere] > > Do you think this solution can work with slony ? > > Regards, > > Ludovic Toinel > > -Original message- > From: slony1-general-boun...@lists.slony.info > [mailto:slony1-general-boun...@lists.slony.info] On behalf of Andrew Sullivan > Sent: Tuesday, 10 November 2015 12:26 > To: slony1-general@lists.slony.info > Subject: Re: [Slony1-general] Network connection from slaves to the master > > On Tue, Nov 10, 2015 at 09:51:29AM +, TOINEL, Ludovic wrote: >> The network allows only flows from master to slaves. >> >> Is there any option that I missed to do that ? > > Not really. In principle you could do this with the log shipping mode, but I > don't recall whether doing that on the master was not possible or just a > really bad idea. (You could do this with the built-in standby mechanisms of > Postgres, though.) > > I do wonder why you have it set up this way, however. Why do you control the > flows this way? 
> > A > > -- > Andrew Sullivan > a...@crankycanuck.ca > ___ > Slony1-general mailing list > Slony1-general@lists.slony.info > http://lists.slony.info/mailman/listinfo/slony1-general > _______ > Slony1-general mailing list > Slony1-general@lists.slony.info > http://lists.slony.info/mailman/listinfo/slony1-general > -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
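Jan's suggestion amounts to starting every slon daemon on the master host, so that each daemon's TCP connections (including the one to its own node's database) originate there. A sketch with a hypothetical cluster name and hostnames:

```shell
# Both slon daemons run on the master host; every TCP SYN leaves from here.
slon my_cluster 'dbname=appdb host=localhost user=slony' &     # slon for node 1 (local)
slon my_cluster 'dbname=appdb host=slave-host user=slony' &    # slon for node 2 (remote replica)
```

As Jan notes, where the daemons run has little to no effect on how Slony functions; only the direction of connection establishment changes.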
Re: [Slony1-general] Slony Replication Startup Speed
On 10/29/2015 09:49 PM, Sung Hsin Lei wrote: > Actually, node 2 is correct. I have 2 different clusters. Each cluster > has node 1 and node 2. Each cluster replicates to 1 db. What is the reason for this setup? If the two slaves are subscribed to different sets of tables, then the same can be achieved by creating multiple sets and subscribing the slaves to individual combinations of them. Regards, Jan > > On Thu, Oct 29, 2015 at 8:39 PM, Scott Marlowe <scott.marl...@gmail.com > <mailto:scott.marl...@gmail.com>> wrote: > > On Thu, Oct 29, 2015 at 6:06 PM, Sung Hsin Lei <sungh@gmail.com > <mailto:sungh@gmail.com>> wrote: > > Hello, > > > > I successfully updated the main db and one replicated db with the > following: > > > > cluster name = slony_rep1 > > > > node 1 admin conninfo = 'dbname = MyDB host = localhost user = slony1 > > password = Ejhfg33EdddsufhErR76 port = 6234'; > > node 2 admin conninfo = 'dbname = MyDB host = 86.88.5.4 user = slony1 > > password = Ejhfg33EdddsufhErR76 port = 6234'; > > > > EXECUTE SCRIPT > > ( > > SQL = 'ALTER TABLE operators RENAME COLUMN firstname TO lastname;', > > EVENT NODE = 1 > > ); > > > > > > > > > > However, I have 1 main db to 2 replicated db. The second replicated db > has > > the following cluster name and node information: > > > > cluster name = slony_rep2 > > > > node 1 admin conninfo = 'dbname = MyDB host = localhost user = slony2 > > password = Ejhfg33EdddsufhErR76 port = 6234'; > > node 2 admin conninfo = 'dbname = MyDB host = 86.88.5.17 user = slony2 > > password = Ejhfg33EdddsufhErR76 port = 6234'; > > Shouldn't that be node 3 not node 2? > > > Do I need to run slonik twice with different cluster and node info? That > > does not seem right. After running the first time, the main db and the > first > > replicated db will be consistent but not the second replicated db. 
> Also, if > > I run it a second time, wouldn't the main db already be updated hence > the > > sql statements used for the original update will surely fail? > > You can just run it on all three at once. > > node 1 admin conninfo = 'dbname = MyDB host = localhost user = slony1 > password = Ejhfg33EdddsufhErR76 port = 6234'; > node 2 admin conninfo = 'dbname = MyDB host = 86.88.5.4 user = slony1 > password = Ejhfg33EdddsufhErR76 port = 6234'; > node 3 admin conninfo = 'dbname = MyDB host = 86.88.5.17 user = slony2 > password = Ejhfg33EdddsufhErR76 port = 6234'; > > EXECUTE SCRIPT > ( > SQL = 'ALTER TABLE operators RENAME COLUMN firstname TO lastname;', > EVENT NODE = 1 > ); > > > > > ___ > Slony1-general mailing list > Slony1-general@lists.slony.info > http://lists.slony.info/mailman/listinfo/slony1-general > -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
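Jan's alternative — one cluster whose replicas subscribe to different combinations of sets — could be sketched in slonik like this (the set contents are hypothetical; the hosts are reused from this thread):

```text
cluster name = slony_rep;

node 1 admin conninfo = 'dbname = MyDB host = localhost user = slony1 port = 6234';
node 2 admin conninfo = 'dbname = MyDB host = 86.88.5.4 user = slony1 port = 6234';
node 3 admin conninfo = 'dbname = MyDB host = 86.88.5.17 user = slony1 port = 6234';

create set (id = 1, origin = 1, comment = 'tables both replicas need');
create set (id = 2, origin = 1, comment = 'tables only node 3 needs');

subscribe set (id = 1, provider = 1, receiver = 2, forward = no);
subscribe set (id = 1, provider = 1, receiver = 3, forward = no);
subscribe set (id = 2, provider = 1, receiver = 3, forward = no);
```

With this layout a single EXECUTE SCRIPT with EVENT NODE = 1 reaches every subscriber, instead of one slonik run per cluster.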
Re: [Slony1-general] upgrade slony master / slave using pg_upgrade
On 09/29/2015 05:23 PM, Mark Steben wrote: > Good evening > > Fairly simple question (I think) > Can I use the pg_upgrade contrib module to upgrade from postgres 9.2 to > 9.4 without having to > re-define and re-subscribe all of the slony replicated data? My initial > pg_upgrade tests are yelling at me that some slony libraries are not > found in the new 9.4 binaries: > > Could not load library "$libdir/slony1_funcs" > ERROR: could not access file "$libdir/slony1_funcs": No such file or > directory > > Hopefully it is just an issue of copying the slony binaries from 9.2 to > the same libraries in 9.4 on master and slave, or is a re-creation > necessary? (We are running slony 2.2.3) Not that simple. Slony is quite dependent on version numbers. All nodes in a Slony cluster must run the exact same Slony version. Each node can run against a different PostgreSQL version, as long as the cluster's Slony version supports that PostgreSQL version. However, the shared libs and such on each node must be of that Slony version compiled against that PostgreSQL version. So just copying the 9.2 slony $lib and other content over to the 9.4 lib and shared directories isn't going to do it. You need those files to be built against the 9.4 tree. This is because of code shift inside the PostgreSQL backend. One and the same Slony shared lib function is using different backend code to accomplish the same task. There is quite a bit of #ifdef stuff dependent on PostgreSQL versions in the Slony shared object code. Regards, Jan -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] Cloning an origin?
On 07/29/2015 09:34 AM, David Fetter wrote: On Tue, Jul 28, 2015 at 06:51:44PM -0400, Jan Wieck wrote: On 07/25/2015 12:31 AM, David Fetter wrote: Folks, While in the best of all possible worlds, we'd have planned out a replication strategy before we get tables whose initial sync via SUBSCRIBE SET will never finish, we aren't always given that much ability to plan that soon. One thing I forgot to ask initially. What do you mean by will never finish? Is that just being facetious about it will take a long time, is it that you don't have the disk storage to swallow the back log that will accumulate or is it outdated knowledge from the times when Slony did actually slow down with accumulation of backlog? As far as I could tell, backlog was accumulating faster than the initial sync was happening. If that is the case with 2.2, then there is a good chance that the slave won't be able to keep up anyway. Note that the log switching on the master is severely hindered until the slave is actually caught up. So you will see a sawtooth pattern in the total amount of sl_log. I can explain that in more detail if needed. Assuming a more or less steady rate of update activity on the master, the rate at which SYNC events get replicated is a better indicator. If master-SYNC events appear on the slave faster than they are generated on the master, it is catching up. Regards, Jan -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] Cloning an origin?
On 07/25/2015 12:31 AM, David Fetter wrote: Folks, While in the best of all possible worlds, we'd have planned out a replication strategy before we get tables whose initial sync via SUBSCRIBE SET will never finish, we aren't always given that much ability to plan that soon. One thing I forgot to ask initially. What do you mean by will never finish? Is that just being facetious about it will take a long time, is it that you don't have the disk storage to swallow the back log that will accumulate or is it outdated knowledge from the times when Slony did actually slow down with accumulation of backlog? Regards, Jan -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] Cloning an origin?
On 07/27/2015 07:49 PM, Steve Singer wrote: Or am I still missing something? No, I missed the part that the transactions still in progress at the next SYNC after backup start will be handled by precisely that. So the logic then is to take the last master-SYNC found on the restored slave, increase maxxid in it to the highest txid found in sl_log_*, remove all txids found in sl_log_* from the xip vector and use that as sl_setsync. Add a sl_confirm entry for that master-SYNC reflecting that the slave is caught up to/beyond that point and that the next SYNC to process is the one following the backup start. There is only one detail left then. While we are doing all that we need the postmaster for the restored slave up and running. But we need to make sure that the slon of the master (or any node for that matter) does not connect to the slave and get confused by seeing the wrong data. That may be accomplished by changing the local node ID last so that every slon would throw an error because they are connecting to the wrong node. Jan -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] Cloning an origin?
On 07/26/2015 10:48 PM, Steve Singer wrote: On 07/26/2015 08:01 PM, Jan Wieck wrote: On 07/26/2015 07:35 PM, Jan Wieck wrote: That said, pg_dump isn't that much faster than Slony's copy_set() so we'd need to find a way to reconstruct an sl_setsync entry from something like a binary base backup. That is not trivial. Here is a wild idea that I am going to test, but someone may throw a wrench into it while I'm doing that: Let us assume that the application can be stopped and the database brought into so-called single-user mode for a short period of time. During that maintenance window we do the following: * Shutdown the application * Create the cluster with two nodes and paths via slonik script. * Shutdown the slon for the slave. * Create a SYNC event on the master. * Execute a SUBSCRIBE SET ... OMIT COPY. * Create an LVM snapshot of $PGDATA (+ pg_xlog ...) * Resume the application That should take no longer than a few minutes. Properly scripted it could be in the second range. We now could * Use LVM snapshot(s) to plow over the slave's $PGDATA and so on. * Modify the slave's DB to correct the local node ID and fake an sl_setsync entry reflecting the SYNC created on the master. * Fire up the slave's slon. Comments? I think that would work. Is there a way we could do this without an application outage? 
It seems to me that if you take a backup of the master with pg_basebackup or through any other consistent method of a binary backup you could look at the restored instance and say * Any transactions that show as committed in the last SYNC that shows up in the restored sl_event table are committed * Any transactions that were not included in the above, but have actions in sl_log_1 or sl_log_2 on the restored instance were committed at the time of the backup (otherwise the rows inserted by those transactions couldn't be in sl_log) * Any transactions that were committed by the next SYNC on the origin but don't show up in the above list can be pulled as part of the first real SYNC to be processed, ie you would show those as 'in progress' (or not yet started) for the fake sl_setsync value you create. I think this is insufficient. Transactions that were in progress when the backup started are not guaranteed to be committed by the next SYNC. In fact they could still be in progress many SYNCs later. What probably could work to produce that snapshot is to wait for the sl_log that was active (inserted into) at backup time to be closed, IOW for a log switch to complete. All xids found in that log on the master EXCEPT all xids in that log on the restored slave are to be considered in progress at backup time. This should build an artificial snapshot that tells us where to start. Any SYNC after the log switch completed would make for a suitable next SYNC to leap to. So we would record this artificial snapshot as a SYNC with ev_seqno - 1 of the next SYNC. Regards, Jan -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] Cloning an origin?
On 07/26/2015 07:35 PM, Jan Wieck wrote: That said, pg_dump isn't that much faster than Slony's copy_set() so we'd need to find a way to reconstruct an sl_setsync entry from something like a binary base backup. That is not trivial. Here is a wild idea that I am going to test, but someone may throw a wrench into it while I'm doing that: Let us assume that the application can be stopped and the database brought into so-called single-user mode for a short period of time. During that maintenance window we do the following: * Shutdown the application * Create the cluster with two nodes and paths via slonik script. * Shutdown the slon for the slave. * Create a SYNC event on the master. * Execute a SUBSCRIBE SET ... OMIT COPY. * Create an LVM snapshot of $PGDATA (+ pg_xlog ...) * Resume the application That should take no longer than a few minutes. Properly scripted it could be in the second range. We now could * Use LVM snapshot(s) to plow over the slave's $PGDATA and so on. * Modify the slave's DB to correct the local node ID and fake an sl_setsync entry reflecting the SYNC created on the master. * Fire up the slave's slon. Comments? -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
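The subscription step in the procedure above relies on slonik's OMIT COPY option; a hedged fragment (cluster name and conninfos are hypothetical):

```text
cluster name = my_cluster;
node 1 admin conninfo = 'dbname = appdb host = master-host user = slony';
node 2 admin conninfo = 'dbname = appdb host = slave-host user = slony';

# Inside the maintenance window, after the SYNC has been created:
subscribe set (id = 1, provider = 1, receiver = 2, forward = no, omit copy = true);
```

OMIT COPY tells Slony that the receiver already holds an exact copy of the set, which is what the LVM snapshot taken during the maintenance window guarantees here.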
Re: [Slony1-general] too much work !
On 01/13/2015 09:45 AM, Sebastien Marchand wrote: Yeah, 17 cluster names if you prefer. I start one slon per cluster. Samples: slon -s 200 -t 6 -g 50 -o 6 -c 0 -d 1 -f /home/scripts/slon.conf repli_nat dbname=DB2 host=127.0.0.1 port=5432 user=slony password=123 slon -s 200 -t 6 -g 50 -o 6 -c 0 -d 1 -f /home/scripts/slon.conf repli_lc3 dbname=DB2 host=127.0.0.1 port=5432 user=slony password=123 slon -s 200 -t 6 -g 50 -o 6 -c 0 -d 1 -f /home/scripts/slon.conf repli_lc2 dbname=DB2 host=127.0.0.1 port=5432 user=slony password=123 This looks as if you created 17 Slony clusters instead of one cluster with 18 nodes. Jan -Original message- From: Stéphane Schildknecht [mailto:stephane.schildkne...@postgres.fr] Sent: Tuesday, 13 January 2015 15:22 To: Sebastien Marchand; 'Glyn Astill'; 'Vick Khera' Cc: 'slony' Subject: Re: [Slony1-general] too much work ! On 12/01/2015 11:38, Sebastien Marchand wrote: Hi, 18 nodes, WAN network. 2 replications ( one with 1 set and other with 2 sets ( 2 directions ) ) First replication from server 1 to n servers ( same schema, 37 tables ) 1 slon on server 1 and 1 slon by remote server Second replication : set 1 : server A - server 1 ( 90 replicated tables ) set 2 : server A - server 1 ( 1 replicated tables ) do the same with 17 other servers ( one schema by server ) 17 slons on server 1 and 1 slon by remote server database 5 gB. I don't know if that will help... Seems to me some information is missing. Distinction on databases, for instance. I guess the 2 replications are not correlated. They just share the same servers. In fact, the second replication is not 1 replication, it is 17 different replications. But, do they all end in the same database on server A ? 
-- Stéphane Schildknecht Regional PostgreSQL contact for French-speaking Europe Loxodata - Consulting, expertise and training ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
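The difference Jan points out, sketched in slonik — a single cluster name shared by all nodes, rather than one cluster name per server (conninfos hypothetical):

```text
cluster name = repli;

node 1 admin conninfo = 'dbname=DB1 host=server1 port=5432 user=slony';
node 2 admin conninfo = 'dbname=DB2 host=server2 port=5432 user=slony';
node 3 admin conninfo = 'dbname=DB3 host=server3 port=5432 user=slony';
# ...one node entry per server, up to node 18, all inside this one cluster
```

One cluster with 18 nodes needs one slon per node instead of 17 separate single-purpose clusters, and all sets can be administered from a single slonik script.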
Re: [Slony1-general] PGRES_FATAL_ERROR ERROR: could not access file $libdir/slony1_funcs.2.2.3: No such file or directory
On Dec 16, 2014 1:35 AM, Tory M Blue tmb...@gmail.com wrote: Trying a production upgrade, after upgrading 3 different environments, and I'm getting this error. PGRES_FATAL_ERROR ERROR: could not access file $libdir/slony1_funcs.2.2.3: No such file or directory The files are there, any way to force this? I even tried to create an export libdir with no success. Are the files there on all the involved nodes? Jan -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] Duplicate key while merging temporary to main set
On 12/16/2014 05:32 AM, Carlos Henrique Reimer wrote: 8357_isarq.wm_bco_pla; PGRES_FATAL_ERROR ERROR: could not create unique index pk_wm_bco_pla DETAIL: Key (bco_cod, pla_cod)=(399, 5056110) is duplicated. CONTEXT: SQL statement reindex table 8357_isarq.wm_bco_pla PL/pgSQL function finishtableaftercopy(integer) line 26 at EXECUTE statement 2014-12-16 07:08:25 BRST WARN remoteWorkerThread_1: data copy for set 999 failed 108 times - sleep 60 seconds The error indicates 8357_isarq.wm_bco_pla has duplicated rows for primary key (bco_cod, pla_cod)=(399, 5056110) but when I query the table using this pk I got only one row: select * from 8357_isarq.wm_bco_pla where bco_cod=399 and pla_cod = '5056110'; bco_cod | pla_cod |pla_des| ativo | conta_pai | pla_itlistserv | pla_ctacosif -+-+---+---+---++-- 399 | 5056110 | RENDAS TRANSACOES VISA ELETRON - HBBR | S | 5056004 | 1501 | 71799003 (1 row) This means that a sequential scan (done by the COPY) produces multiple rows with that primary key, while only one of them has an index entry. As Andrew pointed out, you have index corruption on the source database. I bet a REINDEX of that table on the source will fail too. Slony is trying to merge the temporary set to the main set every 60 seconds and getting this error. I do not want to lose three days of replication processing. No, Slony is still processing the SUBSCRIBE SET for the small temporary set (and repeatedly failing on that). The MERGE SET is queued behind and not touched yet. Is there anything I can do to fix this error or at least can I remove this table from the temporary set and work on this issue afterwards? Reading your follow up, you only got a single duplicate on this one. However, there is a high chance that your DELETE actually deleted the wrong row version. Since you are running a really outdated version of PostgreSQL I would not be surprised if some bug made an outdated version of that row visible again. 
The index entry for that PK would be pointing at the last row version, which you now deleted. The replica would now contain the old row version again. Check if you have a row with that PK in the replica and what that actually looks like. Regards, Jan -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
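One way to surface the heap-visible duplicates that the corrupt index hides is to query without using the index at all — a sketch using the table from this thread (GROUP BY over the whole table forces a full scan, and disabling index scans for the session guards against the corrupt index being consulted):

```sql
-- Read the heap the way Slony's COPY does, bypassing the index
SET enable_indexscan = off;
SET enable_bitmapscan = off;

SELECT bco_cod, pla_cod, count(*)
  FROM "8357_isarq".wm_bco_pla
 GROUP BY bco_cod, pla_cod
HAVING count(*) > 1;
```

Any key this reports more than once is a duplicate that only a sequential scan sees, matching the COPY failure described above.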
Re: [Slony1-general] Slony-I 2.2.3 fails on execute script
On 12/05/2014 03:33 PM, Waldo Nell wrote: I have a working Slony-I setup with one master and one slave. I tried to run a simple alter table add column statement via execute script. This always worked for me, however today it failed. Please stop using terms like master and slave in a Slony context. There is no master or slave. There are origins of sets and subscribers of sets. The latter are sometimes referred to as replica. Thanks, Jan -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] RDS PostgreSQL?
On 12/02/2014 05:04 PM, David Fetter wrote: On Tue, Dec 02, 2014 at 02:07:08PM -0500, Jim Mlodgenski wrote: On Tue, Dec 2, 2014 at 1:18 PM, David Fetter da...@fetter.org wrote: Folks, Is there any way to make an Amazon RDS PostgreSQL database a Slony node? I don't believe they've deployed the Slony libraries on RDS yet, but you can use Londiste or Bucardo. They just recently announced support for both of them. It's great to know there are logical replication systems available. It appears that the people working for Amazon in India don't have what it takes to master Slony. They do well on MySQL, but that's about it. Jan -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] Lag time increasing but there are no events
On 10/16/2014 11:48 AM, Glyn Astill wrote: From: Glyn Astill glynast...@yahoo.co.uk To: Dave Cramer davecra...@gmail.com Cc: slony slony1-general@lists.slony.info Sent: Thursday, 16 October 2014, 16:26 Subject: Re: [Slony1-general] Lag time increasing but there are no events From: Dave Cramer davecra...@gmail.com To: Glyn Astill glynast...@yahoo.co.uk Cc: slony slony1-general@lists.slony.info Sent: Thursday, 16 October 2014, 15:41 Subject: Re: [Slony1-general] Lag time increasing but there are no events Actually I think it is because a pg_dump of the db is going on. Can you dump a slave ? Do you have to exclude the slony clusters (which would make sense)? Yeah from past experience the locks taken by pg_dump cause replication to lag. I normally use the -N switch in pg_dump to exclude the schema. For pg_restore I usually create a TOC with the -l option and pipe it to grep to filter out the references to the slony schema that get dumped for the triggers, and then -L option to use it with my restore. Also I assumed you were using 2.0+ there, with 1.2 there's some slight fiddling done in pg_catalog to disable triggers, from memory pg_trigger.tgrelid is pointed to an index rather than the table or something like that. I used to dump out with the slony schema, and run uninstallnode() and drop schema cascade against the restored database. Dumping a replica won't work before 2.0 because the dump won't contain all the information. With recent PG versions it will actually error out. With 2.0+ it does work, so with 2.0+ it is possible to take the backup from a replica. Dave: A pg_dump can very well stop the creation of new SYNC events since slon needs a brief exclusive lock on the sl_event table. If that is the case, the st_last_event in sl_status will not be advancing, as if no slon were running, which would explain the increase of st_lag_time with zero st_lag_num_events. 
As suggested, dumping the database with -N to exclude the Slony-I schema should do the trick; dumping that schema has very little value anyway. If you ever have to restore from that dump, you'd either have to restore all replicas from it as well and rebuild the cluster with OMIT COPY, or rebuild the cluster with Slony copying the data. You would need to do the latter if replicas aren't full copies of the entire master. Regards, Jan -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
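The dump-and-filter workflow Glyn describes can be sketched as follows. This is only an illustration: the cluster schema name `_mycluster`, the database name, and the file paths are invented, and the TOC listing below is fabricated so the grep step can be shown on its own. The real steps would be `pg_dump -N`, `pg_restore -l`, grep, then `pg_restore -L`.

```shell
# Option 1: exclude the Slony-I schema at dump time (schema name assumed):
#   pg_dump -N _mycluster -Fc mydb > mydb.dump
#
# Option 2: for an existing custom-format dump, list the TOC with
# pg_restore -l, drop the Slony entries, and restore with -L.
# Fabricated TOC listing standing in for `pg_restore -l mydb.dump`:
cat > /tmp/db.toc <<'EOF'
5; 2615 16385 SCHEMA - _mycluster slony
7; 1255 16402 FUNCTION _mycluster logtrigger() slony
21; 1259 16500 TABLE public accounts postgres
24; 1259 16510 TABLE public orders postgres
EOF

# Filter out every entry referencing the Slony schema:
grep -v '_mycluster' /tmp/db.toc > /tmp/db_filtered.toc

# Then restore only the filtered entries:
#   pg_restore -L /tmp/db_filtered.toc -d mydb mydb.dump
cat /tmp/db_filtered.toc
```

The filtered TOC keeps the application tables and drops the schema and trigger-function entries that would otherwise recreate a half-broken Slony installation on the restored database.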
Re: [Slony1-general] Lag time increasing but there are no events
On 10/17/2014 09:00 AM, Dave Cramer wrote: So what's the best way to deal with very large databases? pg_dump just doesn't cut it. By the time we are finished dumping, the data is stale anyway! I'm thinking of using PITR instead.

A base backup rolled over once a week, or some combination of pg_start_backup() and rsync, will work. You can do that against the master or a replica, and run UNINSTALL NODE in case you ever have to use it. Jan

Dave Cramer

On 17 October 2014 08:51, Jan Wieck j...@wi3ck.info wrote: [...]

-- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] Changing master node's IP port
On 10/14/2014 11:57 AM, Glyn Astill wrote: From: Granthana Biswas granthana.bis...@gmail.com To: Glyn Astill glynast...@yahoo.co.uk Cc: slony1-general@lists.slony.info Sent: Tuesday, 14 October 2014, 16:48 Subject: [Slony1-general] Changing master node's IP port

Hi Glyn, Yes I had stopped all the slons for every node but I did not DELETE FROM _Cluster1.sl_nodelock WHERE nl_nodeid = 1 AND nl_conncnt = 0;

Well you shouldn't have had to run the delete, that's just to get you going. I assume you're up and running now?

The cleanup procedure for nodelock checks if the backend process holding the lock (if any) is still alive. If I had to guess, my guess would be that someone/something changed the IP address of the server without stopping the slon processes first, and that IP address change leads to a connection loss without the TCP connections getting RST or FIN packets. In that situation it is likely that the database backend from the old slon connection is waiting on a blocking read and will only notice that the connection is gone after a full TCP keepalive timeout, which defaults to several hours. Terminating the slony related database connections via pg_terminate_backend() will make the nodelock cleanup succeed.

In any case it is good practice to use much more aggressive TCP keepalive settings on both sides, PostgreSQL and Slony. At my former work place we used to set them to 60 seconds idle, then 9 keepalive packets at 7-second intervals. That will let the connections time out in about 2 minutes. Regards, Jan -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
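The aggressive keepalive settings Jan describes (60 seconds idle, then 9 probes at 7-second intervals, so roughly 60 + 9 × 7 ≈ 2 minutes to declare a peer dead) map onto real PostgreSQL and libpq parameters. A sketch, with the values taken from Jan's example:

```
# postgresql.conf -- server side of every node
tcp_keepalives_idle = 60        # seconds of idle before first probe
tcp_keepalives_interval = 7     # seconds between probes
tcp_keepalives_count = 9        # probes before the connection is dropped
```

On the slon side, the libpq equivalents can be appended to the conninfo string, e.g. a hypothetical `conn_info='host=db1 dbname=mydb keepalives_idle=60 keepalives_interval=7 keepalives_count=9'`, so a vanished peer is detected from both directions.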
Re: [Slony1-general] Error with Slony replication from another slony cluster slave
On 10/09/2014 08:18 AM, Granthana Biswas wrote: Hi All, I am trying to replicate from another Slony cluster's slave node. Cluster1 - replicating from DB1 to DB2. Cluster2 - replicating from DB2 to DB3. The initial sync up went fine without any errors. There are no errors in the logs of either cluster. Also no st_lag_num_events in DB3 or DB2 for Cluster2. But the data added in DB2 since I started slony Cluster2 is not reflected in DB3. Does slony allow replication from another cluster's slony slave? Or did I miss something?

This does not work because the logTrigger will only record information when firing in session_replication_role 'origin'. When Slony is applying changes from DB1 to DB2, the session_replication_role is 'replica', so the logTrigger action is suppressed inside the trigger function. You will have to make DB3 a member of Cluster1 and use DB2 as the data provider to achieve this. This has the added benefit that you can easily change the data provider of DB3 to DB1 in case you need to perform any maintenance on DB2. Regards, Jan -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
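Jan's explanation can be illustrated with a toy model (plain Python, not Slony source): normally-enabled PostgreSQL triggers do not fire their action when `session_replication_role = 'replica'`, which is exactly the mode Slony uses while applying replicated changes, so a second cluster's log trigger never sees them.

```python
# Toy model of Slony's logTrigger suppression (illustrative only).
# Triggers enabled the default way fire for the 'origin' (and 'local')
# roles but are suppressed for 'replica' sessions.

def log_trigger(session_replication_role, sl_log):
    """Record a row change only when fired in an origin session."""
    if session_replication_role != 'origin':
        return False  # suppressed: the change arrived via replication
    sl_log.append('row change recorded')
    return True

sl_log_db2 = []                       # Cluster2's capture log on DB2
log_trigger('origin', sl_log_db2)     # local write on DB2: captured
log_trigger('replica', sl_log_db2)    # row applied by Slony from DB1: ignored
print(len(sl_log_db2))                # prints 1
```

This is why DB3 only sees rows written directly on DB2, never the rows Cluster1 replicates into it, and why DB3 must instead join Cluster1 with DB2 as its provider.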
Re: [Slony1-general] Error with Slony replication from another slony cluster slave
On 10/09/2014 05:26 PM, Glyn Astill wrote:

From: Granthana Biswas granthana.bis...@gmail.com To: Glyn Astill glynast...@yahoo.co.uk Sent: Thursday, 9 October 2014, 16:34 Subject: Re: [Slony1-general] Error with Slony replication from another slony cluster slave

Hi Glyn, In my case I have two clusters: Cluster1 - replicating from DB1 to DB2, Cluster2 - replicating from DB1 to DB3. Can I stop Cluster2 and add DB3 to Cluster1 with DB2 as its master? Or do I have to delete the data first in DB3?

You'll want to run DROP NODE against each node in Cluster2 (or if on 2.0+ you can get away with just DROP SCHEMA _Cluster2 CASCADE) and stop the slons for Cluster2.

Cascading, forwarding and all that is one of the core concepts of Slony. If you create one single setup looking like this:

DB1 -> DB2 -> DB3

then you have a lot more flexibility and functionality than is obvious at first. Aside from being able to fail over to DB2, Slony will let you MOVE the master role from DB1 to DB2. That operation will not just make DB2 the master; at the same time DB1 becomes a replica that doesn't need an initial sync, so your setup now would look like this:

DB1 <- DB2 -> DB3

This is useful if you need to perform some maintenance on DB1. At this point DB2 is your master and you would just stop the slon process for DB1, do whatever you need to do, and start the slon process again. Once DB1 has caught up, you just transfer the master role back and everything is as it was.

What also works is that if you need to perform maintenance on DB2, it doesn't mean that DB3 has to fall behind too. You can easily change the data provider for DB3 to be DB1, so your configuration changes to

DB1 -> DB2
 |
 V
DB3

At this point you stop the slon process for DB2, perform what you need to do on DB2, start the slon process and, after DB2 has caught up, make it the data provider for DB3 again.
Regards, Jan -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
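The master-role move Jan describes is done with LOCK SET / MOVE SET in slonik. The following is only a sketch: the cluster name, conninfo strings, set ID, and node numbers are placeholders, and both nodes' slon daemons must be running when it executes.

```
cluster name = 'mycluster';
node 1 admin conninfo = 'host=db1 dbname=mydb user=slony';
node 2 admin conninfo = 'host=db2 dbname=mydb user=slony';

# Make DB2 (node 2) the new origin of set 1.  DB1 automatically
# becomes a replica that needs no initial sync.
lock set (id = 1, origin = 1);
move set (id = 1, old origin = 1, new origin = 2);
wait for event (origin = 1, confirmed = 2, wait on = 1);
```

Unlike FAILOVER, MOVE SET is lossless: it hands the origin role over inside the normal event stream, which is why the old master can rejoin as a replica without being recopied.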
Re: [Slony1-general] Error with Slony replication from another slony cluster slave
On 10/09/2014 06:34 PM, Dave Cramer wrote: Jan, But as you said they all have to be in the same cluster, correct?

Yes. Jan

Dave Cramer

On 9 October 2014 18:31, Jan Wieck j...@wi3ck.info wrote: [...]

-- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] Slony ignoring cleanup_interval?
On 09/24/2014 01:00 PM, Steve Singer wrote: On 09/24/2014 12:29 PM, Christopher Browne wrote: On Fri, Sep 19, 2014 at 1:50 PM, Kristopher kristopherwil...@gmail.com wrote: I have the following setup in my conf file: cleanup_interval=5 seconds ... However, it only actually runs cleanup about every 10 minutes (the default):

We have, in fact, three parameters controlling cleanup:

a) cleanup_interval, expressed as a Postgres interval, which, according to the docs, indicates: gettext_noop("A PostgreSQL value compatible with ::interval which indicates what aging interval should be used for deleting old events, and hence for purging sl_log_* tables.") (see src/slon/confoptions.c for that; I expect that there's a slon option that will make it print out the documentation strings).

b) SLON_CLEANUP_SLEEP, in src/slon/slon.h, which is hardcoded to 600, indicating that every 600 seconds the cleanup thread is called.

c) SLON_VACUUM_FREQUENCY, also in src/slon/slon.h, hardcoded to 3, indicating how often the cleanup thread should VACUUM tables.

We haven't exposed SLON_CLEANUP_SLEEP as a configuration option, and, in effect, that's a value you'd want to shorten a lot during this process. It wouldn't be a great deal of trouble to expose SLON_CLEANUP_SLEEP, and it's probably somewhat handy to do so, particularly for situations such as what you describe, where we want to avidly empty out sl_log_*. I'll see about coming up with a patch, with a view to applying this to the various major releases.

My preference would be that we have one parameter in the config for controlling how often the cleanup thread does its stuff, i.e. make SLON_CLEANUP_SLEEP be controlled by the existing cleanup_interval field in the config. I don't understand why/when someone would want these two values to be different.
I think we introduced the cleanup_interval (very easily misunderstood as something different from what it actually does) as a safeguard against race conditions, where we feared removing replication log too soon. Unless we still fear such a race condition, we should get rid of that parameter entirely, rely on the rule that data confirmed by every node is obsolete, and just purge it. The thing that really matters in this context is the cleanup sleep time, which determines how often the cleanup is actually done. Calling it often enough compared to checkpoints can actually lead to a situation where sl_log heap and index data never gets written to disk. And that is a goal well worth aiming for. Jan -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
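The interaction of the parameters above can be sketched with a toy model (plain Python, not Slony source): SLON_CLEANUP_SLEEP fixes how often the cleanup thread wakes up at all, while cleanup_interval only governs how old an event must be before cleanupEvent() purges it. Shrinking cleanup_interval therefore does not make cleanup run more often.

```python
# Toy model of the slon cleanup thread cadence (illustrative only).
SLON_CLEANUP_SLEEP = 600     # seconds between cleanup thread runs (hardcoded)
SLON_VACUUM_FREQUENCY = 3    # VACUUM on every 3rd cleanup run (hardcoded)

def cleanup_activity(elapsed_seconds):
    """Return (cleanup runs, vacuum runs) for a period of slon uptime."""
    runs = elapsed_seconds // SLON_CLEANUP_SLEEP
    vacuums = runs // SLON_VACUUM_FREQUENCY
    return runs, vacuums

# One hour of uptime yields 6 cleanup runs and 2 vacuums, no matter how
# small cleanup_interval is set in the slon config file.
print(cleanup_activity(3600))  # prints (6, 2)
```

This matches Kristopher's observation: with cleanup_interval=5 seconds the purge threshold is tiny, but the thread still only fires roughly every 10 minutes.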
Re: [Slony1-general] Slony ignoring cleanup_interval?
On 09/19/2014 04:55 PM, Steve Singer wrote: On 09/19/2014 01:50 PM, Kristopher wrote: I have the following setup in my conf file:

cleanup_interval=5 seconds
sync_interval=1000
cluster_name='dbcluster01'
conn_info='...'

And, via the log, it looks like slony is picking up on the 5 second cleanup interval:

2014-09-15 09:31:40 EDT CONFIG main: String option pid_file = [NULL]
2014-09-15 09:31:40 EDT CONFIG main: String option archive_dir = [NULL]
2014-09-15 09:31:40 EDT CONFIG main: String option sql_on_connection = [NULL]
2014-09-15 09:31:40 EDT CONFIG main: String option lag_interval = [NULL]
2014-09-15 09:31:40 EDT CONFIG main: String option command_on_logarchive = [NULL]
2014-09-15 09:31:40 EDT CONFIG main: String option syslog_facility = LOCAL0
2014-09-15 09:31:40 EDT CONFIG main: String option syslog_ident = slon
2014-09-15 09:31:40 EDT CONFIG main: String option cleanup_interval = 5 seconds

However, it only actually runs cleanup about every 10 minutes (the default):

We seem to be using

#define SLON_CLEANUP_SLEEP 600 /* sleep 10 minutes between */

as the basis for a random time to sleep between calling the cleanup process. We then pass the cleanup_interval from the config file to the stored function. Maybe Jan or Chris can comment if they remember how this was intended to work with a cleanup_interval that is specified in the config file.

There are two possible variables here. First, the cleanup_interval that is passed to the cleanup stored procedure, controlling how old an event must be at minimum to be purged from sl_event. Second, the frequency at which the cleanup stored procedure is invoked by the slon daemon. All or none of this could currently be screwed up because of being miscommunicated between Chris and me. Jan

For purposes of your upgrade you want to make sure that sl_log_1 and sl_log_2 have no rows in them, then you can upgrade.
NOTICE: Slony-I: log switch to sl_log_1 complete - truncate sl_log_2
CONTEXT: PL/pgSQL function _dbcluster01.cleanupevent(interval) line 94 at assignment
2014-09-18 06:28:40 EDT INFO cleanupThread: 0.368 seconds for cleanupEvent()
NOTICE: Slony-I: Logswitch to sl_log_2 initiated
CONTEXT: SQL statement SELECT _dbcluster01.logswitch_start()
PL/pgSQL function _dbcluster01.cleanupevent(interval) line 96 at PERFORM
2014-09-18 06:39:27 EDT INFO cleanupThread: 0.018 seconds for cleanupEvent()
NOTICE: Slony-I: log switch to sl_log_2 complete - truncate sl_log_1
CONTEXT: PL/pgSQL function _dbcluster01.cleanupevent(interval) line 94 at assignment
2014-09-18 06:51:10 EDT INFO cleanupThread: 0.106 seconds for cleanupEvent()
2014-09-18 06:51:10 EDT INFO cleanupThread: 0.006 seconds for vacuuming
NOTICE: Slony-I: Logswitch to sl_log_1 initiated
CONTEXT: SQL statement SELECT _dbcluster01.logswitch_start()
PL/pgSQL function _dbcluster01.cleanupevent(interval) line 96 at PERFORM
2014-09-18 07:01:52 EDT INFO cleanupThread: 0.016 seconds for cleanupEvent()
NOTICE: Slony-I: log switch to sl_log_1 complete - truncate sl_log_2
CONTEXT: PL/pgSQL function _dbcluster01.cleanupevent(interval) line 94 at assignment
2014-09-18 07:13:33 EDT INFO cleanupThread: 0.110 seconds for cleanupEvent()
NOTICE: Slony-I: Logswitch to sl_log_2 initiated
CONTEXT: SQL statement SELECT _dbcluster01.logswitch_start()
PL/pgSQL function _dbcluster01.cleanupevent(interval) line 96 at PERFORM
2014-09-18 07:24:46 EDT INFO cleanupThread: 0.014 seconds for cleanupEvent()
2014-09-18 07:24:46 EDT INFO cleanupThread: 0.006 seconds for vacuuming
NOTICE: Slony-I: log switch to sl_log_2 complete - truncate sl_log_1
CONTEXT: PL/pgSQL function _dbcluster01.cleanupevent(interval) line 94 at assignment
2014-09-18 07:36:17 EDT INFO cleanupThread: 0.114 seconds for cleanupEvent()
NOTICE: Slony-I: Logswitch to sl_log_1 initiated
CONTEXT: SQL statement SELECT _dbcluster01.logswitch_start()
PL/pgSQL function _dbcluster01.cleanupevent(interval) line 96 at PERFORM
2014-09-18 07:47:31 EDT INFO cleanupThread: 0.018 seconds for cleanupEvent()

In order to perform the upgrade to the latest version, it says I need to set this interval low to clear up any data in the log tables, but I'm stuck trying to get that to happen. Any ideas? Kristopher

-- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] Slony-I Upgrade
On 08/25/2014 02:58 AM, Venkata Balaji N wrote: Upgrading from 2.0.3 to 2.2.3 works fine.

Yes, there should be no difference between upgrading from 2.0 or 2.1 to 2.2. The thing to keep in mind is that with 2.2 the structure of the sl_log tables changes and that it is absolutely necessary that the application is stopped and all subscribers are fully caught up before starting the upgrade process. Outstanding backlog is not converted and those updates would be lost otherwise. Regards, Jan

Thanks, VBN

On Mon, Aug 25, 2014 at 11:56 AM, Soni M diptat...@gmail.com wrote: I never tried the way you proposed. I think you'd better upgrade it in smaller steps (i.e. 2.0 to 2.1, then 2.1 to 2.2) as described here: http://slony.info/documentation/2.2/slonyupgrade.html

On Sat, Aug 23, 2014 at 4:30 PM, Venkata Balaji N nag1...@gmail.com wrote: Hello All, We are upgrading our existing Slony-I installations across our production servers. We are upgrading from version 2.0.3 to version 2.2.3. We will have our applications down for another scheduled activity. Below is our upgrade procedure -

* Install Slony-I 2.2.3 on all the nodes
* Issue SLONIK_UPDATE_NODES -c config file from any of the nodes (we do it from our master node)
* Start the slon processes for all the nodes

Will this be OK to upgrade from 2.0.3 to 2.2.3? Please let us know if we have to do anything else. Regards, VBN

-- Regards, Soni Maula Harriz -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] slony 2.2.3 experiences
On 07/17/14 17:45, Ger Timmens wrote: Hi all, After upgrading our environment from slony 2.1.4 to slony 2.2.3 we see 'a lot' more slony connections on the master, forwarders and subscribers. E.g. on the master we see approx 120 more 'idle' slony connections. The connections start around 50, growing to 120 over time (restarting slons will bring it down and connections will increase again).

The number of connections should not go up like that. What does your cluster configuration look like? 50 connections to start with sounds already high. Are there any error messages in the slon logs? What I can imagine is a connection leak in some unusual error situation: slon reconnects but never closes the old connection.

This is probably related to the changed failover logic in slony 2.2.x over 2.1.x, where each forwarding node should know about every other node?

Not sure about that yet. Regards, Jan -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] sl_log_x very large due to lagging subscription
I guess you mean either UNSUBSCRIBE SET or DROP NODE. What should happen in the UNSUBSCRIBE case is that the node quickly confirms all the outstanding events since they no longer require replicating any data, at which point all providers can perform/finish log switching again. In the case of DROP NODE the behavior is similar. Make sure that ALL remaining nodes in the cluster have forgotten everything about the dropped node before attempting to rebuild it. Or rebuild it with a different node ID. If this is your only node, then you could as well uninstall the whole cluster and start from scratch. -- Jan Wieck Senior Software Engineer http://slony.info On Jun 27, 2014 4:04 PM, Dave Cramer davecra...@gmail.com wrote: If I drop the subscription through drop set will the log file be automagically cleaned up ? If not how can I fix it ? Dave Cramer ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] sl_log_x very large due to lagging subscription
On Jun 27, 2014 7:44 PM, Dave Cramer davecra...@gmail.com wrote: On 27 June 2014 16:55, Jan Wieck j...@wi3ck.info wrote: I guess you mean either UNSUBSCRIBE SET or DROP NODE. What should happen in the UNSUBSCRIBE case is that the node quickly confirms all the outstanding events since they no longer require replicating any data, at which point all providers can perform/finish log switching again. In the case of DROP NODE the behavior is similar. Make sure that ALL remaining nodes in the cluster have forgotten everything about the dropped node before attempting to rebuild it. Or rebuild it with a different node ID. If this is your only node, then you could as well uninstall the whole cluster and start from scratch. Yes I did mean UNSUBSCRIBE SET. And I forgot to mention this is slony 1.2 I forgive you. Also as is usually the case it is not as simple as I have first posed. For reasons that I won't elaborate the lagging subscription is a duplicate of one that is not lagging. Since you (apparently) already gave up on that lagging node, dropping it should work just as well, or even better. That said ... One of the sl_log_x files is 16G in size. Suggestions are welcome ... would it be possible to take a brief application outage of 10 minutes or so? If yes, then you could stop the application and wait for the healthy node to catch up 100% (slonik script doing a SYNC and WAIT FOR EVENT). Now you stop all slon processes and remove Slony via UNINSTALL NODE commands. Install Slony 2.2 and rebuild the cluster. The formerly healthy node is getting subscribed with OMIT COPY, so it doesn't copy any data. Since you now have a replica you can start the application and let the third node rebuild with the regular copy. I know that this doesn't sound too cozy at first, but we have done this UNINSTALL and rebuild with OMIT COPY several times to our production environment (for reasons I now won't elaborate on). Bottom line is that it works reliably. 
We can work through this in more detail on the phone and through other media if you like. Jan Dave Cramer www.credativ.ca On Jun 27, 2014 4:04 PM, Dave Cramer davecra...@gmail.com wrote: If I drop the subscription through drop set will the log file be automagically cleaned up ? If not how can I fix it ? Dave Cramer ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
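The UNINSTALL-and-rebuild procedure Jan outlines hinges on SUBSCRIBE SET's OMIT COPY option, which resubscribes a node without the initial data copy. A slonik sketch, with cluster name, node IDs, set ID and conninfo strings invented for illustration (the application must be stopped and the receiver truly identical to the provider before this runs):

```
cluster name = 'mycluster';
node 1 admin conninfo = 'host=db1 dbname=mydb user=slony';
node 2 admin conninfo = 'host=db2 dbname=mydb user=slony';

# Resubscribe the already-in-sync former replica without copying data.
subscribe set (id = 1, provider = 1, receiver = 2, forward = yes, omit copy = true);
```

The third node would then be subscribed the normal way (without OMIT COPY) and rebuilt with a regular copy while the application is back up.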
Re: [Slony1-general] upgrading from 1.x to 2.x
On 06/18/14 15:59, Dave Cramer wrote: What does this ominous message mean: "When Slony-I uninstalls itself, catalog corruptions are fixed back up."

Slony 1.x was designed to run on PostgreSQL versions prior to 8.3. Since the session_replication_role and trigger configuration features were new in 8.3, Slony 1.x disabled triggers and rewrite rules by hacking pg_class on replicas. When you use UNINSTALL NODE, those hacks are reverted and you end up with a clean system catalog again on that replica. Jan -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] Replication fails on Corrupted PK index
On 04/28/14 08:01, Steve Singer wrote: On 04/28/2014 05:19 AM, Raghav wrote:

postgres=# insert into dtest values (1,'D');
INSERT 0 1
postgres=# insert into dtest values (1,'D');

But your indexes will also prevent something like:

-- Stop your slons
insert into dtest values (1,'D');
delete from dtest;
insert into dtest values (1,'D');

which should be allowed. That is problem number 1). 2) It will still allow something like

insert into dtest values (1, 'D', 'foo');
insert into dtest values (1, 'D', 'bar');

since the index isn't restricted to the primary key columns of the table. And since it is shared by all user tables, that's impossible. 3) It will also not prevent something like this:

insert into dtest values (1, 'D');
-- wait for 30 minutes in which slony cleanup switches logs and
-- the index corruption happens
insert into dtest values (1, 'D');

In summary, attempting to guard against possible PostgreSQL bugs that corrupt data is futile. Even if we could figure this one out, the next problem would be to guard against a corruption of the sl_log_N tables. Regards, Jan -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] Slony setup error
On 04/17/14 05:20, Glyn Astill wrote: Well have you checked to see if the _replication schema and getlocalnodeid function are present on your new subscriber? From the error message I can only assume you've not run STORE NODE against the node you're trying to subscribe.

That is what is most likely.

p.s. I'll use this as a test to see if I can still post to the list from my yahoo.com address too; I've not much hope it will :-(

Did @yahoo.co.uk receive the same ill-advised changes? Not sure, because so far I don't see any bounce messages. Jan -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] Slony setup error
On 04/17/14 12:50, Samir Parikh wrote: Thanks for the quick response. Yes, you are right, the function getLocalNodeId does not exist on the subscriber node but it does exist on the primary node. On further checking I could see 127 functions under the replication schema on the primary node, but on the subscriber node I could see only 112. Not sure why there are missing functions, which include getLocalNodeId. This is the very first time I am trying to set up slony replication. I am following the documentation on slony.info.

That sounds very odd. I would suggest doing an UNINSTALL NODE, DROP NODE (both for the non-functioning one), waiting until all traces of the dropped node are gone from the system (or using a different node ID to be sure) and then starting over from STORE NODE, STORE PATH all the way. And make sure you use 2.2 or at least 2.1. 2.0 is marked end of life and earlier versions have not received any updates/fixes for quite a while now. Regards, Jan

Thanks again. Samir

-Original Message- From: Glyn Astill [mailto:glynast...@yahoo.co.uk] Sent: Thursday, April 17, 2014 2:20 AM To: Samir Parikh; slony1-general@lists.slony.info Subject: Re: [Slony1-general] Slony setup error

From: Samir Parikh spar...@carcharging.com To: slony1-general@lists.slony.info Sent: Thursday, 17 April 2014, 2:06 Subject: [Slony1-general] Slony setup error

I am getting the following error while trying to set up replication. Can somebody help?

./slonik_subscribe_set 1 2 | ./slonik
stdin:5: PGRES_FATAL_ERROR select max(ev_seqno) FROM _replication.sl_event, _replication.sl_node where ev_origin=_replication.getLocalNodeId('_replication') AND ev_type <> 'SYNC' AND sl_node.no_id=ev_origin -
ERROR: function _replication.getlocalnodeid(unknown) does not exist
LINE 1: ...l_event , _replication.sl_node where ev_origin=_replicat... ^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
error: unable to query event history on node 2 Well have you checked to see if the _replication schema and getlocalnodeid function are present on your new subscriber? From the error message I can only assume you've not run STORE NODE against the node you're trying to subscribe, waiting for events (2,221473472232) only at (2,0) to be confirmed on node 1 This reinforces my thoughts, as well as running STORE NODE/PATH for your new subscriber you need to make sure you have a slon running against it too. Perhaps you could fill us in on the steps you've taken so far so we can see where you've gone off course. Glyn. p.s. I'll use this as a test to see if I can still post to the list from my yahoo.com address too; I've not much hope it will :-( ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] Slony setup error
On 04/17/14 14:51, Samir Parikh wrote: Some success. I could now set up the replication after dropping/recreating the replication schema on both nodes. Now I am trying to add tables to the set using a slonik script and am stuck with this syntax error. I googled it and found many instances of this error but could not find any solution that works for me.

stdin:1: ERROR: syntax error at or near replication

[root@vmbndbdev01 ~]# slonik <<_EOF_
cluster name = 'replication'
_EOF_
stdin:1: ERROR: syntax error at or near replication

The bison parser for this isn't the best in the world. It expects at least one admin conninfo and one actual command too. Otherwise it is not considered a complete script, and in bison "syntax error" is pretty much all you get. Jan

I tried with single, double and no quotes, but no luck with either of those. Thanks much. Samir

-Original Message- From: Jan Wieck [mailto:j...@wi3ck.info] Sent: Thursday, April 17, 2014 10:30 AM To: Samir Parikh; Glyn Astill; slony1-general@lists.slony.info Subject: Re: [Slony1-general] Slony setup error [...]
Regards, Jan Thanks again. Samir -Original Message- From: Glyn Astill [mailto:glynast...@yahoo.co.uk] Sent: Thursday, April 17, 2014 2:20 AM To: Samir Parikh; slony1-general@lists.slony.info Subject: Re: [Slony1-general] Slony setup error From: Samir Parikh spar...@carcharging.com To: slony1-general@lists.slony.info Sent: Thursday, 17 April 2014, 2:06 Subject: [Slony1-general] Slony setup error I am getting the following error while trying to set up replication. Can somebody help? ./slonik_subscribe_set 1 2 | ./slonik stdin:5: PGRES_FATAL_ERROR select max(ev_seqno) FROM _replication.sl_event, _replication.sl_node where ev_origin=_replication.getLocalNodeId('_replication') AND ev_type <> 'SYNC' AND sl_node.no_id=ev_origin - ERROR: function _replication.getlocalnodeid(unknown) does not exist LINE 1: ...l_event , _replication.sl_node where ev_origin=_replicat... ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. error: unable to query event history on node 2 Well, have you checked to see if the _replication schema and getlocalnodeid function are present on your new subscriber? From the error message I can only assume you've not run STORE NODE against the node you're trying to subscribe. waiting for events (2,221473472232) only at (2,0) to be confirmed on node 1 This reinforces my thoughts; as well as running STORE NODE/PATH for your new subscriber you need to make sure you have a slon running against it too. Perhaps you could fill us in on the steps you've taken so far so we can see where you've gone off course. Glyn. p.s.
I'll use this as a test to see if I can still post to the list from my yahoo.com address too; I've not much hope it will :-( -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] Slony setup error
On 04/17/14 14:56, Jan Wieck wrote: On 04/17/14 14:51, Samir Parikh wrote: Some success. I could now set up the replication after dropping/recreating the replication schema on both nodes. Now I am trying to add tables to the set using a slonik script and am stuck with this syntax error. I googled it and found many instances of this error but could not find any solution that works for me. stdin:1: ERROR: syntax error at or near "replication" [root@vmbndbdev01 ~]# slonik <<_EOF_ cluster name = 'replication' _EOF_ stdin:1: ERROR: syntax error at or near "replication" The bison parser for this isn't the best in the world. It expects at least one admin conninfo and one actual command too. Otherwise it is not considered a complete script, and in bison "syntax error" is pretty much all you get. Plus you need a semicolon at the end of course. Jan Jan I tried with single, double and no quotes, but no luck with either of those. Thanks much. Samir -Original Message- From: Jan Wieck [mailto:j...@wi3ck.info] Sent: Thursday, April 17, 2014 10:30 AM To: Samir Parikh; Glyn Astill; slony1-general@lists.slony.info Subject: Re: [Slony1-general] Slony setup error On 04/17/14 12:50, Samir Parikh wrote: Thanks for the quick response. Yes, you are right, the function getLocalNodeId does not exist on the subscriber node, but it does exist on the primary node. On further checking I could see 127 functions under the replication schema on the primary node, but only 112 on the subscriber node. Not sure why there are missing functions, which include getLocalNodeId. This is the very first time I am trying to set up Slony replication. I am following the documentation on slony.info. That sounds very odd. I would suggest doing an UNINSTALL NODE, DROP NODE (both for the non-functioning one), waiting until all traces of the dropped node are gone from the system (or using a different node ID to be sure) and then starting over from STORE NODE, STORE PATH all the way. And make sure you use 2.2 or at least 2.1.
2.0 is marked end of life and earlier versions have not received any updates/fixes for quite a while now. Regards, Jan Thanks again. Samir -Original Message- From: Glyn Astill [mailto:glynast...@yahoo.co.uk] Sent: Thursday, April 17, 2014 2:20 AM To: Samir Parikh; slony1-general@lists.slony.info Subject: Re: [Slony1-general] Slony setup error From: Samir Parikh spar...@carcharging.com To: slony1-general@lists.slony.info Sent: Thursday, 17 April 2014, 2:06 Subject: [Slony1-general] Slony setup error I am getting the following error while trying to set up replication. Can somebody help? ./slonik_subscribe_set 1 2 | ./slonik stdin:5: PGRES_FATAL_ERROR select max(ev_seqno) FROM _replication.sl_event, _replication.sl_node where ev_origin=_replication.getLocalNodeId('_replication') AND ev_type <> 'SYNC' AND sl_node.no_id=ev_origin - ERROR: function _replication.getlocalnodeid(unknown) does not exist LINE 1: ...l_event , _replication.sl_node where ev_origin=_replicat... ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. error: unable to query event history on node 2 Well, have you checked to see if the _replication schema and getlocalnodeid function are present on your new subscriber? From the error message I can only assume you've not run STORE NODE against the node you're trying to subscribe. waiting for events (2,221473472232) only at (2,0) to be confirmed on node 1 This reinforces my thoughts; as well as running STORE NODE/PATH for your new subscriber you need to make sure you have a slon running against it too. Perhaps you could fill us in on the steps you've taken so far so we can see where you've gone off course. Glyn. p.s.
I'll use this as a test to see if I can still post to the list from my yahoo.com address too; I've not much hope it will :-( -- Jan Wieck Senior Software Engineer http://slony.info
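Putting Jan's requirements together -- a cluster name, at least one admin conninfo, at least one actual command, and a terminating semicolon on each statement -- a minimal script that the slonik parser accepts looks something like the following sketch (host names and conninfo values are placeholders, not from the thread):

```
cluster name = replication;
node 1 admin conninfo = 'dbname=mydb host=node1 user=slony';
node 2 admin conninfo = 'dbname=mydb host=node2 user=slony';
echo 'slonik script parsed and executed';
```

Note that the cluster name is given as a bare identifier; combined with the missing conninfo and command, quoting it may also have contributed to the "syntax error at or near replication" seen above.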
Re: [Slony1-general] Wide Area Replication, Add Set, without index?
On 02/14/14 15:26, Tory M Blue wrote: On Fri, Feb 14, 2014 at 10:35 AM, Vick Khera vi...@khera.org wrote: On Thu, Feb 13, 2014 at 5:15 PM, Tory M Blue tmb...@gmail.com wrote: so I'm wondering how do I change my scripts to create the indexes at the end? I obviously can't drop the index on the master or everything will go to heck in a hand basket, but is there an instruction set when I'm adding a node and adding sets to that node, to tell it to ignore the indexes? I'm pretty sure slony disables indexes while copying, then re-builds them after it is done. Actually I believe this is beyond slony. The schema tells postgres what to do, so if it's creating indexes when the table is created, it seems like everything pauses/waits for the index to be done before moving on to the next table copy. So I think by editing the schema to do the indexes at the end or not at all (make it manual), one should be able to use that schema on a remote slon host; that host will not pause to create indexes, and allow slony to push the data without a significant pause (some of our indexes have taken 4+ hours to create). If you do a SUBSCRIBE SET without specifying OMIT COPY = yes, then Slony is doing:

- truncate the table
- turn off indexes (mucking with the system catalog here)
- copy the table
- turn indexes back on
- reindex the table

All this is done within one transaction for all tables in the set, so in this case mucking with the system catalog is safe, because it will never be visible to any concurrent session. You could drop all indexes (and the primary key) from all the tables on the subscriber, do the subscribe, then kill the slon (because without the PK it won't be able to do much in terms of catching up), recreate the PK and indexes, then let it catch up. What I would rather suggest is that you enable TCP keepalives and configure them in the slon config so that your firewall/NAT-gateway or whatever is reaping the idle connections is kept from doing so.
I'm going to test this anyways; I don't see anything particular to slon to stop this behaviour. TCP keepalives should do that, as a side effect. Regards, Jan -- Jan Wieck Senior Software Engineer http://slony.info
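For reference, the copy sequence Jan describes is what a plain subscription triggers; a sketch of the slonik commands involved (set, provider and receiver IDs are placeholders), including the OMIT COPY variant he mentions:

```
# Normal subscription: slon truncates, copies and reindexes every table in the set
subscribe set (id = 1, provider = 1, receiver = 2, forward = yes);

# Skip the COPY entirely, e.g. when the replica was pre-seeded by other means;
# the subscriber's data must already match the provider for this to be safe
subscribe set (id = 1, provider = 1, receiver = 2, forward = yes, omit copy = yes);
```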
Re: [Slony1-general] Administrivia: DMARC, Yahoo
On 04/15/14 11:58, Andrew Sullivan wrote: On Tue, Apr 15, 2014 at 10:57:37AM -0400, Christopher Browne wrote: Here is an announcement of a block Yahoo users policy just implemented by another free software project ( http://lists.roaringpenguin.com/pipermail/remind-fans/2014/003025.html) which seems to sum the issue up briefly but accurately. One alternative to this is to rewrite the From: address of Yahoo users when they enter the list, adding .invalid to the end of the mail address. The resulting lookup for the DMARC record then looks in (say) yahoo.com.invalid. You can't respond to such users, of course, and it may cause a lot of bounces, but fortunately .invalid is guaranteed not to work anywhere so at least the DNS lookups will be quick. I am not in favor of shifting bounces around by deliberately creating invalid header content in order to make posting from Yahoo! addresses possible again. I am in favor of following exactly the same route and block all mail from @yahoo.com (and other DMARC participating ISPs) on our lists. I do encourage people who are using yahoo and who want to use mailing lists to ditch the yahoo account, however. I also urge everyone to avoid yahoo as much as possible, because I think this behaviour was irresponsible and hostile to the Internet, and deserves to be shunned. Ditto. One aspect of the whole thing is that the DMARC proposal is two years old and it was well known that this (breaking mailing lists) would be a side effect of it. The powers that be at Yahoo! went ahead with it anyways. This can mean only one thing. That Yahoo! does not want users with a @yahoo.com address to participate in third party mailing lists. If that is what they want, then that is what they should get. Regards, Jan -- Jan Wieck Senior Software Engineer http://slony.info ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
[Slony1-general] Yahoo DMARC test
Let's see -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
Re: [Slony1-general] Still having issues with wide area replication. large table , copy set 2 failed
On 02/16/14 20:06, Jeff Frost wrote: On Feb 16, 2014, at 5:00 PM, Tory M Blue tmb...@gmail.com wrote: As can be seen, the connection is reaped; slon/postgres continue on their way, and it's not until the next data copy is required that it finds its connection is no longer there. Why it can't recreate a connection, as one would do if they stopped and started slon, is kind of beyond me. Just not 100% sure where it's being killed. Because the initial sync must be done as a single transaction. 2014-02-16 16:40:46 PST CONFIG remoteWorkerThread_1: 7183.069 seconds to copy table tracking.spotlightimp 2014-02-16 16:40:46 PST CONFIG remoteWorkerThread_1: copy table tracking.adimp 2014-02-16 16:40:46 PST CONFIG remoteWorkerThread_1: Begin COPY of table tracking.adimp 2014-02-16 16:40:46 PST ERROR remoteWorkerThread_1: select _cls.copyFields(19); 2014-02-16 16:40:46 PST WARN remoteWorkerThread_1: data copy for set 2 failed 1 times - sleep 15 seconds NOTICE: Slony-I: Logswitch to sl_log_2 initiated CONTEXT: SQL statement SELECT _cls.logswitch_start() PL/pgSQL function _cls.cleanupevent(interval) line 96 at PERFORM 2014-02-16 16:40:49 PST INFO cleanupThread: 6541.365 seconds for cleanupEvent() Am I doing this wrong? I figured that since I've seen connections at 15 minutes of processing complete fine, 30 minutes is more than enough. So send the first hey are you still there at 15 minutes, then continue with them every 5 minutes, for a count of 30. But the above seems to have been reaped in the 20 minute area.. net.ipv4.tcp_keepalive_time = 600 net.ipv4.tcp_keepalive_probes = 30 net.ipv4.tcp_keepalive_intvl = 300 Set it so that it's sending keepalives every 30 seconds. Something like this: net.ipv4.tcp_keepalive_time = 30 net.ipv4.tcp_keepalive_probes = 10 net.ipv4.tcp_keepalive_intvl = 30 Jeff is right. Really understanding these values may help too.
Since Slony allows setting them specifically for Slony in its config file, that is where it really should be done, rather than setting the global values in the kernel. Those kernel values should be adjusted to something more appropriate than settings suitable for a link to the Moon too, but that's another story. tcp_keepalive_time is the number of seconds the kernel waits since the last transmission on a socket before starting to probe. In this case, the end of the COPY, when slony is going into the long IDLE in transaction while building the indexes, is the start of that timer. tcp_keepalive_intvl is the interval between keepalive packets. But that interval only kicks in once the tcp_keepalive_time has elapsed. tcp_keepalive_probes is the number of keepalive packets that need to go missing in a row before the connection is considered lost, resulting in a connection reset by peer. One single keepalive received resets the whole thing. I actually go a lot more aggressive at this. Something like tcp_keepalive_time = 5 tcp_keepalive_probes = 24 tcp_keepalive_intvl = 5 Pretty much all remote slony connections do something every couple of seconds. And a link that cannot deal with a few keepalive packets per minute is not suitable for running slony over anyway. And who cares about wasting bandwidth on an idle link? Seriously! It's not like we are paying by the kilobyte of used bandwidth these days, are we? What is also important here is that my above settings lead to a timeout and canceling of the PostgreSQL backend within 2 minutes. There is something really bad about hanging backends of lost Slony connections when you actually ever need to failover. You really want them to time out in a timely fashion. Trust me. Regards, Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
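In slon's own config file the equivalent knobs are per-connection rather than kernel-global. A sketch matching Jan's aggressive values follows; the option names are taken from the Slony-I runtime configuration documentation for 2.1/2.2 and should be verified against your installed version:

```
# slon.conf fragment (sketch)
tcp_keepalive=1            # enable keepalives on slon's DB connections
tcp_keepalive_idle=5       # seconds idle before the first probe (kernel: tcp_keepalive_time)
tcp_keepalive_interval=5   # seconds between probes (kernel: tcp_keepalive_intvl)
tcp_keepalive_count=24     # missed probes before reset (kernel: tcp_keepalive_probes)
```

With these values a dead peer is detected after roughly idle + count * interval = 5 + 24 * 5 = 125 seconds, which matches the "within 2 minutes" Jan describes above.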
Re: [Slony1-general] Sync stopped with slony-2.2.1 on data type mismatch
On 01/29/14 17:26, Hanselman, Matthew wrote: One follow-up: Do you think 2.1.4 is stable in this regard? I have actually found the bug in 2.2.STABLE that causes this. As Steve assumed, it is a memory corruption inside the log apply trigger. 2.1.x does not have this bug since it doesn't have an apply trigger at all, but issues individual SQL statements against the updated tables directly. The bug can be avoided by setting the apply_cache_size in the slon config file higher than the number of possible individual apply query plans during a single SYNC event. This may bloat the memory size of the backend used by slon to apply changes, but better that than anything else. Note that the apply cache size defaults to 50, which is a VERY low number. Using 1,000 or even 10,000 should not cause a modern DB server to run out of memory. There can only ever be one DB connection per node that creates these plans (the local slon's connection). You should be good bumping that number instead of downgrading to 2.1. We will have a Slony 2.2.2 release out really soon. Sorry for the inconvenience. Regards, Jan Thanks! - Matt -Original Message- From: Jan Wieck [mailto:janwi...@yahoo.com] Sent: Friday, January 24, 2014 11:36 AM To: Hanselman, Matthew; Steve Singer Cc: slony1-general@lists.slony.info Subject: Re: [Slony1-general] Sync stopped with slony-2.2.1 on data type mismatch On 01/24/14 10:03, Hanselman, Matthew wrote: This looks very similar. I'll leave a comment to this effect on the bug. Thanks! For what it's worth, I tried restarting the slon daemon on both the master and slave. I got a very similar error, but this time it was complaining that Lost Angeles wasn't an integer. That does indeed look like memory corruption. I will take a very close look at the apply trigger code. In the meantime it may mask this error to increase the size of the apply query cache in the slon conf.
Thanks, Jan - Matt -Original Message- From: Steve Singer [mailto:ssin...@ca.afilias.info] Sent: Friday, January 24, 2014 9:59 AM To: Jan Wieck Cc: Hanselman, Matthew; slony1-general@lists.slony.info Subject: Re: [Slony1-general] Sync stopped with slony-2.2.1 on data type mismatch On 01/23/2014 03:42 PM, Jan Wieck wrote: On 01/23/14 15:31, Hanselman, Matthew wrote: Running Slony 2.2.1 with Postgres 9.2.6. The slave is getting errors trying to apply a change with this error: 2014-01-23 14:55:58 EST ERROR remoteWorkerThread_1_1: error at end of COPY IN: ERROR: invalid input syntax for integer: 33.77269118 Could this be the same thing as that is causing http://bugs.slony.info/bugzilla/show_bug.cgi?id=327 (We suspect memory corruption in the apply trigger) The column in question is a double precision on both master slave. No schema change has happened since Slony was deployed. What did happen between slony was deployed and this breakage? Jan Is this a bug in Slony, or is there something else I can do to troubleshoot? - Matt ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
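The workaround Jan describes above is a single line in the slon config file; the value below is an example sized per his guidance, not a recommendation from the thread:

```
# slon.conf fragment: raise the apply query plan cache above the number of
# distinct apply plans possible in a single SYNC event (the default is only 50)
apply_cache_size=1000
```

After changing it, restart the slon so the new cache size takes effect.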
Re: [Slony1-general] Sync stopped with slony-2.2.1 on data type mismatch
On 01/24/14 10:03, Hanselman, Matthew wrote: This looks very similar. I'll leave a comment to this effect on the bug. Thanks! For what it's worth, I tried restarting the slon daemon on both the master and slave. I got a very similar error, but this time it was complaining that Lost Angeles wasn't an integer. That does indeed look like memory corruption. I will take a very close look at the apply trigger code. In the meantime it may mask this error to increase the size of the apply query cache in the slon conf. Thanks, Jan - Matt -Original Message- From: Steve Singer [mailto:ssin...@ca.afilias.info] Sent: Friday, January 24, 2014 9:59 AM To: Jan Wieck Cc: Hanselman, Matthew; slony1-general@lists.slony.info Subject: Re: [Slony1-general] Sync stopped with slony-2.2.1 on data type mismatch On 01/23/2014 03:42 PM, Jan Wieck wrote: On 01/23/14 15:31, Hanselman, Matthew wrote: Running Slony 2.2.1 with Postgres 9.2.6. The slave is getting errors trying to apply a change with this error: 2014-01-23 14:55:58 EST ERROR remoteWorkerThread_1_1: error at end of COPY IN: ERROR: invalid input syntax for integer: 33.77269118 Could this be the same thing that is causing http://bugs.slony.info/bugzilla/show_bug.cgi?id=327 (We suspect memory corruption in the apply trigger) The column in question is a double precision on both master and slave. No schema change has happened since Slony was deployed. What did happen between the time slony was deployed and this breakage? Jan Is this a bug in Slony, or is there something else I can do to troubleshoot? - Matt -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
Re: [Slony1-general] Sync stopped with slony-2.2.1 on data type mismatch
On 01/23/14 15:31, Hanselman, Matthew wrote: Running Slony 2.2.1 with Postgres 9.2.6. The slave is getting errors trying to apply a change with this error: 2014-01-23 14:55:58 EST ERROR remoteWorkerThread_1_1: error at end of COPY IN: ERROR: invalid input syntax for integer: 33.77269118 The column in question is a “double precision” on both master and slave. No schema change has happened since Slony was deployed. What did happen between the time slony was deployed and this breakage? Jan Is this a bug in Slony, or is there something else I can do to troubleshoot? - Matt -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
Re: [Slony1-general] Slony getting 'hung up' on an event?
On 01/06/14 13:51, Brian Fehrle wrote: ... 2014-01-06 11:43:22 MST DEBUG1 remoteHelperThread_1_1: 112.859 seconds delay for first row 2014-01-06 11:43:22 MST DEBUG1 remoteHelperThread_1_1: 112.906 seconds until close cursor 2014-01-06 11:43:22 MST DEBUG1 remoteHelperThread_1_1: inserts=61 updates=300 deletes=55 truncates=0 Almost 2 minutes to select 416 log rows? That looks wrong. Can you change the slon configuration to have an explain_interval? With that slon will emit EXPLAIN output for the log selection query in that interval. Maybe something is screwing up the optimizer. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
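explain_interval is set in the slon config file; a sketch of the change follows. The value is an assumption (the thread does not give one) -- pick an interval coarse enough not to flood the log:

```
# slon.conf fragment (sketch): periodically emit EXPLAIN output for the
# sl_log selection query, so a bad plan becomes visible in the slon log
explain_interval=120
```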
Re: [Slony1-general] query running slow in slony replication cluster environment
On 01/03/14 23:33, ammu narasimham wrote: Hello, I have configured the replication cluster using slony on Postgres 9.2 successfully. It is a 2-node cluster (master-slave). When I tried to insert 100k rows in a table, the execution time nearly doubled in the new environment. Since each INSERT causes a second INSERT from the Slony log trigger into the replication log, this is normal. Is this (inserting 100,000 rows at once) something your application does on a regular basis, or is this some artificial, unrealistic test you thought would give you useful information? Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
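The per-row overhead Jan mentions can be observed directly on the origin node. The cluster name 'mycluster' below is a placeholder, and the log table layout varies between Slony versions, so this is purely an illustrative sketch:

```sql
-- Every INSERT into a replicated table also fires the Slony log trigger,
-- adding one row to the cluster's current log table. The extra write
-- volume shows up as growth in the sl_log tables:
SELECT count(*) FROM _mycluster.sl_log_1;  -- grows with every captured change
SELECT count(*) FROM _mycluster.sl_log_2;  -- the other half of the log switch pair
```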
Re: [Slony1-general] 6 hours to replicate single table, 12 hours to replicate DB
On 01/02/14 16:12, Tory M Blue wrote: We are getting outside of slon and into postgres and I'll try to keep this light; here are my postgresql custom config aspects. Postgres and its architecture very much affect the behavior and performance of Slony. I'm afraid that they cannot be viewed completely in isolation from each other. The hardware is being refreshed but right now it's an 8 core, 32GB CentOS system, being replaced with some pretty big hardware, 256GB/32 cores, many SSDs. But tuning this now and knowing where I need to tweak when I add the added capacity would be great. The slon process uses one single database connection to the data provider for the copy operation and another single database connection to feed the data into the new replica. A single Postgres session only utilizes one single core. It therefore makes zero difference if you are upgrading from a dual core to 64 cores or more. The suggestions made about work_mem and maintenance_work_mem are valid. If you are using a dedicated DB user for all your slon database connections you can tweak these per server via ALTER USER. Regards, Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
Re: [Slony1-general] 6 hours to replicate single table, 12 hours to replicate DB
On 01/03/14 14:42, Tory M Blue wrote: On Fri, Jan 3, 2014 at 11:33 AM, Jan Wieck janwi...@yahoo.com wrote: On 01/02/14 16:12, Tory M Blue wrote: We are getting outside of slon and into postgres and I'll try to keep this light; here are my postgresql custom config aspects. Postgres and its architecture very much affect the behavior and performance of Slony. I'm afraid that they cannot be viewed completely in isolation from each other. The hardware is being refreshed but right now it's an 8 core, 32GB CentOS system, being replaced with some pretty big hardware, 256GB/32 cores, many SSDs. But tuning this now and knowing where I need to tweak when I add the added capacity would be great. The slon process uses one single database connection to the data provider for the copy operation and another single database connection to feed the data into the new replica. A single Postgres session only utilizes one single core. It therefore makes zero difference if you are upgrading from a dual core to 64 cores or more. The suggestions made about work_mem and maintenance_work_mem are valid. If you are using a dedicated DB user for all your slon database connections you can tweak these per server via ALTER USER. Regards, Jan Okay, that is interesting. "If you are using a dedicated DB user for all your slon database connections, you can tweak these per server via ALTER USER" -- what will that do for me? We do use the same user across the board and I figured I would just modify my postgresql.conf. So I'm a tad confused :) Assuming you have a user slon and all your slon processes use that DB user to do their work. You connect with psql to a Postgres server as a superuser (postgres for example) and issue the command ALTER USER slon SET work_mem TO '2048MB'; From now on, whenever user slon is connecting to a database under that postmaster, it will have 2GB of work_mem, no matter what the default setting in your postgresql.conf file is. This will not affect any other user.
Since these settings (work_mem and maintenance_work_mem) are limits per database backend, you don't want to set them too high in the global postgresql.conf file. You'd risk running into swapping. But certain maintenance jobs, as well as slon (when doing subscriptions), do need rather unusual amounts of resources. Regards, Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
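Jan's per-user approach can be sketched as a short psql session; the user name slon and the values are examples from the thread, not requirements:

```sql
-- Set per-user resource limits for the dedicated slon user:
ALTER USER slon SET work_mem TO '2048MB';
ALTER USER slon SET maintenance_work_mem TO '2048MB';

-- Verify the per-user overrides (useconfig lists them):
SELECT usename, useconfig FROM pg_user WHERE usename = 'slon';

-- The settings take effect on slon's next connection; undo with:
-- ALTER USER slon RESET work_mem;
```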
Re: [Slony1-general] Problem slony 2.2 and pg 9.3 double replication
On 10/16/2013 09:55 AM, Sebastien Marchand wrote: Sorry I forgot to tell you that I have 2 differents replications . SERVER 1 SERVER 2 SCHEMA A ---SCHEMA A (CLUSTER NAME = repli_general ) SCHEMA B ---SCHEMA B (CLUSTER NAME = repli_local110 ) This does not require two clusters with 4 slon daemons. Like Steve already wrote you only need two sets. Jan Now one script i use (slony-ctl ) : #!/bin/bash ## EN: Slonik script for initialising replication /usr/pgsql-9.3/bin/slonik _EOF_ ## EN: Preamble file contains all useful connexions information CLUSTER NAME = repli_local110 ; NODE 101 ADMIN CONNINFO = 'dbname=RPP2 host=192.168.0.101 port=5432 user=slony password=password'; NODE 110 ADMIN CONNINFO = 'dbname=RPP2 host=192.168.0.110 port=5432 user=slony password=password'; #-- # Init the first node. This creates the schema _repli_local110 # containing all replication system specific database objects. #-- try{ init cluster ( id=101, comment = 'repli_local110 : 101 - DATABASE - 192.168.0.101'); } on error { echo 'Cluster 101 : ko' ; exit 1; } on success { echo 'Cluster 101 : OK' ; } # The other nodes. #-- # Create the node 110 (slave) and tell the 2 nodes how to connect to # each other and how they should listen for events. #-- try{ store node (id = 110, EVENT NODE = 101, comment = 'DB DATABASE - Host 192.168.0.110 - Cluster repli_local110'); } on error { echo 'Store node 110 ko'; exit 1; } on success { echo 'Store node 110 OK'; } store path (server = 101, client = 110, conninfo='dbname=DATABASE host=192.168.0.101 port=5432 user=slony password=123'); store path (server = 110, client = 101, conninfo='dbname=DATABASE host=192.168.0.110 port=5432 user=slony password=123'); #-- # Slony-I organizes tables into sets. The smallest unit a node can # subscribe is a set. The master or origin of the set is node 110. 
#-- try{ create set (id=1, origin=110, comment='Tables in DATABASE'); # Tables set add table (set id=1, origin=101, id=67, fully qualified name = 'local110.test_tata2', comment='table local110.test_tata2'); # Sequences } on error { echo 'Creation Set : ko'; exit 1; } on success { echo 'Creation Set : OK'; } exit 0; _EOF_ if [ $? -ne 0 ]; then error CREATE SET 3 exit 1 fi ## EN: We start daemons. All on the same machine, which is # the one on which that script is executed. /usr/pgsql-9.3/bin/slon -f /home/slony1-ctl/slony-ctl/etc/slon.cfg repli_local110 dbname=DATABASE host=192.168.0.101 port=5432 user=slony password=123 /home/slony1-ctl/slony-ctl/logs/repli_local110_DATABASE_101.log 21 echo daemon slon pour DATABASE - 192.168.0.101 demarré sur national101 ssh/usr/pgsql-9.3/bin/slon -f /home/slony1-ctl/slony-ctl/etc/slon.cfg repli_local110 dbname=DATABASE host=192.168.0.110 port=5432 user=slony password=123 /home/slony/logs/repli_local110_DATABASE_110.log 21 echo daemon slon pour DATABASE - 192.168.0.110 demarré sur national101 ## EN: And now, subscription echo Souscription Set 1 par 101 /usr/pgsql-9.3/bin/slonik _EOF_ # # preamble file # INCLUDE /home/slony1-ctl/slony-ctl/etc/repli_local110.preamble # # Node 101 subscribes set 1 # try{ subscribe set ( id = 1, provider = 110, receiver = 101, forward = yes); } on error { echo 'Souscription ko'; exit 1; } on success { echo 'Souscription OK'; } _EOF_ if [ $? 
-ne 0 ]; then error SUBSCRIBE SET 3 exit 1 fi exit 0 -Message d'origine- De : Steve Singer [mailto:ssin...@ca.afilias.info] Envoyé : mercredi 16 octobre 2013 15:29 À : Sebastien Marchand Cc : slony1-general@lists.slony.info Objet : Re: [Slony1-general] Problem slony 2.2 and pg 9.3 double replication On 10/16/2013 09:17 AM, Sebastien Marchand wrote: Hi, I want do this : SERVER 1 SERVER 2 SCHEMA A ---SCHEMA A SCHEMA B ---SCHEMA B That works in slony 2.0.7/pg 9.0 But with slony 2.2 and postgresql 9.3 SERVER 1 SERVER 2 SCHEMA A ---SCHEMA A= OK SCHEMA B ---SCHEMA B= NOK I've got this error stdin:55: waiting for event (101,54) to be confirmed on node 110 Pg error : 2013-10-15 10:55:10.950 CEST ERROR: relation _sloncluster.sl_local_node_id does not exist at character 30 2013-10-15 10:55:10.950 CEST STATEMENT: select last_value::int4 from _sloncluster.sl_local_node_id
Re: [Slony1-general] Problem slony 2.2 and pg 9.3 double replication
On 10/16/13 13:33, Steve Singer wrote: On 10/16/2013 12:05 PM, Sebastien Marchand wrote: No slon running, because the script launches slon after the create set. So I launched it before and IT WORKS !!! Thanks, master! Thank you so much. I don't know what changed between slony 2.0.7 and 2.2, but before I could do this. Now it's ok :) What SHOULD happen with 2.2 when the slon isn't running is that slonik should keep printing: waiting for event (1,54) ... messages. That isn't an error; it is an information message saying that it is waiting for the event to be propagated, and that won't happen unless the slon is running. What I don't understand is how you were getting 2013-10-15 10:55:10.950 CEST ERROR: relation _sloncluster.sl_local_node_id does not exist at character 30 type messages. That is not a slonik message, but is coming from slon. And sloncluster is the cluster name used in share/slon.conf-sample. Jan -Original Message- From: Steve Singer [mailto:ssin...@ca.afilias.info] Sent: Wednesday, October 16, 2013 17:56 To: Sebastien Marchand Subject: Re: [Slony1-general] Problem slony 2.2 and pg 9.3 double replication On 10/16/2013 11:52 AM, Sebastien Marchand wrote: Yes, that's why I will delete all the 2.0 replication and migrate postgresql after I rebuild the new replication. When I get my error waiting for event (1,54) to be confirmed on node 110, in postgresql if I do: select * from _repli_local110.sl_path; = 1;110;dbname=RPP2 host=192.168.0.101 port=5432 user=slony password=123;10 Why just one line?? I need two lines! In my sl_event: 110;51;2013-10-16 17:50:54.697374+02;695958:695958:;STORE_PATH;1;110;dbname=RPP2 host=192.168.0.101 port=5432 user=slony password=123;10 I think there is a problem here... or not. When slonik executes the STORE PATH, it will insert a row in sl_path on each node. When slon is running it will then replicate the STORE_PATH event from one node to the other to create the second sl_path row in each node. Is your slon running?
-Message d'origine- De : Steve Singer [mailto:ssin...@ca.afilias.info] Envoyé : mercredi 16 octobre 2013 17:02 À : Sebastien Marchand Cc : slony1-general@lists.slony.info Objet : Re: [Slony1-general] Problem slony 2.2 and pg 9.3 double replication On 10/16/2013 10:57 AM, Sebastien Marchand wrote: I already tried to put in script but no change. Can you tell me where is the line of code that is generating this error ? it's a sql function ? I continue to search i need this to migrate. slon_mkquery(query, select max(ev_seqno) FROM \_%s\.sl_event , \_%s\.sl_node where ev_origin= (select last_value from \_%s\.sl_local_node_id) AND %s AND sl_node.no_id= ev_origin ,script-clustername, script-clustername, script-clustername, event_filter); In slonik.c slonik_get_last_event_id(SlonikStmt * stmt, SlonikScript * script, const char *event_filter, int64 **events) Also, if you are planning on migrating from 2.0 to 2.2 with UPDATE FUNCTIONS you should be aware of bug #320 -Message d'origine- De : Steve Singer [mailto:ssin...@ca.afilias.info] Envoyé : mercredi 16 octobre 2013 16:53 À : Sebastien Marchand Cc : slony1-general@lists.slony.info Objet : Re: [Slony1-general] Problem slony 2.2 and pg 9.3 double replication On 10/16/2013 10:48 AM, Steve Singer wrote: On 10/16/2013 10:06 AM, Sebastien Marchand wrote: CLUSTER NAME = repli_local110 ; NODE 101 ADMIN CONNINFO = 'dbname=RPP2 host=192.168.0.101 port=5432 user=slony password=password'; NODE 110 ADMIN CONNINFO = 'dbname=RPP2 host=192.168.0.110 port=5432 user=slony password=password'; It's the file repli_local110.preamble but for email i copy this file in my script :) Actually IGNORE what I said below, I hit send way too quickly before reviewing my output. I am NOT yet able to reproduce this, but try pasting your preamble inline and see if it works. 
I think I see the line of code that is generating the error you are seeing, but I don't understand how/why it is substituting using _sloncluster instead of repli_local110. Sometimes details are important. Copy the preamble file inline into your script and it will work (like you did in your email). I can reproduce your issue with the preamble moved to a different file. This is a bug; it shouldn't be hard to fix, thanks for reporting it. I suspect the fix will be included in 2.2.1 due in the next few weeks (since this has been a busy week for 2.2.0 bugs - i.e. #318 and #320).
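The workaround Steve describes (pasting the preamble inline rather than including it from a separate file) might look like the following minimal sketch; the cluster name and conninfo values are taken from the thread, with passwords omitted:

```slonik
# Preamble pasted inline instead of kept in a separate file, working
# around the 2.2.0 substitution bug discussed above (hypothetical sketch).
cluster name = repli_local110;
node 101 admin conninfo = 'dbname=RPP2 host=192.168.0.101 port=5432 user=slony';
node 110 admin conninfo = 'dbname=RPP2 host=192.168.0.110 port=5432 user=slony';

# Paths in both directions, so each slon can reach the other node.
store path (server = 101, client = 110, conninfo = 'dbname=RPP2 host=192.168.0.101 port=5432 user=slony');
store path (server = 110, client = 101, conninfo = 'dbname=RPP2 host=192.168.0.110 port=5432 user=slony');
```

With the slons running, the second sl_path row on each node then appears once the STORE_PATH event replicates.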
Re: [Slony1-general] Slony replication before db is up
On 08/31/13 05:51, Neha Sharma wrote: Hi, We are using master and slave postgres and slony for replicating the database. The master and slave are on different servers in the same LAN. The one containing the master db was rebooted. When the machine came up, slony started replication but the db state is still unavailable. The postgres logs show the following errors: UTCFATAL: no pg_hba.conf entry for host 172.168.113.4, user localclient, database msw, SSL off where 172.168.113.4 is the IP address of the server which was rebooted. Could it be that slony starts replication before the database is up? If not, then why is the above error shown in the postgres logs? Please note that the slon logs are showing messages corresponding to copying of tables at the same time. That error message only indicates that some client, which may or may not be a slon process, is being denied a connection because host-based authentication failed. A Slony cluster always consists of multiple slon processes, each establishing multiple database connections. One of them copying or replicating data is no guarantee that every other one can also establish all the connections they need. Is Slony actually using the user localclient for its connections? Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
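If the rejected connections really should be admitted, a pg_hba.conf entry matching the error message would be needed on the rebooted server. The database, user, and address below are taken from the error in the thread; the auth method is an assumption:

```
# pg_hba.conf (hypothetical entry matching the rejected connection)
# TYPE  DATABASE  USER         ADDRESS            METHOD
host    msw       localclient  172.168.113.4/32   md5
```

Note that pg_hba.conf changes take effect only after a reload (pg_ctl reload or SELECT pg_reload_conf()).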
Re: [Slony1-general] init cluster when the other node is not reachable ?
On 07/30/13 10:18, Nicolas Frossard wrote: Hello dear Slony-1 users, I have a simple one-master one-slave scenario but an annoying requirement: the master and the slave should be able to be configured at different times, without the other node being necessarily reachable... For example, the following should be possible: 1- start the master DB (slave DB is not started yet) 2- init cluster/create set (only on the master) 3- start slon process on the master 4- add subscription of slave to the set (only on the master) ... some time later ... 5- start the slave DB 6- init cluster/create set (only on the slave) 7- start slon process on the slave 8- add subscription of slave to the set (only on the slave) ... now the replication is working ! Unfortunately, it seems that it is impossible to correctly initialize/configure one node (the master) and then later the other (the slave). I have to do it at the same time, and both nodes need to be reachable. Is there a way to solve this? I'm using PostgreSQL 9.2 and Slony-1 2.1.3. The one thing that is impossible in your above description is the STORE NODE command for the second node (slave). For that one, both nodes must be reachable because slonik needs to connect to both databases at the same time. Then again, if the nodes can never reach each other, how are they going to replicate at all? Something in your problem doesn't make sense. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
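For reference, the step that requires both databases simultaneously is STORE NODE: slonik opens admin connections to every node named in the preamble before executing it. A minimal sketch, with hypothetical cluster and host names:

```slonik
# Hypothetical names; slonik must be able to connect to BOTH admin
# conninfos below, so STORE NODE fails if either node is unreachable.
cluster name = mycluster;
node 1 admin conninfo = 'dbname=app host=master-host user=slony';
node 2 admin conninfo = 'dbname=app host=slave-host user=slony';

store node (id = 2, comment = 'slave node', event node = 1);
```

Everything before this point (init cluster, create set on the master) can indeed be done with only the master reachable; it is the registration of the second node that needs both.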
Re: [Slony1-general] Slony Watchdog failed starting up the child process
On 07/23/13 15:21, Rose Nancy wrote: Hi Chris, On 13-07-23 03:07 PM, Christopher Browne wrote: My intuition from seeing it say FATAL is that that's indicating death of process, and that there's not much coming back from it. This behaviour is pretty consistent with what happens with a Postgres postmaster; if the attempt to start up fails due to seeming already to have a postmaster, it doesn't retry, pg_ctl immediately gives up. In my case the slons are running on a separate server which I rebooted. By the way, is this possibly because of a zombied old connection that got disconnected due to a firewall glitch or such? If so, you should probably see about lowering the TCP keepalive parameters both in the slon.conf file and in postgresql.conf You're right, the duplicated key error was caused by a zombie old connection that got disconnected due to the slony server reboot. Which is something that TCP keepalives should clean out after a while. My recollection of the discussion that led to the current behavior is that we had several options. A) Have it retry no matter what. In the case of a zombie that doesn't time out, that doesn't help, but at least it creates a constant stream of error messages. B) Treat the first connection attempt differently, so that the system startup script or a later manual invocation will signal a problem. C) Kick the existing (assumed to be dead) slon out (by killing its backend). But if you really have two slons that are alive and possibly running on different systems, they will keep stabbing each other in the back and eventually never get anything accomplished. We chose B) at that time. I am open to new ideas. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
Re: [Slony1-general] DML count in sl_apply_stats
Slony operates row-based. It has no idea how many client-side statements created those rows, or whether any row was inserted, updated or deleted as a direct result of a client query, a stored procedure, a trigger or a cascaded foreign key operation. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin Raghav ragavendra@gmail.com wrote: Hi, insert into stest values(generate_series(1,100)); or insert into stest values (101),(102),(103); How are the above two SQL statements treated and counted in the sl_apply_stats.as_num_insert column? I feel each should be counted as one INSERT statement, no? But it has multiplied the count, even with generate_series. Test case: postgres=# select as_num_insert from _rep220.sl_apply_stats ; //before insert as_num_insert --- 0 (1 row) postgres=# insert into stest values (generate_series(1,100)); INSERT 0 100 postgres=# select as_num_insert from _rep220.sl_apply_stats ; as_num_insert --- 100 (1 row) postgres=# insert into stest values (101),(102),(103); INSERT 0 3 postgres=# select as_num_insert from _rep220.sl_apply_stats ; as_num_insert --- 103 (1 row) -- Regards Raghav Blog: http://raghavt.blogspot.com/
Re: [Slony1-general] Slony replication has stopped replicating
On 06/22/13 19:48, Scott Marlowe wrote: On Sat, Jun 22, 2013 at 4:45 PM, Jan Wieck janwi...@yahoo.com wrote: There is nothing laughable about your situation, so I don't know why you LOL all the time. It might be one of those laugh or cry things. Whoever installed that database and Slony system determined where the logs go. Check the system startup scripts and syslog files/configuration for any hints. Unfortunately every Linux packager has his/her own mind about where things are supposed to go. Not like in half-organized FreeBSD territory, where you at least can guess where the logs may be. To be fair, most distros put it all in a few common places. I.e. Redhat / rpm based or Debian / deb based. And there's nothing stopping a BSD sysadmin from building postgresql a la ./configure --prefix=/home/mydir/are/you/kidding/me Fair enough. But then again, everyone can do that compiling from sources (unless you try to imply that Linux sysadmins no longer know how to do that). The points several people have made remain. Bambi has to follow the system startup breadcrumbs to figure out where the log is, if he/she intends to find out what is currently broken in order to fix the current system. On the other hand, fixing that system properly means upgrading everything to the latest versions, which should start with something like apt-get update or yum update instead of attempting to just get past the current problem. I still have the feeling that Slony stopped working because of a far greater underlying problem. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
Re: [Slony1-general] Slony replication has stopped replicating
On 06/22/13 14:48, Bambi Bellows wrote: Hi Gang! Slony replication has stopped replicating, and I'm not sure why, or how to get it replicating again. I have bounced Slony on the master and all the remote nodes, but no luck. Is there a Slony expert out there who can help point me in the right direction? Version info: slonik version 2.1.0.b2 psql (9.0.4) Slony 2.1 is on 2.1.3 and PostgreSQL 9.0 is on 9.0.13. Is there a particular reason why you keep running software that is known to have plenty of bugs? Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
Re: [Slony1-general] Slony replication has stopped replicating
On 06/22/13 16:56, Bambi Bellows wrote: LOL. Thanks Jan. No, I inherited these systems, and upgrading is not really an option for the database... I am, however, trying to find where the log files are. They aren't in the /opt/logs/postgres directory, and they aren't in the postgres logs directory. I have two sudo finds running, but, so far, no luck. Gotta say, haven't been having the best of luck so far on anything slony-related. Slony related? I would say the whole thing you inherited is a complete mess and Slony is a casualty of that. There is nothing so far that I can see that could be blamed on Slony. Only a grossly outdated and unmanaged server setup. Sorry to be so blunt, but I just don't tolerate it when Slony or PostgreSQL are blamed for server admin negligence and incompetence. This is apparently not your fault, but your predecessor's fault. So don't take it personally. Jan -----Original Message----- From: Jan Wieck [mailto:janwi...@yahoo.com] Sent: Saturday, June 22, 2013 3:53 PM To: Bambi Bellows Cc: slony1-general@lists.slony.info Subject: Re: [Slony1-general] Slony replication has stopped replicating On 06/22/13 14:48, Bambi Bellows wrote: Hi Gang! Slony replication has stopped replicating, and I'm not sure why, or how to get it replicating again. I have bounced Slony on the master and all the remote nodes, but no luck. Is there a Slony expert out there who can help point me in the right direction? Version info: slonik version 2.1.0.b2 psql (9.0.4) Slony 2.1 is on 2.1.3 and PostgreSQL 9.0 is on 9.0.13. Is there a particular reason why you keep running software that is known to have plenty of bugs? Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
Re: [Slony1-general] Slony replication has stopped replicating
There is nothing laughable about your situation, so I don't know why you LOL all the time. Whoever installed that database and Slony system determined where the logs go. Check the system startup scripts and syslog files/configuration for any hints. Unfortunately every Linux packager has his/her own mind about where things are supposed to go. Not like in half-organized FreeBSD territory, where you at least can guess where the logs may be. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin Bambi Bellows bbell...@dotomi.com wrote: LOL. Thank you for your advice. I will talk to my boss about this on Monday. -----Original Message----- From: Scott Marlowe [mailto:scott.marl...@gmail.com] Sent: Saturday, June 22, 2013 5:19 PM To: Bambi Bellows Cc: Jan Wieck; slony1-general@lists.slony.info Subject: Re: [Slony1-general] Slony replication has stopped replicating On Sat, Jun 22, 2013 at 3:16 PM, Bambi Bellows bbell...@dotomi.com wrote: LOL. You guys are killin me. Meantime, can someone point me to where the log file destination might be hidden? Because I can't even find out what broke and why. And as much as it's great to blame my predecessors, they aren't here anymore, and I'm out of my depth on Slony... ob.vi.ous.ly. And telling my boss to suck it is not really an option for me. I'm not telling you to tell them to suck it, I'm telling you to tell them you need to fix it right. Take the system offline for as long as it takes to upgrade postgresql (a few minutes typically). Drop the current slony, install the latest 2.1.x, and restart replication from scratch. Running around with your hair on fire at this point isn't likely to result in a long-term sustainable fix, but a slowly repeating cycle of endless horror as the system gets fixed, breaks, and gets fixed over and over. 
Re: [Slony1-general] Slony cleanupEvent erroring out with server closed the connection unexpectedly - Soln version: 2.1.2
On 06/15/13 04:46, Sridevi R wrote: Jan, You are right. Tweaking the tcp keep alive parameters helped. Now slon.conf contains: tcp_keepalive=true tcp_keepalive_interval=45 tcp_keepalive_count=20 tcp_keepalive_idle=30 cleanup_interval=30 Thanks a lot for the timely response. You're welcome. Jan - Sridevi On Thu, Jun 13, 2013 at 11:46 PM, Jan Wieck janwi...@yahoo.com mailto:janwi...@yahoo.com wrote: On 06/13/13 06:25, Sridevi R wrote: Hello Jan, The Master and Slave DBs talk through a firewall. VIP IPs and SNAT IPs are used in pg_hba.conf. The corresponding messages in the postgres server log: 2013-06-13 09:46:21.224 GMT,,,6630,10.4.2.2:42031 http://10.4.2.2:42031 http://10.4.2.2:42031,51b994ed.19e6,1,,2013-06-13 09:46:21 GMT,,0,LOG,08P01,incomplete startup packet, 2013-06-13 09:57:38.596 GMT,postgres,db01,6634,ip address printed here:53924,51b994f7.19ea,1,idle,2013-06-13 09:46:31 GMT,28/0,0,LOG,08006,could not receive data from client: Connection reset by peer,slon.node_1_listen 2013-06-13 09:57:38.596 GMT,postgres,db01,6634,ip address printed here:53924,51b994f7.19ea,2,idle,2013-06-13 09:46:31 GMT,28/0,0,LOG,08P01,unexpected EOF on client connection,slon.node_1_listen 2013-06-13 09:57:38.607 GMT,postgres,db01,6637,ip address printed here:53926,51b994f9.19ed,1,idle,2013-06-13 09:46:33 GMT,32/0,0,LOG,08006,could not receive data from client: Connection reset by peer,slon.subscriber_1_provider_1 2013-06-13 09:57:38.607 GMT,postgres,db01,6637,ip address printed here:53926,51b994f9.19ed,2,idle,2013-06-13 09:46:33 GMT,32/0,0,LOG,08P01,unexpected EOF on client connection,slon.subscriber_1_provider_1 2013-06-13 09:57:38.608 GMT,postgres,db01,6635,ip address printed here:53925,51b994f7.19eb,1,idle,2013-06-13 09:46:31 GMT,31/0,0,LOG,08006,could not receive data from client: Connection reset by peer,slon.node_1_listen 2013-06-13 09:57:38.608 GMT,postgres,db01,6635,ip address printed here:53925,51b994f7.19eb,2,idle,2013-06-13 09:46:31 GMT,31/0,0,LOG,08P01,unexpected EOF on 
client connection,slon.node_1_listen The client slon log contains: 2013-06-13 09:57:38 GMT FATAL cleanupThread: begin;lock table _xx_cluster.sl_config_lock;select _xx_cluster.cleanupEvent('10 minutes'::interval);commit; - server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. This all can very well be a slightly too eager firewall dropping idle connections. Have you tried to enable TCP keep alive options that kick in after something like 30 seconds? If not, enable them on both, the PG server and the Slony side. That usually prevents those firewall issues. Jan Thanks, Sridevi On Thu, Jun 13, 2013 at 12:02 AM, Jan Wieck janwi...@yahoo.com mailto:janwi...@yahoo.com mailto:janwi...@yahoo.com mailto:janwi...@yahoo.com wrote: On 06/12/13 10:17, Sridevi R wrote: Jan, Thanks for the reply. The only errors in the slon log are failure of cleanupThread. child process is restarting right after the cleanupThread Failure. This occurs approximately every 10 minutes since cleanup_interval is set to 10 minutes. Here is a sample from the log again: 2013-06-06 14:23:27 GMT FATAL cleanupThread: begin;lock table _xx_cluster.sl_config_lock;select _xx_cluster.cleanupEvent('10 minutes'::interval);commit; - server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. 2013-06-06 14:23:27 GMT CONFIG slon: child terminated signal: 9; pid: 16135, current worker pid: 16135 2013-06-06 14:23:27 GMT CONFIG slon: restart of worker in 10 seconds server closed the connection unexpectedly ... Is this connection by any chance through some firewall or NAT gateway that will drop idle connections? What are the corresponding postmaster server log entries? Since slony reports an unexpected connection drop from the server, the server must have some message in its log too, because the client never sent the 'X' libpq protocol message. 
Jan Thanks , Sridevi On Wed, Jun 12
Re: [Slony1-general] Slony cleanupEvent erroring out with server closed the connection unexpectedly - Soln version: 2.1.2
On 06/13/13 06:25, Sridevi R wrote: Hello Jan, The Master and Slave DBs talk through a firewall. VIP IPs and SNAT IPs are used in pg_hba.conf. The corresponding messages in the postgres server log: 2013-06-13 09:46:21.224 GMT,,,6630,10.4.2.2:42031 http://10.4.2.2:42031,51b994ed.19e6,1,,2013-06-13 09:46:21 GMT,,0,LOG,08P01,incomplete startup packet, 2013-06-13 09:57:38.596 GMT,postgres,db01,6634,ip address printed here:53924,51b994f7.19ea,1,idle,2013-06-13 09:46:31 GMT,28/0,0,LOG,08006,could not receive data from client: Connection reset by peer,slon.node_1_listen 2013-06-13 09:57:38.596 GMT,postgres,db01,6634,ip address printed here:53924,51b994f7.19ea,2,idle,2013-06-13 09:46:31 GMT,28/0,0,LOG,08P01,unexpected EOF on client connection,slon.node_1_listen 2013-06-13 09:57:38.607 GMT,postgres,db01,6637,ip address printed here:53926,51b994f9.19ed,1,idle,2013-06-13 09:46:33 GMT,32/0,0,LOG,08006,could not receive data from client: Connection reset by peer,slon.subscriber_1_provider_1 2013-06-13 09:57:38.607 GMT,postgres,db01,6637,ip address printed here:53926,51b994f9.19ed,2,idle,2013-06-13 09:46:33 GMT,32/0,0,LOG,08P01,unexpected EOF on client connection,slon.subscriber_1_provider_1 2013-06-13 09:57:38.608 GMT,postgres,db01,6635,ip address printed here:53925,51b994f7.19eb,1,idle,2013-06-13 09:46:31 GMT,31/0,0,LOG,08006,could not receive data from client: Connection reset by peer,slon.node_1_listen 2013-06-13 09:57:38.608 GMT,postgres,db01,6635,ip address printed here:53925,51b994f7.19eb,2,idle,2013-06-13 09:46:31 GMT,31/0,0,LOG,08P01,unexpected EOF on client connection,slon.node_1_listen The client slon log contains: 2013-06-13 09:57:38 GMT FATAL cleanupThread: begin;lock table _xx_cluster.sl_config_lock;select _xx_cluster.cleanupEvent('10 minutes'::interval);commit; - server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. 
This all can very well be a slightly too eager firewall dropping idle connections. Have you tried to enable TCP keep alive options that kick in after something like 30 seconds? If not, enable them on both, the PG server and the Slony side. That usually prevents those firewall issues. Jan Thanks, Sridevi On Thu, Jun 13, 2013 at 12:02 AM, Jan Wieck janwi...@yahoo.com mailto:janwi...@yahoo.com wrote: On 06/12/13 10:17, Sridevi R wrote: Jan, Thanks for the reply. The only errors in the slon log are failure of cleanupThread. child process is restarting right after the cleanupThread Failure. This occurs approximately every 10 minutes since cleanup_interval is set to 10 minutes. Here is a sample from the log again: 2013-06-06 14:23:27 GMT FATAL cleanupThread: begin;lock table _xx_cluster.sl_config_lock;select _xx_cluster.cleanupEvent('10 minutes'::interval);commit; - server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. 2013-06-06 14:23:27 GMT CONFIG slon: child terminated signal: 9; pid: 16135, current worker pid: 16135 2013-06-06 14:23:27 GMT CONFIG slon: restart of worker in 10 seconds server closed the connection unexpectedly ... Is this connection by any chance through some firewall or NAT gateway that will drop idle connections? What are the corresponding postmaster server log entries? Since slony reports an unexpected connection drop from the server, the server must have some message in its log too, because the client never sent the 'X' libpq protocol message. 
Jan Thanks , Sridevi On Wed, Jun 12, 2013 at 7:33 PM, Jan Wieck janwi...@yahoo.com mailto:janwi...@yahoo.com mailto:janwi...@yahoo.com mailto:janwi...@yahoo.com wrote: On 06/12/13 07:14, Sridevi R wrote: Hello, The slony logs are consistently posting this error: 2013-06-12 10:01:05 GMT FATAL cleanupThread: begin;lock table _xx_cluster.sl_config_lock;select _xx_cluster.cleanupEvent('10 minutes'::interval);commit; - server closed the connection unexpectedly 2013-06-12 10:12:24 GMT FATAL cleanupThread: begin;lock table _xx_cluster.sl_config_lock;select _xx_cluster.cleanupEvent('10 minutes'::interval);commit; - server closed the connection unexpectedly checked and found that sl_confirm table is not cleaned up. cleanup event never succeeds. Additionally, the child processes terminates and restarts after each such cleanup failure. 2013-06-11 11:20:04 GMT
Re: [Slony1-general] Slony cleanupEvent erroring out with server closed the connection unexpectedly - Soln version: 2.1.2
On 06/12/13 10:17, Sridevi R wrote: Jan, Thanks for the reply. The only errors in the slon log are failure of cleanupThread. child process is restarting right after the cleanupThread Failure. This occurs approximately every 10 minutes since cleanup_interval is set to 10 minutes. Here is a sample from the log again: 2013-06-06 14:23:27 GMT FATAL cleanupThread: begin;lock table _xx_cluster.sl_config_lock;select _xx_cluster.cleanupEvent('10 minutes'::interval);commit; - server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. 2013-06-06 14:23:27 GMT CONFIG slon: child terminated signal: 9; pid: 16135, current worker pid: 16135 2013-06-06 14:23:27 GMT CONFIG slon: restart of worker in 10 seconds server closed the connection unexpectedly ... Is this connection by any chance through some firewall or NAT gateway that will drop idle connections? What are the corresponding postmaster server log entries? Since slony reports an unexpected connection drop from the server, the server must have some message in its log too, because the client never sent the 'X' libpq protocol message. Jan Thanks , Sridevi On Wed, Jun 12, 2013 at 7:33 PM, Jan Wieck janwi...@yahoo.com mailto:janwi...@yahoo.com wrote: On 06/12/13 07:14, Sridevi R wrote: Hello, The slony logs are consistently posting this error: 2013-06-12 10:01:05 GMT FATAL cleanupThread: begin;lock table _xx_cluster.sl_config_lock;select _xx_cluster.cleanupEvent('10 minutes'::interval);commit; - server closed the connection unexpectedly 2013-06-12 10:12:24 GMT FATAL cleanupThread: begin;lock table _xx_cluster.sl_config_lock;select _xx_cluster.cleanupEvent('10 minutes'::interval);commit; - server closed the connection unexpectedly checked and found that sl_confirm table is not cleaned up. cleanup event never succeeds. Additionally, the child processes terminates and restarts after each such cleanup failure. 
2013-06-11 11:20:04 GMT CONFIG slon: child terminated signal: 9; pid: 20172, current worker pid: 20172 2013-06-11 11:20:04 GMT CONFIG slon: restart of worker in 10 seconds When cleanup is run manually at the psql prompt, it runs to completion without any issues and cleans up the sl_event and sl_confirm tables: begin;lock table _xx_cluster.sl_config_lock;select _xx_cluster.cleanupEvent('10 minutes'::interval);commit; Soln version: 2.1.2 Any help/insight would be greatly appreciated. Slon kills its worker(s) with signal 9 (SIGKILL) when it needs to restart, like when there are errors in event processing or if it receives certain signals. Are there any other errors in the slon log, or is something on the machine sending signals to slon? Jan Thanks, Sridevi -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
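The keepalive tuning Jan recommends can be sketched on both sides as follows. The slon.conf values mirror those reported to work earlier in the thread; the postgresql.conf values are illustrative and should be tuned to be shorter than the firewall's idle timeout:

```
# postgresql.conf (server side)
tcp_keepalives_idle = 30        # seconds idle before the first probe
tcp_keepalives_interval = 10    # seconds between probes
tcp_keepalives_count = 5        # unanswered probes before the kernel drops the connection

# slon.conf (client side)
tcp_keepalive = true
tcp_keepalive_idle = 30
tcp_keepalive_interval = 45
tcp_keepalive_count = 20
```

With probes flowing on otherwise idle connections, a stateful firewall no longer sees them as idle and stops silently dropping them.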
Re: [Slony1-general] Slony Versions and Postgres Versions and the compatibility.
On 06/06/13 04:16, Raghav wrote: Is there a Hard Rule that the Slony versions must be 100% exact at all times, or can a Target Node be at a newer subversion ahead of the Master and still run? AFAIK, Slony-I needs the exact same version on both nodes, because upgrading a node to a newer version changes the slony_func.so file, which must match the version on the other replicating nodes, else it will break. The link below has a comment after the upgrade steps: If there is any mismatch between component versions, the slon http://main.slony.info/documentation/2.1/slon.html will refuse to start up, which provides protection against corruption. That is correct. Slony requires the same Slony version on all nodes, but supports running on top of different PostgreSQL versions. Jan http://main.slony.info/documentation/2.1/slonyupgrade.html For your other questions, hopefully you will see some answers here. -- Regards Raghav Blog: http://raghavt.blogspot.com/ -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
Re: [Slony1-general] Truncate trigger in slony 2.1.x+ requires superuser?
On 05/15/2013 02:50 PM, Jan Wieck wrote: I would call it a bug. The truncate trigger should be security definer, like the log trigger and several other functions are. I have committed the 2-line fix for this to master and REL_2_1_STABLE. The diff for 2.1 is attached. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin diff --git a/src/backend/slony1_funcs.sql b/src/backend/slony1_funcs.sql index 7c083db..692d3ff 100644 --- a/src/backend/slony1_funcs.sql +++ b/src/backend/slony1_funcs.sql @@ -5540,7 +5540,8 @@ $$ end if; return NULL; end -$$ language plpgsql; +$$ language plpgsql +security definer; comment on function @NAMESPACE@.log_truncate () is 'trigger function run when a replicated table receives a TRUNCATE request';
Re: [Slony1-general] Truncate trigger in slony 2.1.x+ requires superuser?
On 05/15/2013 01:51 PM, Brian Fehrle wrote: Hi All, I have a slony cluster running on slony 2.1.0 (have tested this situation on 2.1.3 also). This cluster is replicating a drupal database, which performs truncates on some of the cache tables periodically. When the truncates execute, they fire the log_truncate trigger in slony, which immediately errors due to no select permissions on the sl_table relation in the slony cluster. For normal events (inserts, updates, deletes), the user doesn't need any permissions granted in the slony schema in order for the trigger to log the event into sl_log_1, etc. So is this a design decision, making truncates on replicated tables executable only by superusers in the database? I would call it a bug. The truncate trigger should be security definer, like the log trigger and several other functions are. Jan I've got it working in a testing environment by granting the following: grant select on _slony.sl_table to user; grant insert on _slony.sl_log_status to user; grant insert on _slony.sl_log_1 to user; grant insert on _slony.sl_log_2 to user; grant usage on _slony.sl_action_seq to user; However I don't like the idea of either A. granting these permissions on slony tables or B. making my drupal user a 'superuser' to get around it. Are those my only two options here to allow truncates to work? Thanks, - Brian F -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
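To check whether an installed cluster already carries the fix, one can inspect the function's SECURITY DEFINER flag in the system catalogs. A sketch (the function name comes from the thread; it lives in your cluster's _clustername schema):

```sql
-- prosecdef = true means log_truncate runs with its owner's privileges,
-- so non-superusers can TRUNCATE replicated tables without extra grants.
SELECT n.nspname, p.proname, p.prosecdef
  FROM pg_proc p
  JOIN pg_namespace n ON n.oid = p.pronamespace
 WHERE p.proname = 'log_truncate';
```

If prosecdef is false, the grants Brian lists are the workaround until the patched slony1_funcs.sql is installed.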
Re: [Slony1-general] pgsql replication with database schema changes using slonik execute script
On 05/09/13 20:31, Vick Khera wrote: I'm pretty sure slonik will just block until it is reachable. Try not to do that. This may be different in slony 2.0+, but I haven't tested. Unless there is an implicit or explicit WAIT FOR EVENT done by slonik, nothing should block. EXECUTE SCRIPT in all versions prior to 2.2 creates an event containing the SQL. In 2.2 the SQL is injected into the data stream processed during SYNC application, to make it work reliably with concurrent table access. Jan On Tue, May 7, 2013 at 6:54 AM, kapil bhadke kapilbha...@gmail.com wrote: I am using slony for replicating my pgsql database. I want to propagate the database schema changes (DDL) from master to the slave machine with the help of slonik execute script. I am curious to know what happens when I run slonik execute script (on master) and the slave machine is not reachable. How will these schema changes get reflected on the slave machine? ___ Slony1-general mailing list Slony1-general@lists.slony.info https://lists.slony.info/cgi-bin/mailman/listinfo/slony1-general ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
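For illustration, EXECUTE SCRIPT with an explicit wait might look like this in slonik (a sketch; the cluster name, conninfo strings, and script path are all hypothetical):

```
cluster name = mycluster;
node 1 admin conninfo = 'dbname=mydb host=master user=slony';
node 2 admin conninfo = 'dbname=mydb host=slave user=slony';

# The event carrying the SQL is created on node 1 whether or not node 2
# is reachable; node 2 applies it once it reconnects and catches up.
execute script (set id = 1, filename = '/tmp/ddl_change.sql', event node = 1);

# Only an explicit wait like this would make slonik block on the slave:
wait for event (origin = 1, confirmed = 2, wait on = 1);
```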
Re: [Slony1-general] Slony multiple connections with database
On 05/01/13 02:41, Neha Sharma wrote: Hi, We are using slony to synchronise two postgres databases on two machines. Generally slony creates 6 connections with the database - 4 with the local db and 2 with the remote db. But some problem is being observed on the machine having the master db. Various slony remote connections are seen with the database. The limit of postgres connections is 100 and slony is consuming the maximum connections, due to which the db fails to respond after some time. It was observed that connections reached 104, out of which the slony connections (which the remote had created with the db) were 74. We have to restart slony to clean all the connections. Ideally there should only be 2 connections from the remote - one created by the listener and another by the worker thread. There is no connectivity issue between the two machines. This sounds very much like lost TCP connections that never get the FIN or RST packets. Have you tried enabling keepalives, with settings that let connections time out within 5 minutes or faster, in both the PostgreSQL and the Slony configurations? Jan Could anyone please suggest what could be the reason for this weird behaviour of slony? Also, please tell if slony creates new connections without removing the older ones? -- Neha Sharma ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
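For reference, a sketch of such keepalive settings (the values here are illustrative, giving roughly 2-3 minute dead-connection detection):

```
# postgresql.conf on each server (times in seconds)
tcp_keepalives_idle = 60        # start probing after 60s of idle
tcp_keepalives_interval = 30    # probe every 30s
tcp_keepalives_count = 3        # declare the peer dead after 3 failed probes

# The same knobs exist as libpq connection options, so they can go into
# the conninfo strings slon uses, e.g.:
#   'dbname=mydb host=remote keepalives=1 keepalives_idle=60
#    keepalives_interval=30 keepalives_count=3'
```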
Re: [Slony1-general] session_replication_role 'replica' behavior and referential integrity constraints
On 4/24/2013 4:01 AM, Manos Tsahakis wrote: Hello all, In our application we are enabling session_replication_role TO 'replica' in certain situations so that triggers will not fire in a table during DML operations. However, we observed that when setting session_replication_role TO 'replica', referential integrity constraints will not fire on a table either. A simple example is given below:

dynacom=# create table parent (id serial primary key, name text not null);
dynacom=# create table child (id serial primary key, name text not null, pid int NOT NULL REFERENCES parent(id) ON DELETE CASCADE);
dynacom=# insert into parent (name) values ('test 1');
INSERT 0 1
dynacom=# insert into parent (name) values ('test 2');
INSERT 0 1
dynacom=# insert into child (name,pid) values ('test kid2',2);
INSERT 0 1
dynacom=# begin ;
BEGIN
dynacom=# set session_replication_role TO 'replica';
SET
dynacom=# delete from parent where id=2;
DELETE 1
dynacom=# commit ;
COMMIT
dynacom=# select * from child;
 id |   name    | pid
----+-----------+-----
  2 | test kid2 |   2
(1 row)

dynacom=# select * from parent;
 id | name
----+------
(0 rows)

So we are left, basically, with an inconsistent database. 1. The 9.2 documentation (http://www.postgresql.org/docs/9.2/static/sql-altertable.html), in the DISABLE/ENABLE [ REPLICA | ALWAYS ] TRIGGER section, makes a distinction between USER (non system-constraint related) and ALL triggers, but does not state that simply(??) enabled system (non-user) constraint triggers will not fire in case of session_replication_role = replica. Shouldn't non-user triggers *not* be affected by session_replication_role? 2. Is there any way to just find the name of the FK constraint trigger and convert it to ENABLE ALWAYS? For the above test we used postgresql 9.2; currently we are running postgresql 9.0 in production. You want to set the session_replication_role to LOCAL. Possibly the docs should be a little clearer, and possibly something is missing from the docs entirely. 
The missing piece is probably that the Slony-I triggers explicitly check the session_replication_role. The log trigger does nothing if the setting isn't ORIGIN. The deny-access trigger only errors out if the setting is ORIGIN. The poorly documented side effect of that is that if you set the role to LOCAL, all Slony triggers are effectively disabled while all user defined and integrity triggers will work like on a stand-alone database. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
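To make the difference concrete, here is the example from above repeated with 'local' instead of 'replica' (a sketch; table names from the earlier test case):

```sql
BEGIN;
-- 'local' disables the Slony log/deny-access triggers (they check for
-- ORIGIN) but, unlike 'replica', leaves user triggers and FK constraint
-- triggers active, so the ON DELETE CASCADE fires normally.
SET session_replication_role TO 'local';
DELETE FROM parent WHERE id = 2;   -- also removes the referencing child row
COMMIT;
```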
Re: [Slony1-general] Help - replication has broken, best way to restart it without destroying data.
On 3/6/2013 7:08 AM, Jonathan Soong wrote: Hi guys Replication has died on my master-slave. What i would really like to do is stop replication and then get it to clear out the replicated tables in the Slave and start again (i.e. truncate the replicated tables ONLY). The problem I have is that only SOME tables are replicated between the Master and the Slave - i do NOT want the replication to destroy the un-replicated tables on the Slave. I have seen all the instructions on how to start replication but most of them have the step of "drop the Slave db and re-import the schema from the Master". I can't do this as I don't want to clobber my un-replicated tables. Is it possible to do what I want? Certainly. My initial thoughts were to do a slonik unsubscribe set and then do a slonik subscribe set. The problem I have is that the slon daemon on the Slave is erroring (see below), and from what I understand the unsubscribe set event will just be put in the event queue. I.e. the Slave will never be able to unsubscribe because it is erroring on an earlier event. I was thinking there might be a way to delete all events in the queue so it could just continue ... is this possible? I realise that after that the data is not in sync, but im planning on blowing away the replicated tables and starting again anyway... Although there might be a way to fix replication, i kind of want to know how to just do a clean replication from this point, as I see it as an important task. Thanks for any thoughts/opinions. Cheers Jon Error in Slave log is:

2013-03-06 19:56:41 HKT INFO remoteWorkerThread_1: syncing set 1 with 47 table(s) from provider 1
2013-03-06 19:56:42 HKT ERROR remoteWorkerThread_1: select _replication.sequenceSetValue(10,1,'5002597503','291048'); PGRES_FATAL_ERROR ERROR: Slony-I: sequenceSetValue(): sequence 10 not found

You did not tell us which version of Slony you are running. That is always good to include in a problem report. 
In this case it is especially relevant since the above is probably the result from doing some DDL on the replica (slave) directly, without sending it through slonik's EXECUTE SCRIPT functionality. My bet is that someone dropped and recreated the sequence in question. You can find out what sequence that is by doing SELECT * FROM _replication.sl_sequence where seq_id = 10; If you are still running Slony 1.2 then it is important to know what other DDL may have been executed on the replica since doing that in 1.2 can lead to a permanently corrupted system catalog. There is no danger of that in 2.0 or 2.1. Operations harmful on replicated tables on a replica under 1.2 include things like create/drop index, add/drop constraint, create/drop trigger as well as dropping columns. If you conclude that your system catalog must still be consistent because you are running 2.0 or higher or none of the above has happened, then you can try to fix the problem first by running the slonik command REPAIR CONFIG against the replica node. http://slony.info/documentation/2.1/stmtrepairconfig.html If that does not fix the problem, then you can uninstall Slony using the UNINSTALL NODE command for all nodes. http://slony.info/documentation/1.2/stmtuninstallnode.html This does not touch any data on any of the nodes. It only restores things so that both systems at that point are stand alone databases. After that you simply rebuild your replication cluster like it was done originally. Creating nodes, paths, sets, tables and subscribing them. You do not need to truncate anything on the replica. Slony will take care of that. If you have to go through that it would be a good opportunity to upgrade to Slony 2.1 while at it. Regards, Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
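A sketch of the REPAIR CONFIG attempt described above (the cluster name, conninfo strings, and node numbers are hypothetical; the replica is node 2 here):

```
cluster name = replication;
node 1 admin conninfo = 'dbname=mydb host=master user=slony';
node 2 admin conninfo = 'dbname=mydb host=slave user=slony';

# Re-resolve the stored table/sequence OIDs on the replica by name,
# fixing entries that point at dropped-and-recreated objects:
repair config (set id = 1, event node = 2, execute only on = 2);
```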
Re: [Slony1-general] Help - replication has broken, best way to restart it without destroying data.
On 3/6/2013 3:24 PM, Jonathan Soong wrote: Hi Jan Sorry for not including version, yup im 2.0.7 and pg 8.4.11. That's good, at least I know no data has been corrupted. Indeed, that is a good thing. As you say, I think DDL was run against the Slave by itself and i think it is corrupted. I don't think i can repair it because things were done against the Slave since then. Well, have you tried to run the REPAIR CONFIG slonik command for the set against the slave? The damage may not be as severe as you think. What is the actual name of that sequence 10 according to sl_sequence? Does that sequence exist in the slave database? I am presuming the UNINSTALL NODE will bypass the slony event queue (which is blocked up with an error)? UNINSTALL NODE is a command that actually removes all traces of Slony from a database. There should not be a slon daemon running at that time, and you are left with a standalone database that has no idea what Slony is at all. Both databases would still have all your tables with whatever data is currently in them. So you would stop the slon daemons, then run UNINSTALL NODE against both, master and slave. Then rebuild your replication from scratch, just assuming that you don't have to copy any schema. The tables already exist on the slave, so there is no need to do that. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] Help - replication has broken, best way to restart it without destroying data.
On 3/6/2013 7:10 PM, Jonathan Soong wrote: Thanks Jan, all fixed. :) Most glad that it all turned out well. Just for the record, did the REPAIR CONFIG do the trick or did you have to rebuild the cluster? Jan Cheers Jon -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] unsubscribe/drop set failing
On 12/20/2012 8:59 AM, Vick Khera wrote: On Thu, Dec 20, 2012 at 7:36 AM, Matthew Vernon matthew.ver...@sruc.ac.uk wrote: How can I sort this out? easy way: uninstall slony and rebuild your cluster. hard way: no idea. i never succeeded in fixing a schema I broke this way. :( The hard way would be to DELETE all table and sequence rows from sl_table and sl_sequence whose tab_reloid and seq_reloid don't appear in pg_class. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
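In SQL, the "hard way" Jan describes might look like this (a sketch only; take a backup first, and replace `_mycluster` with your actual cluster schema name):

```sql
-- Remove catalog entries for replicated tables/sequences whose
-- underlying relations no longer exist in pg_class.
DELETE FROM _mycluster.sl_table
 WHERE tab_reloid NOT IN (SELECT oid FROM pg_class);
DELETE FROM _mycluster.sl_sequence
 WHERE seq_reloid NOT IN (SELECT oid FROM pg_class);
```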
Re: [Slony1-general] data copy for set 1 failed 3 times - sleep 60 seconds
On 12/10/2012 5:28 PM, Tory M Blue wrote: I'm back to it with debug of 4 on the source and the destination nodes. Still failing. Destination slon log:

2012-12-10 12:04:49 PST CONFIG remoteWorkerThread_1: 6525.754 seconds to copy table cls.listings
2012-12-10 12:04:49 PST CONFIG remoteWorkerThread_1: copy table cls.customers
2012-12-10 12:04:49 PST CONFIG remoteWorkerThread_1: Begin COPY of table cls.customers
2012-12-10 12:04:49 PST ERROR remoteWorkerThread_1: select _admissioncls.copyFields(8);
2012-12-10 12:04:49 PST WARN remoteWorkerThread_1: data copy for set 1 failed 1 times - sleep 15 seconds
2012-12-10 12:04:51 PST INFO cleanupThread: 5961.364 seconds for cleanupEvent()
2012-12-10 12:05:06 PST INFO copy_set 1 - omit=f - bool=0
2012-12-10 12:05:06 PST INFO omit is FALSE
2012-12-10 12:05:06 PST CONFIG version for dbname=clsdb host=server user=postgres password=SECURED is 90104
2012-12-10 12:05:07 PST DEBUG1 copy_set_1 dbname=clsdb host=server user=postgres password=SECURED: backend pid = 17092
2012-12-10 12:05:07 PST CONFIG remoteWorkerThread_1: connected to provider DB

Postgres logs:

2012-12-10 12:04:51 PST admissionclsdb postgres [local] NOTICE: Slony-I: Logswitch to sl_log_2 initiated
2012-12-10 12:04:51 PST admissionclsdb postgres [local] CONTEXT: SQL statement SELECT _admissioncls.logswitch_start()
2012-12-10 12:05:12 PST admissionclsdb postgres [local] LOG: sending cancel to blocking autovacuum PID 18620
2012-12-10 12:05:12 PST admissionclsdb postgres [local] DETAIL: Process 18299 waits for AccessExclusiveLock on relation 17410 of database 16385.
2012-12-10 12:05:12 PST admissionclsdb postgres [local] STATEMENT: lock table cls.listings;
2012-12-10 12:05:12 PST ERROR: canceling autovacuum task
2012-12-10 12:05:12 PST CONTEXT: automatic vacuum of table admissionclsdb.cls.listings
2012-12-10 12:05:12 PST admissionclsdb postgres [local] NOTICE: truncate of cls.autobodystyle succeeded

What is the output of select _admissioncls.copyFields(8); on node 1? 
It is important to do this on node 1 because that is where the remote worker, processing the copy_set, is doing this. Source: There is nothing in the slon log that would tell me it's aware of this client side restart or failure. Nothing in the postgres logs about an EOF or anything. Nothing in postgres log either that says there is an issue. Please specify which postgres log you are talking about. There are at least two in this case, the one on the new node and the one on the data provider (node 1). The one from the data provider is where to expect any error messages. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] data copy for set 1 failed 3 times - sleep 60 seconds
On 12/10/2012 7:09 PM, Tory M Blue wrote: On Mon, Dec 10, 2012 at 3:59 PM, Jan Wieck janwi...@yahoo.com wrote: On 12/10/2012 6:26 PM, Tory M Blue wrote: On Mon, Dec 10, 2012 at 2:55 PM, Jan Wieck janwi...@yahoo.com wrote: On 12/10/2012 5:28 PM, Tory M Blue wrote: I'm back to it with debug of 4 on the source and the destination nodes. Still failing Destination: Slon log: 2012-12-10 12:04:49 PST CONFIG remoteWorkerThread_1: 6525.754 seconds to copy table cls.listings 2012-12-10 12:04:49 PST CONFIG remoteWorkerThread_1: copy table cls.customers 2012-12-10 12:04:49 PST CONFIG remoteWorkerThread_1: Begin COPY of table cls.customers 2012-12-10 12:04:49 PST ERROR remoteWorkerThread_1: select _admissioncls.copyFields(8); This is the bit that I don't understand. Looking at the remote_worker.c code, this error message comes from line 3032 in 2.1 STABLE. The same code is in line 2911 in HEAD. That select of copyFields() has apparently failed, because the result status code is not PGRES_TUPLES_OK, yet there seems to be nothing returned from PQresultErrorMessage(). I wonder what PQresStatus(PQresultStatus(res3)) as well as PQerrorMessage(pro_dbconn) at this point have to tell. Thanks for not giving up on me Tory What version of Slony are you running this on? -bash-4.1$ slon -v slon version 2.1.1 Can you run a custom patched slon on the new subscriber so that we can see that information in the log? Absolutely! Attached is a patch for 2.1.1 that adds that information to the error message. Since slon remains the same version, you only have to replace the slon binary on the new subscriber (node 10). Jan -- Anyone who trades liberty for security deserves neither liberty nor security. 
-- Benjamin Franklin

diff --git a/src/slon/remote_worker.c b/src/slon/remote_worker.c
index 7fbf67c..3497008 100644
*** a/src/slon/remote_worker.c
--- b/src/slon/remote_worker.c
*** copy_set(SlonNode *node, SlonConn *local
*** 3029,3037 ****
  	if (PQresultStatus(res3) != PGRES_TUPLES_OK)
  	{
! 		slon_log(SLON_ERROR, "remoteWorkerThread_%d: \"%s\" %s\n",
  				 node->no_id, dstring_data(&query2),
! 				 PQresultErrorMessage(res3));
  		PQclear(res3);
  		PQclear(res1);
  		slon_disconnectdb(pro_conn);
--- 3029,3039 ----
  	if (PQresultStatus(res3) != PGRES_TUPLES_OK)
  	{
! 		slon_log(SLON_ERROR, "remoteWorkerThread_%d: \"%s\" %s %s - %s\n",
  				 node->no_id, dstring_data(&query2),
! 				 PQresStatus(PQresultStatus(res3)),
! 				 PQresultErrorMessage(res3),
! 				 PQerrorMessage(pro_dbconn));
  		PQclear(res3);
  		PQclear(res1);
  		slon_disconnectdb(pro_conn);

___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] Multiple sets w/same table
On 12/6/2012 6:06 PM, Steve Singer wrote: On 12-12-04 11:50 PM, David Rees wrote: On Tue, Dec 4, 2012 at 7:34 PM, Steve Singer st...@ssinger.info wrote: For example, table 1 is in set 1 and table 2 is in set 2 and table 1 has a foreign key that refers to table 2. Changes that occur on the origin node 1 which will be the same for set 1/2 will obviously be fine, but is it possible for an event to be committed out-of-order on node 2? If you subscribe set 2 first, then set 1, and you're not using a different forwarder node, then I think you should be okay. The subscriptions should work either way since it is done under session_replication_role=replica, which suppresses foreign key checks. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] Slony version 2.1.0 switchover
On 11/30/2012 11:04 AM, Steve Singer wrote: On 12-11-27 11:59 AM, Frank McGeough wrote: I'm writing to ask about using Slony switchover capability. I've never actually been able to make this work. I'm curious about whether this is a common experience or if, perhaps, there is something that I don't understand about how this should function. The environment that I have is a database that is in constant use. That is, there is always at least a low level of DML (inserts for the most part) occurring 24 x 7, 365 days a year. I've found that Slony will never be able to get a lock in order to perform the switchover and thus I'm left with tearing The slony MOVE SET operation needs to have an exclusive lock on all tables in the set at the same time. The recommended way of doing this is to have your application stop traffic to the database for long enough so that this can be done, then wait for the event to propagate to the new master, and then have your application start queries against the new master. If slony gets its locks on a move set in this order Table1 Table2 Table3 but your application accesses tables in this order Table3 Table2 Table1, your application and slony will deadlock each other. What Steve said. Also, if your application is compatible with pgbouncer's transaction mode (i.e. not using server side prepared statements), then you can use that to PAUSE and RESUME the real database connections without even stopping the application. This can be completely scripted and pgbouncer can even do the redirection of traffic on RESUME. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
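The pgbouncer approach Jan mentions could be scripted roughly like this (a sketch; the host, port, and database alias are hypothetical):

```shell
# PAUSE waits for in-flight transactions to finish, then holds new queries.
psql -h pools -p 6432 -U pgbouncer pgbouncer -c "PAUSE mydb;"

# ... run the slonik MOVE SET here, then repoint "mydb" in pgbouncer.ini
# to the new master and issue RELOAD on the admin console ...

# RESUME releases the held clients, now against the new master.
psql -h pools -p 6432 -U pgbouncer pgbouncer -c "RESUME mydb;"
```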
Re: [Slony1-general] data copy for set 1 failed 3 times - sleep 60 seconds
On 11/29/2012 4:39 PM, Tory M Blue wrote: Nothing, the postgres logs are clean, other than the checkpoint warnings, nothing, really absolutely nothing. This appears to be slony centric. Are these the Postgres logs on the new node or the data provider? Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] replication fails to due bad DNS at startup, requires slony restart to fix
On 11/13/2012 2:06 PM, Mark Stosberg wrote: On 11/12/2012 04:15 PM, Andrew Sullivan wrote: On Mon, Nov 12, 2012 at 04:05:37PM -0500, Christopher Browne wrote: Is it possible that your libc is doing cacheing? If *that's* the case, and I'm suspicious of it, then you really would need to restart the slon to get unstuck from what libc has cached on you... More likely, the system resolver is pointing at a caching name server. Thanks for prompt answers, everyone. I'll pursue this as an issue with our use of Ubuntu on Amazon/EC2. I don't think that restarting the slon manually is the right solution to this problem. Whatever really causes it (libc probably) doesn't matter. Changing DNS entries to redirect services to a different IP address is often used in disaster recovery procedures (failover). We should not require an outside process or worse an admin to restart things in a case like that. I don't know what exactly is the right answer. From behind libpq it is rather difficult to tell what exactly went wrong when trying to connect to a database. A generic answer to that would be to do only X attempts to (re)connect, then restart the entire slon process. But that has pitfalls. If someone has shut down a replica for maintenance purposes, all slons would constantly restart in that interval, which would prevent longer running tasks like subscriptions. So this would have to go hand in hand with some new configuration option(s), like ignore connection failures to this node WRT restarting. Some input from the user community would certainly help. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] pg_dump, slony databases, and locking
On 10/24/2012 8:15 PM, Joe Conway wrote: On 10/24/2012 04:58 PM, Christopher Browne wrote: Arguably, the failure to restore the triggers isn't a real error, if you squint at things in a particular way. Alternatively, I suppose you could do a separate dump of *just* the Slony schema. That has *some* risk of locking conflict, but it should be an entirely smaller risk, as it's not encompassing the likely-large backup of your application's schema and data. Dump the Slony schema, and load it first, and that would allow the schema-excluding-Slony-bits to work. But my preference would be to squint at things sideways and say, "It's OK that those triggers didn't load in." Thanks for the response (and Steve's as well). We are scripting the restore and therefore prefer to treat all errors as a failure. Unfortunately, if we allow the restore trigger errors it will make it difficult to disallow other errors. But I guess it is something to consider. Not that this will be of any help for you, but I consider it a missing feature of pg_dump that it allows omitting a schema, but not omitting any references to that schema, like triggers, foreign keys and so on. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
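The separate-dump approach could look like this (a sketch; the cluster schema is assumed to be named `_slony`, and the database names are hypothetical):

```shell
# Dump only the Slony schema, and then everything except it:
pg_dump -n _slony mydb > slony_schema.sql
pg_dump -N _slony mydb > app_schema_and_data.sql

# Restore the Slony schema first so that the trigger definitions in the
# main dump can resolve their references into it:
psql newdb -f slony_schema.sql
psql newdb -f app_schema_and_data.sql
```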
Re: [Slony1-general] Issue when adding node to replication
On 9/28/2012 2:40 PM, Brian Fehrle wrote: I'm going to go ahead and delete the offending row in sl_event for the node that doesn't exist. This is a production environment so I need to get up and running again. Deleting a SYNC event of a non-origin node can never have any negative side effects. They are noise that keeps sl_status looking good, but that's really all they do. SYNC events from a set origin are a little different. A SYNC immediately before an EXECUTE_SCRIPT event prior to 2.2 may be important. Other than that, they don't matter that much either. When a node has fallen behind, it deliberately skips a number of SYNC events, a feature we call sync-grouping, to catch up in larger leaps. There is no difference at all between slon grouping SYNC events and some SYNC event rows missing in sl_event. Absolutely zero difference. The difference for the SYNC event from a set origin in front of an EXECUTE_SCRIPT is because that DDL may add a column to a table. That SYNC event was supposed to make sure that the column exists on a subscriber when the first data containing that column appeared in sl_log_N. That all changed in 2.2 as well, where DDL from script execution now travels within sl_log_N too, so as of 2.2 you can simply delete all SYNC events from sl_event if you feel like it. Slon will generate another SYNC soon and the processing of that will just leap the receivers to that state. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
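A sketch of such a cleanup of stale non-origin SYNC events (the cluster schema name `_mycluster` is hypothetical; per the caveats above, this is only safe for SYNC events, and for origin SYNCs only on 2.2+):

```sql
-- Drop SYNC events left behind by nodes that no longer exist in sl_node.
DELETE FROM _mycluster.sl_event
 WHERE ev_type = 'SYNC'
   AND ev_origin NOT IN (SELECT no_id FROM _mycluster.sl_node);
```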
Re: [Slony1-general] Issue when adding node to replication
On 9/28/2012 9:15 AM, Steve Singer wrote: * we can't ignore SYNC events even though the dropped * node is not an origin it might have been an old * origin before a FAILOVER. Some behind node still * might need to get caught up from its provider. That assessment is actually wrong. Any data provider as well as the origin will keep the data until all subscribers have confirmed it. Simply changing the data provider of the behind node will be enough to make it catch up from another provider. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] Issue when adding node to replication
On 9/27/2012 2:34 PM, Brian Fehrle wrote: Hi all, PostgreSQL v 9.1.5 - 9.1.6 Slony version 2.1.0 I'm having an issue that's occurred twice now. I have a 4 node slony cluster, and one of the operations is to drop a node from replication, do maintenance on it, then add it back to replication. Node 1 = master Node 2 = slave Node 3 = slave - dropped then readded Node 4 = slave First, why is the node actually dropped and readded so fast, instead of just doing the maintenance while it falls behind, then letting it catch up? You apparently have a full blown path network from everyone to everyone. This is not good under normal circumstances since the automatic listen generation will cause every node to listen on every other node for events from non-origins. Way too many useless database connections. What seems to happen here are some race conditions. The node is dropped and when it is added back again, some third node still didn't process the DROP NODE, and when node 4 looks for events from node 3, it finds old ones somewhere else (like on 1 or 2). When node 3 then comes around to use those event IDs again, you get the dupkey error. What you could do, if you really need to drop/readd it, is use an explicit WAIT FOR EVENT for the DROP NODE to make sure all traces of that node are gone from the whole cluster. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
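A sketch of the drop/readd sequence with the explicit wait (cluster and admin conninfo lines omitted; node numbers and conninfo strings follow the example above and are hypothetical):

```
drop node (id = 3, event node = 1);
# Block until every remaining node has confirmed the DROP NODE, so no
# stale events or confirms for node 3 survive anywhere in the cluster:
wait for event (origin = 1, confirmed = all, wait on = 1);

store node (id = 3, event node = 1, comment = 'node 3 - readded');
store path (server = 1, client = 3, conninfo = 'dbname=mydb host=node3');
store path (server = 3, client = 1, conninfo = 'dbname=mydb host=node1');
```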
Re: [Slony1-general] Issue when adding node to replication
On 9/27/2012 3:58 PM, Brian Fehrle wrote: Follow up: I executed this on the master:

mydatabase=# select * from _slony.sl_event where ev_origin not in (select no_id from _slony.sl_node);

 ev_origin |  ev_seqno  |         ev_timestamp          |    ev_snapshot     | ev_type | ev_data1 | ev_data2 | ev_data3 | ev_data4 | ev_data5 | ev_data6 | ev_data7 | ev_data8
-----------+------------+-------------------------------+--------------------+---------+----------+----------+----------+----------+----------+----------+----------+----------
         3 | 5000290161 | 2012-09-27 09:48:03.749424-04 | 40580084:40580084: | SYNC    |          |          |          |          |          |          |          |
(1 row)

There is a row in sl_event that shouldn't be there, because it's referencing a node that no longer exists. I need to add this node back to replication, but I don't want to run into the same issue as before. I ran a cleanupEvent('10 minute') and it did nothing (even did it with 0 minutes). Will this row eventually go away? Will it cause issues if we attempt to add a new node to replication with node = 3? How can I safely clean this up? Hmmm, this actually looks like a more severe race condition or even a bug. The thing is that processing the DROP NODE and replicating the SYNC are different worker threads, since the events originate on different nodes. Cleaning out sl_event is part of dropNode_int(). But the remoteWorker for 3 may just have inserted that SYNC concurrently and therefore it was left behind. My guess is that the right solution to this is to clean out everything again when a STORE NODE comes along. We had been thinking of making the node ID non-reusable to prevent this sort of race condition. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin ___ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general
Re: [Slony1-general] Issue when adding node to replication
On 9/27/2012 5:30 PM, Christopher Browne wrote: On Thu, Sep 27, 2012 at 5:26 PM, Jan Wieck janwi...@yahoo.com wrote: My guess is that the right solution to this is to clean out everything again when a STORE NODE comes along. We had been thinking of making node IDs non-reusable to prevent this sort of race condition.

I'm not sure I'm totally comfortable with cleaning it all out instantly; as a step towards that, I'd think it a good idea for slonik to check all the nodes for the existence of a node ID, and refuse if it's found anywhere. Under that circumstance, you might need to wait to run the STORE NODE until the cleanup thread has run on all the nodes to expunge the last bits of the node from all nodes' databases. Smells a bit safer to me...

Check cleanupEvent(). I think it will never remove that stale event.

Jan
Re: [Slony1-general] Issue when adding node to replication
On 9/27/2012 6:48 PM, Brian Fehrle wrote: On 09/27/2012 03:40 PM, Jan Wieck wrote: Check cleanupEvent(). I think it will never remove that stale event.

Yeah, it looks like it will only remove confirmed ones.

-- code from cleanupEvent() --
    -- ----
    -- Then remove all events that are confirmed by all nodes in the
    -- whole cluster up to the last SYNC
    -- ----
    for v_min_row in select con_origin, min(con_seqno) as con_seqno
            from sl_confirm group by con_origin
    loop
        select coalesce(max(ev_seqno), 0) into v_max_sync
                from sl_event
                where ev_origin = v_min_row.con_origin
                  and ev_seqno <= v_min_row.con_seqno
                  and ev_type = 'SYNC';
        if v_max_sync > 0 then
            delete from sl_event
                    where ev_origin = v_min_row.con_origin
                      and ev_seqno < v_max_sync;
        end if;
    end loop;

The query that hits sl_confirm for the loop returns the following:

 con_origin | con_seqno
------------+------------
          1 | 5000242178
          2 | 5000661718
          4 |     560743

So it never hits node 3 to do any deletes from sl_event. I believe this is the only place in cleanupEvent that will do any deletes from sl_event. So should I try to delete this row myself, or would that cause major issues also? I'm still wrapping my head around how sl_confirm and sl_event work together when adding/removing nodes.
Since there isn't even a node 3 in sl_node, it is safe to delete that row.

Jan
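A hedged sketch of that manual cleanup (the `_slony` schema name matches the query earlier in the thread; adjust for your cluster name):

```sql
-- Remove the stale SYNC event left behind by the dropped node.
-- The guard clause makes this a no-op for any node still in sl_node.
DELETE FROM _slony.sl_event
 WHERE ev_origin = 3
   AND ev_origin NOT IN (SELECT no_id FROM _slony.sl_node);
```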
Re: [Slony1-general] Manually kicking off a logswitch
On 9/19/2012 6:46 PM, Brian Fehrle wrote: Hi all, Postgres 8.4, Slony 1.2.21. Previously I had reached out about an issue where the sl_log_1 or sl_log_2 table would get so full that replication would slow to a crawl, only processing one event at a time. It seems as though HUGE data inserts, updates, or deletes to replicated tables are the cause, and being on Slony 1.2 there isn't much we can do to get around it. The sl_log table was above 9 million rows when we saw this issue. We are now going down the path of doing much smaller groups of updates so we don't get into the same condition as before. We just did 1.2 million rows' worth of updates and it only took a few minutes to replicate it all to the slave. Good news. But now our sl_log_1 table is sitting at 1.2 million rows, and we'd like to let it be switched and truncated by Slony before kicking off a few more million rows' worth of updates. From what I can tell from the documentation, this does not happen all that often. So what are the thoughts on manually kicking off the log switch via select _slony.logswitch_start() on the master? I reviewed the code and it won't let a switch occur if one is already in progress, so it seems to be pretty safe in its execution. However, it looks like all it really does is update a sequence to say we're currently switching, and then Slony does it in the background.

The issue itself is caused by a problem with the log select query that was fixed in 2.1 (commit d4118d... from Jan 27, 2011).

So my questions are: 1. Is this a safe practice? We may be doing it multiple times a day (guesstimate, ten or more times?). And 2. What is Slony doing in the background for this to occur? It looks like it actually switches to the new log right away, but takes some time before the old log is truncated. Does it need to wait until cleanupEvent can run on the data within, i.e. about 10 minutes? (#2 is more out of curiosity.)

slon is calling the stored procedure cleanupEvent(interval).
You can safely call that with an interval of a few minutes. The interval is how old events must be, at minimum, to be purged from sl_event. Even a zero interval should be safe. However, Slony normally tries to do this every 10 minutes (if memory serves). With that backlog it is very likely that the previous logswitch attempt hasn't finished yet, because the backlog is still in the old log table. As noted, the problem is fixed in 2.1. If upgrading to Slony 2.1 is not an option for you, you may consider backpatching the above commit into your 1.2.

Jan
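A sketch of the manual log switch being discussed (the `_slony` schema name is from the thread; on 1.2 the switch is only initiated here and finished later by the slons):

```sql
-- Ask Slony to start a log switch; this returns immediately and the
-- slon processes complete the switch in the background.
SELECT _slony.logswitch_start();

-- Assumption: sl_log_status is the sequence Slony uses to track which
-- sl_log_N table is active and whether a switch is still in progress.
SELECT last_value FROM _slony.sl_log_status;
```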
Re: [Slony1-general] Monitoring big table sync
On 8/21/2012 6:16 PM, Christopher Browne wrote: The one thing that I'd be a *bit* concerned about is the 57K event lag. At one point, I recall there being a possible troublesome case where one of the queries for the first SYNC processing after that could time out when trying to read in all of the tens of thousands of events. It would be nicer if, during this laggy period, SYNCs got generated somewhat less often, let's say every 10-20s rather than every 0.5s.

That can only happen on slon startup, when the remote_listen thread selects all NEW events. During the copy_set(), the remote listener is still getting these events and queues them in memory. Unless something is interrupting the copy, I'm not worried yet.

Jan
Re: [Slony1-general] Monitoring big table sync
On 8/22/2012 11:45 AM, Christopher Browne wrote: On Wed, Aug 22, 2012 at 11:34 AM, Jan Wieck janwi...@yahoo.com wrote: That can only happen on slon startup, when the remote_listen thread selects all NEW events. During the copy_set(), the remote listener is still getting these events and queues them in memory. Unless something is interrupting the copy, I'm not worried yet.

Agreed. I do recall seeing trouble come up *afterwards*, if there were so many events to process that it took too long to run the query to draw them in. Ah, poking at the code is useful. There is a configuration option for this, remote_listen_timeout, which defaults to 300 seconds, but might be set higher. If, after the subscription completes, there are wildly too many SYNC events lingering around, and it takes too long to process them, the remote_listen thread will time out, complaining thus:

You are missing the point. However many unprocessed SYNC events there are after the copy, the remote_listener will already have shoveled them into memory as they appeared. There is no BIG EVENT SELECT after the copy.
Re: [Slony1-general] slonik_execute_script not working
On 7/4/2012 3:39 PM, Vick Khera wrote: On Mon, Jul 2, 2012 at 3:29 PM, Wolf Schwurack w...@uen.org wrote: You cannot run DDL statements using Slony; it states that in the Slony docs. Where? That's the only safe way to perform DDL on a replicated table.

In 1.x, yes. Since 2.0, DDL can be done directly with psql or by other means. If doing so, one needs to be careful about which modifications are done to which nodes in which order, so that inserts/updates aren't arriving at a node with missing columns or things like that. But Slony by itself won't stand in the way any more.

Jan
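A hedged illustration of the ordering concern on Slony 2.0+ (table and column names are invented): for a purely additive change, applying it on the replicas before the origin means replicated rows never reference a column the receiving node does not have yet.

```sql
-- Step 1: run on every replica first.
ALTER TABLE public.accounts ADD COLUMN note text;

-- Step 2: only then run the same statement on the origin.
-- Any row replicated after this point carries the new column,
-- and every replica is already able to accept it.
ALTER TABLE public.accounts ADD COLUMN note text;
```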
Re: [Slony1-general] How to perform updates on a table slave
On 6/19/2012 12:43 PM, Efraín Déctor wrote: Hello. We want to use Slony to keep our 2 servers replicated. However, several tables are being accessed by Java processes that perform updates on them, and Slony doesn't allow this to happen. Is there a way to allow this? Thank you.

Several ways. 1) The proper way to do this would be to move those fields, that need to be updated on the replica, into separate tables that are not replicated and are local to the replica. 2) You could move the code that modifies those rows into a BEFORE trigger and configure it on the replica so that it fires during replication. 3) You could disable the deny_access triggers on those tables.

Jan
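For option 3, a sketch (assumption: on Slony 2.x the deny-access trigger on a replica is typically named `_<clustername>_denyaccess`; verify the actual name in pg_trigger before disabling anything):

```sql
-- Find Slony's triggers on the table (names vary by cluster):
SELECT tgname
  FROM pg_trigger
 WHERE tgrelid = 'public.mytable'::regclass;

-- Disable the deny-access trigger so local updates are allowed.
-- Note: Slony will no longer protect this table from local writes
-- that conflict with replicated changes.
ALTER TABLE public.mytable DISABLE TRIGGER "_mycluster_denyaccess";
```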