Re: [Slony1-general] Slony website is down

2024-05-13 Thread Jan Wieck via Slony1-general

On 5/13/24 03:08, Devrim Gündüz via Slony1-general wrote:


Hi,

On Sun, 2024-05-12 at 18:09 +, Soni M via Slony1-general wrote:

I found that slony website is down.


https://www.slony.info/pipermail/slony1-general/2024-March/013559.html


It responds (sluggishly) for me.

That said, the Slony-I project is ending. We will not provide any new
versions.



Regards, Jan



Regards,


___
Slony1-general mailing list
Slony1-general@lists.slony.info
https://lists.slony.info/cgi-bin/mailman/listinfo/slony1-general


Re: [Slony1-general] Replication interruption

2021-10-22 Thread Jan Wieck via Slony1-general

On 10/22/21 3:33 PM, Sung Hsin Lei via Slony1-general wrote:
This question may have been asked already. I have noticed that if it 
takes 10 days for the initial data transfer to complete and the transfer 
is interrupted after 9 days, the transfer resumes from day 1. In other 
words, I need 10 days of uninterrupted connection between the master and 
slave machine for replication to be established properly. Is this the 
case, or did I mess up somewhere?


This is correct. One "set" (of tables) must be copied over in one 
transaction to provide a consistent snapshot from which replication can 
start catching up. In other systems this is often called "CDC" or 
"Change Data Capture". Without the initial copy being done in one 
transaction, the system has no way of knowing which incremental changes 
have to be replicated.


You may be able to break the whole process up into multiple steps by 
creating multiple sets. Put all the small tables (usually the referenced 
PK tables) into one set first. Then create a set for one or a few tables 
at a time and replicate them, after which you would do a MERGE SET. 
Rinse, repeat.
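
For illustration, a rough slonik sketch of that incremental approach. The 
set and table IDs, node IDs and the table name are placeholders, and the 
exact SYNC/WAIT handling depends on the cluster:

    CREATE SET (ID = 2, ORIGIN = 1, COMMENT = 'next batch of big tables');
    SET ADD TABLE (SET ID = 2, ORIGIN = 1, ID = 101,
                   FULLY QUALIFIED NAME = 'public.big_table_1');
    SUBSCRIBE SET (ID = 2, PROVIDER = 1, RECEIVER = 2, FORWARD = YES);
    SYNC (ID = 1);
    WAIT FOR EVENT (ORIGIN = 1, CONFIRMED = 2, WAIT ON = 1);
    # once the copy of set 2 has finished and the subscriber is caught up:
    MERGE SET (ID = 1, ADD ID = 2, ORIGIN = 1);

Each additional set only has to copy its own tables in one transaction, 
which keeps the window during which the copy must not be interrupted much 
smaller.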






Another question. On a new replication (empty tables on the slave), I see 
the initial transfer stuck on TRUNCATE for a long time for big tables. 
Sometimes it truncates the same table several times. I remember getting 
an explanation for why that is. Is there a way to make the initial 
transfer faster (avoiding the truncate)?


A TRUNCATE taking a long time on an empty table sounds wrong. Are you 
sure this table is empty, or do you only assume it is logically empty 
because a previous COPY failed? The system may actually be busy making 
sure that none of its foreign key references are violated by the TRUNCATE 
operation.
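
A quick way to check what is really on the subscriber side (a minimal 
sketch; public.big_table stands in for the table in question):

    SELECT n_live_tup, n_dead_tup,
           pg_size_pretty(pg_table_size('public.big_table')) AS on_disk
      FROM pg_stat_user_tables
     WHERE schemaname = 'public' AND relname = 'big_table';

A large on-disk size despite zero live tuples would point at a failed 
earlier COPY rather than a truly empty table.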



--
Jan Wieck
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


[Slony1-general] Test (01) moving

2020-02-23 Thread Jan Wieck via Slony1-general
Please ignore

-- 
Jan Wieck
Principal Database Engineer
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Looks like someone's spam tool is buggy

2017-01-04 Thread Jan Wieck
I am right now seeing up to 6 bugs opened per minute. I suggest we shut
bugzilla down until we can figure this out.

On Wed, Jan 4, 2017 at 1:04 PM, Christopher Browne <cbbro...@gmail.com>
wrote:

> Oh dear.  I closed several hundred of them yesterday, with the faint
> hope that it was a one-off situation; apparently something is running
> on an ongoing basis, generating bugs consisting of "material of spammy
> consistency" on the Slony Bugzilla instance.
>
> Joshua Drake has been enquiring as to whether or not we should see
> about updating some things (he's been thinking of the OS and
> PostgreSQL) on slony.info; I suspect we need to do something regarding
> the version of Bugzilla, with the dose of wishful thinking that
> perhaps a later version would be more resistant to this spamulous
> activity.
>
> I suggest weighing in on Joshua's comments; we should muse a bit on
> what to do about all this.
>



-- 
Jan Wieck
Senior Postgres Architect
http://pgblog.wi3ck.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


[Slony1-general] Looks like someone's spam tool is buggy

2017-01-04 Thread Jan Wieck
Not sure how we can help to fix it though.


Jan

-- 
Jan Wieck
Senior Postgres Architect
http://pgblog.wi3ck.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Slony Tuning, heavy slon backup every AM

2016-08-09 Thread Jan Wieck
Getting a 404 on those images.

On Tue, Aug 9, 2016 at 12:36 PM, Tory M Blue <tmb...@gmail.com> wrote:

>
>
> On Sat, Aug 6, 2016 at 6:16 AM, Jan Wieck <j...@wi3ck.info> wrote:
>
>> How are you monitoring the number of rows in the sl_log_* tables?
>>
>>
>> Jan
>>
>>
> I've got a couple of updates but first let me answer the question.
>
> max => "SHOW max_connections",
>
> cur => "SELECT COUNT(*) FROM pg_stat_activity",
>
> log1=> "SELECT COUNT(*) FROM _cls.sl_log_1",
>
> log2=> "SELECT COUNT(*) FROM _cls.sl_log_2",
>
> siz1=> "SELECT (relpages*8) FROM pg_class where relname='sl_log_1'",
>
> siz2=> "SELECT (relpages*8) FROM pg_class where relname='sl_log_2'"
>
> We have a script that runs and monitors a few things in the dB, one is the
> count of sl_log_1 and 2. This is added as an RRD
> and graphed.
>
> Now the update: the graph above is what we have been seeing; the logs grow
> up until 10:50am or so, when the truncate is allowed (backups are now
> complete at 4am and all heavy lifting is finished at 5am).
>
>
> [image: idb01.gc - slonyTables]
>
>
> I disabled the full backup on the standby unit last night.
> [image: idb01.gc - slonyTables]
>
>
> I still get some peaks and valleys, but the system is not backed up for 10
> hours. So this tells me that the backup on the standby is causing a
> situation where slon is blocked because tables are locked on the standby
> unit. This is still not ideal, but it's a big improvement from before, where
> the sl_log_? would grow to 14 million rows from about 2am to 10:45am..
>
> So I need to figure out how to reduce the overhead of the backup, I
> figured it was better to run it on the standby but at this point that is
> not looking like a great option..
>
> Thanks for working on this with me!
>
> Tory
>



-- 
Jan Wieck
Senior Postgres Architect
http://pgblog.wi3ck.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Slony Tuning, heavy slon backup every AM

2016-08-06 Thread Jan Wieck
How are you monitoring the number of rows in the sl_log_* tables?


Jan

On Thu, Aug 4, 2016 at 3:04 PM, Tory M Blue <tmb...@gmail.com> wrote:

>
>
> On Thu, Aug 4, 2016 at 12:02 PM, Tory M Blue <tmb...@gmail.com> wrote:
>
>>
>>
>> On Thu, Aug 4, 2016 at 10:29 AM, Tory M Blue <tmb...@gmail.com> wrote:
>> > On Thu, Aug 4, 2016 at 8:39 AM, Jan Wieck <j...@wi3ck.info> wrote:
>> >> Another problem we recently ran into is that after bulk operations, the
>> >> queries selecting the log data on the data provider can degrade. A
>> >> workaround
>> >> in one case was to ALTER the slony user and set work_mem to 512MB. The
>> >> work_mem of 100MB, used by the application, was causing the scans on
>> >> the sl_log table to become sequential scans.
>> >>
>> >> The problem is amplified if the tables are spread across many sets.
>> Merging
>> >> the sets into few larger ones causes fewer scans.
>> >>
>> >>
>> >> Regards, Jan
>> >>
>> >
>> > Unfortunately, I've never been able to migrate to a slony user, so
>> > slony runs as postgres and my settings are
>> >
>> > work_mem = 2GB
>> >
>> > maintenance_work_mem = 2GB
>> >
>> > So don't think I can test this theory.
>> >
>> > Also of note, we have 3 sets
>> >
>> > Same today, just don't see any long running queries where slony has no
>> > time to truncate the table, so I'm not clear what is preventing the
>> > slony table from being truncated. ( it really sounds like something
>> > that Jan has run into before, just a huge table and for some reason
>> > the truncate can't complete before another job or request comes in?
>> > Even now the slon table is over 12Million and I expect it to be able
>> > to truncate/clear in the next 30 minutes or so, but why.. the big bulk
>> > operation finishes at 5am, not sure why it takes another almost 6
>> > hours to truncate that table..
>> >
>> > Thanks again!
>> > Tory
>>
>> Also interesting today is that it's getting worse :) but with the same
>> pattern, other than it's taking longer, and today I saw something really
>> weird.
>>
>> slon ran a truncate but only cleared 50% of sl_log_1. How is that
>> possible? I mean, a truncate is a truncate; I have no idea how this table
>> went from 14M to 7M.
>>
>> 2016-08-04 10:43:03 PDT clsdb postgres 10.13.200.231(37864) 58874
>> 2016-08-04 10:43:03.237 PDTNOTICE:  Slony-I: could not lock sl_log_2 -
>> sl_log_2 not truncated
>>
>> 2016-08-04 10:53:43 PDT clsdb postgres 10.13.200.231(37864) 58874
>> 2016-08-04 10:53:43.573 PDTNOTICE:  Slony-I: log switch to sl_log_1
>> complete - truncate sl_log_2
>>
>> 2016-08-04 11:05:41 PDT clsdb postgres 10.13.200.231(37864) 58874
>> 2016-08-04 11:05:41.750 PDTNOTICE:  Slony-I: Logswitch to sl_log_2 initiated
>>
>> *2016-08-04 11:16:54 PDT clsdb postgres 10.13.200.231(37864) 58874
>> 2016-08-04 11:16:54.783 PDTNOTICE:  Slony-I: log switch to sl_log_2
>> complete - truncate sl_log_1   <--- didn't actually "truncate" the table,
>> there is still 7.5M rows and yet it's moving ahead with logswitch to
>> sl_log_1 (which again was just supposedly truncated, had 14 million rows
>> but has 7.5 Million rows still)..*
>>
>> 2016-08-04 11:29:06 PDT clsdb postgres 10.13.200.231(37864) 58874
>> 2016-08-04 11:29:06.332 PDTNOTICE:  Slony-I: Logswitch to sl_log_1 initiated
>>
>> 2016-08-04 11:40:43 PDT clsdb postgres 10.13.200.231(37864) 58874
>> 2016-08-04 11:40:43.048 PDTNOTICE:  Slony-I: log switch to sl_log_1
>> complete - truncate sl_log_2
>>
>> 2016-08-04 11:51:44 PDT clsdb postgres 10.13.200.231(37864) 58874
>> 2016-08-04 11:51:44.660 PDTNOTICE:  Slony-I: Logswitch to sl_log_2 initiated
>>
>> So what I'm also noticing and maybe not related, but my standby node has
>> lots of locks and long running reports, I'm wondering if it's causing some
>> of the backup seen on the master.  So many questions still :))
>>
>> Thanks for listening and being my wall to bounce stuff off of
>>
>> Tory
>>
>> Sent too soon
>
> 2016-08-04 12:02:50 PDT clsdb postgres 10.13.200.231(37864) 58874
> 2016-08-04 12:02:50.556 PDTNOTICE:  Slony-I: log switch to sl_log_2
> complete - truncate sl_log_1
> *(This truncate actually removed the 7.5 Million remaining rows), *
>
> *I'm confused how can a truncate leave rows?!*
>
>
> Tory
>
> ___
> Slony1-general mailing list
> Slony1-general@lists.slony.info
> http://lists.slony.info/mailman/listinfo/slony1-general
>
>


-- 
Jan Wieck
Senior Postgres Architect
http://pgblog.wi3ck.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Slony Tuning, heavy slon backup every AM

2016-08-04 Thread Jan Wieck
Another problem we recently ran into is that after bulk operations, the
queries selecting the log data on the data provider can degrade. A
workaround
in one case was to ALTER the slony user and set work_mem to 512MB. The
work_mem of 100MB, used by the application, was causing the scans on
the sl_log table to become sequential scans.
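
A minimal sketch of that workaround, assuming the replication sessions 
connect as a role named slony:

    ALTER ROLE slony SET work_mem = '512MB';
    -- the setting only applies to new sessions; restart the slon daemons
    -- so their connections pick it up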

The problem is amplified if the tables are spread across many sets. Merging
the sets into few larger ones causes fewer scans.


Regards, Jan


On Thu, Aug 4, 2016 at 7:52 AM, Glyn Astill <glynast...@yahoo.co.uk> wrote:

>
> > From: Tory M Blue <tmb...@gmail.com>
> >To: Jan Wieck <j...@wi3ck.info>
> >Cc: slony <slony1-general@lists.slony.info>
> >Sent: Wednesday, 3 August 2016, 21:45
> >Subject: Re: [Slony1-general] Slony Tuning, heavy slon backup every AM
> >
> >
> >
> >
> >
> >
> >
> >On Wed, Aug 3, 2016 at 12:30 PM, Jan Wieck <j...@wi3ck.info> wrote:
> >
> >
> >>
> >>
> >>
> >>On Wed, Aug 3, 2016 at 11:14 AM, Tory M Blue <tmb...@gmail.com> wrote:
> >>
> >>Hey folks
> >>>
> >>>Running into some more issues and i'm not finding a definitive tuning
> guide nor can I tell why i'm backing up and why it takes so long to catch
> up (so long is relative).
> >>>
> >>>
> >>>Running Slony 2.2.3 (but this has been an issue for a long time).
> >>>
> >>>
> >>>Image to show you where I am this AM , but this is a constant every
> AM.  I have large updates about 2.5 million between 12am and 4am, the job
> is done but it will take Slon a couple of hours to catch up.
> >>
> >>
> >>Are there long running transactions that start around the time, the
> sl_log starts
> >>growing? Any long running transaction prevents a log switch from
> finishing.
> >>
> >>
> >>
> >>
> >>Jan
> >
> >
> >Hi Jan,
> >
> >
> >There are really not many long queries during the big import, but things
> start getting longer when the slon logs have 10m+ rows in them.
> >
> >
> >I actually log any transaction that takes over a second and there is
> nothing that stands out, nothing that runs all that long. During this
> period, I move my reporting to point to the master, because of the delay in
> the standby, so there are some longer report queries, but the longest is 16
> minutes
> >
>
>
> Do you see any long running copy statements from the slon on the
> subscriber?  If so one avenue to investigate is if there are any replicated
> tables with sub-optimal indexes on them.
>
> I mention it purely because I've been bitten by this myself and seen very
> similar symptoms to what you're reporting here.  There's every chance this
> is way off the mark, but I think it's worth mentioning.
>
> The issue I saw was down to a bunch of pretty ugly indexes created by an
> application, and the result was inaccurate selectivity estimates causing
> the planner to choose one of the ugly indexes instead of the primary key
> when applying the changes on the subscriber.  Multivariate planner
> statistics that PostgreSQL doesn't yet have might have mitigated this, but
> that's besides the point.
>
>
> For example I'd see something like 200,000 records getting updated by a
> single update statement on the origin taking about 20 seconds, but when
> applied as 200,000 separate update statements on the subscriber it took
> more like 200 minutes.  An update using an index scan on the ugly index
> would take 62ms vs 0.1ms on the primary key, but outwardly all I saw until
> I enabled auto_explain was a long running copy.
>
> Glyn
>



-- 
Jan Wieck
Senior Postgres Architect
http://pgblog.wi3ck.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Slony Tuning, heavy slon backup every AM

2016-08-03 Thread Jan Wieck
On Wed, Aug 3, 2016 at 11:14 AM, Tory M Blue <tmb...@gmail.com> wrote:

> Hey folks
>
> Running into some more issues, and I'm not finding a definitive tuning
> guide, nor can I tell why I'm backing up and why it takes so long to catch
> up (so long is relative).
>
> Running Slony 2.2.3 (but this has been an issue for a long time).
>
> Image to show you where I am this AM , but this is a constant every AM.  I
> have large updates about 2.5 million between 12am and 4am, the job is done
> but it will take Slon a couple of hours to catch up.
>

Are there long running transactions that start around the time the sl_log
starts growing? Any long running transaction prevents a log switch from
finishing.
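
A simple way to look for such transactions on the origin (a sketch; the 
column names assume PostgreSQL 9.2 or later):

    SELECT pid, now() - xact_start AS xact_age, state, query
      FROM pg_stat_activity
     WHERE xact_start IS NOT NULL
     ORDER BY xact_start
     LIMIT 10;

Any transaction whose age spans the whole backlog window is a candidate 
for holding up the log switch.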


Jan



>
> Hardware: big 256GB 32 Proc machines (proc's don't matter here). I see
> very little iowait 2% on the standby, 0% on the primary and I'm seeing a
> whopping 150tps coming to the secondary (I would expect it to be writing
> like crazy!)
>
> idb01 is the primary idb02 is the secondary
>
>
>
> MY configuration settings, or those that I think matter
>
> cleanup_interval="5 minutes"
>
> sync_interval=1000
>
> sync_group_maxsize=5000
>
> #sync_max_rowsize=8192
>
>
> I don't see why my primary backs up so much. I don't see the same backup
> on idb02 (which is replicating to 3 other servers). Those 3 servers have
> forward=no, so they never have anything stored in sl_log (meaning the
> graphs for those are flat lined).
>
> I think the only thing to really tune is sync_group_maxsize, or
> sync_max_rowsize maybe. I have lots of RAM, but would really like to figure
> out why I seem to backup, the hardware, network, disk is good, Slony should
> really be performing better, it has lots of resources..
>
>
> Thanks for any ideas or where to look. Logs don't seem to tell me there is
> an issue.
>
> -Tory
>
>
>
> _______
> Slony1-general mailing list
> Slony1-general@lists.slony.info
> http://lists.slony.info/mailman/listinfo/slony1-general
>
>


-- 
Jan Wieck
Senior Postgres Architect
http://pgblog.wi3ck.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Replication Lag?

2016-02-29 Thread Jan Wieck
On 02/29/2016 10:49 AM, Rob Brucks wrote:
> 1. Yes, the sl_confirm data is showing up on the subscriber.
>
> 2. No, the origin node is not getting back the sl_confirm data from the 
> active subscriber.

Does the origin node log any errors that it cannot connect to that 
subscriber node?
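
One way to check from the origin whether confirms are coming back at all 
(a sketch; the _slony schema name follows from the CLUSTER NAME in the 
configuration quoted below):

    SELECT con_origin, con_received,
           max(con_seqno)     AS last_confirmed,
           max(con_timestamp) AS last_confirmed_at
      FROM _slony.sl_confirm
     GROUP BY con_origin, con_received
     ORDER BY con_origin, con_received;

If the row for the lagging subscriber keeps advancing on the subscriber 
itself but not on the origin, the confirms are not making it back.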


Jan



>
>
> Thanks,
> Rob
>
>
>
> On 2/26/16, 8:38 PM, "Steve Singer" <st...@ssinger.info> wrote:
>
>>On Fri, 26 Feb 2016, Rob Brucks wrote:
>>
>>>
>>> But, if I insert some test data into the master DB, I see the data show up
>>> on the remaining active slave.  So replication to the remaining slave DB
>>> is obviously working.
>>
>>Replication with slony has two parts
>>
>>1. Does the data replicate from the origin to the subscriber. When this
>>happens a row is added to the subscriber's sl_event table with
>>ev_origin=$origin_node and a confirm is added to the subscribers sl_confirm
>>table with con_origin=$origin_node and con_received=$subscriber_node
>>
>>2. The sl_confirm row mentioned above needs to then get picked up by the
>>slon for the origin node and brought back from the subscriber to the origin.
>>
>>Are your confirms making it back?
>>
>>
>>>
>>> We use sl_status to monitor replication so we need it to accurately report 
>>> lag if there's an issue.  The Slony 1.2 version we used before did not 
>>> behave this way, it accurately reported which slave was not replicating.
>>>
>>> Why does sl_status report lag on the active slave even though replication 
>>> appears to be working fine?
>>>
>>> Do I have a misconfiguration somewhere?
>>>
>>> Thanks,
>>> Rob
>>>
>>>
>>> Here's my slony config:
>>>
>>>
>>>  CLUSTER NAME = slony;
>>>  NODE 1 ADMIN CONNINFO = 'dbname=test_db host=/tmp port=5432 
>>> user=slony';
>>>  NODE 2 ADMIN CONNINFO = 'dbname=test_db host=/tmp port=5433 
>>> user=slony';
>>>  NODE 3 ADMIN CONNINFO = 'dbname=test_db host=/tmp port=5434 
>>> user=slony';
>>>
>>>  CLUSTERS
>>>
>>>  INIT CLUSTER (ID = 1, COMMENT = 'Master');
>>>
>>>
>>>  NODES
>>>
>>>  STORE NODE (ID = 2, COMMENT = 'Slave1', EVENT NODE = 1);
>>>  STORE NODE (ID = 3, COMMENT = 'Slave2', EVENT NODE = 1);
>>>
>>>
>>>  PATHS
>>>
>>>  STORE PATH (SERVER = 1, CLIENT = 2, CONNINFO = 'dbname=test_db 
>>> host=/tmp port=5432 user=slony');
>>>  STORE PATH (SERVER = 1, CLIENT = 3, CONNINFO = 'dbname=test_db 
>>> host=/tmp port=5432 user=slony');
>>>  STORE PATH (SERVER = 2, CLIENT = 1, CONNINFO = 'dbname=test_db 
>>> host=/tmp port=5433 user=slony');
>>>  STORE PATH (SERVER = 2, CLIENT = 3, CONNINFO = 'dbname=test_db 
>>> host=/tmp port=5433 user=slony');
>>>  STORE PATH (SERVER = 3, CLIENT = 1, CONNINFO = 'dbname=test_db 
>>> host=/tmp port=5434 user=slony');
>>>  STORE PATH (SERVER = 3, CLIENT = 2, CONNINFO = 'dbname=test_db 
>>> host=/tmp port=5434 user=slony');
>>>
>>>
>>>  SETS
>>>
>>>  CREATE SET (ID = 1, ORIGIN = 1, COMMENT = 'TEST Set 1');
>>>
>>>  SEQUENCES
>>>
>>>  SET ADD SEQUENCE (SET ID = 1, ORIGIN = 1, ID = 1, FULLY QUALIFIED NAME 
>>> = '"public"."test_seq"');
>>>
>>>  TABLES
>>>
>>>  SET ADD TABLE (SET ID = 1, ORIGIN = 1, ID = 2, FULLY QUALIFIED NAME = 
>>> '"public"."test"');
>>>
>>>  SUBSCRIPTIONS
>>>
>>>  SUBSCRIBE SET (ID = 1, PROVIDER = 1, RECEIVER = 2, FORWARD = YES);
>>>  SUBSCRIBE SET (ID = 1, PROVIDER = 1, RECEIVER = 3, FORWARD = YES);
>>>
>>>
>>> ___
>>> Slony1-general mailing list
>>> Slony1-general@lists.slony.info
>>> http://lists.slony.info/mailman/listinfo/slony1-general
>>>
>>
> ___
> Slony1-general mailing list
> Slony1-general@lists.slony.info
> http://lists.slony.info/mailman/listinfo/slony1-general
>


-- 
Jan Wieck
Database Architect
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Slony-I: log switch to sl_log_2 still in progress - sl_log_1 not truncated

2016-02-02 Thread Jan Wieck
On 02/02/2016 08:06 AM, Tignor, Tom wrote:
> 
>   I did drop one of my replicas several weeks ago, though I didn’t 
> recreate
> the node. I do have automation to do exactly that, however, if a replica
> becomes defective somehow. Seems I’ll need to consider the point of the
> two log switches. Is that important even if the dropped node isn’t a
> provider for anybody?

Yes, it is important because even a non-forwarding leaf node is still
producing events that propagate to all other nodes.

You might want to check sl_event on all nodes for remnants of dropped and
re-created nodes. Those would be sl_event rows with an ev_seqno in the
future of what that node is currently producing.
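
A minimal sketch of that check (run it on every node; _ams_cluster is the 
schema name used in this thread):

    SELECT ev_origin, max(ev_seqno) AS highest_seqno_seen
      FROM _ams_cluster.sl_event
     GROUP BY ev_origin
     ORDER BY ev_origin;

A stored maximum that is ahead of the sequence numbers the re-created 
node is currently generating points at leftover events from the dropped 
incarnation of that node.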


Regards, Jan

> 
>   Tom:-)
> 
> 
> On 2/2/16, 12:06 AM, "Jan Wieck" <j...@wi3ck.info> wrote:
> 
>>On 02/01/2016 01:24 PM, Tignor, Tom wrote:
>>>
>>> Quick update: a couple hours after deleting entries from both sl_log
>>> tables with txids > 630M, it appears the cleanup thread has taken care
>>>of
>>> business. sl_log_1 is down from 54GB to 24KB.
>>
>>I still wonder how that happened in the first place.
>>
>>Was there a sequence of dropping and re-creating a node recently? I
>>think I have seen cases like this where if a node is re-created with the
>>same node ID before the cleanup of data, belonging to the dropped node,
>>had happened everywhere. The "cleanup" I am talking about is basically 2
>>complete log switches on all nodes after the DROP NODE had replicated
>>everywhere. That takes at least 20-30 minutes and can in some cases take
>>hours.
>>
>>
>>Regards, Jan
>>
>>
>>>
>>> Tom:-)
>>>
>>>
>>> On 2/1/16, 9:06 AM, "Tignor, Tom" <ttig...@akamai.com> wrote:
>>>
>>>>
>>>>Jan,
>>>>Thanks much for all the help. I’ve been looking over logswitch finish
>>>>and
>>>>the event and changelog tables. Selecting logswitch_finish() on one of
>>>>my
>>>>replicas simply returned ""Slony-I: log switch to sl_log_2 still in
>>>>progress - sl_log_1 not truncated” All three seem to be in that state
>>>>with
>>>>either sl_log_1 or sl_log_2. Looking closer at sl_event and sl_log_2 I’m
>>>>seeing some strangeness.
>>>>
>>>>ams=# select min(pg_catalog.txid_snapshot_xmin(ev_snapshot)),
>>>>max(pg_catalog.txid_snapshot_xmin(ev_snapshot)) from
>>>>_ams_cluster.sl_event
>>>>where ev_origin = 1;
>>>>min|max
>>>>---+---
>>>> 139136948 | 139204299
>>>>(1 row)
>>>>
>>>>
>>>>ams=# select min(log_txid), max(log_txid) from _ams_cluster.sl_log_2
>>>>where
>>>>log_origin = 1;
>>>>min|max
>>>>---+---
>>>> 631532717 | 631661386
>>>>(1 row)
>>>>
>>>>
>>>>ams=#
>>>>
>>>>
>>>>So I understand all the txids referenced in sl_event are in the 139M
>>>>range while all those in sl_log_2 are in the 631M range. Normally, the
>>>>sl_log txids should be older, shouldn’t they? Do you think I’ve hit a
>>>>txid-wraparound problem?
>>>>
>>>>Tom:-)
>>>>
>>>>
>>>>
>>>>On 1/28/16, 1:09 PM, "Jan Wieck" <j...@wi3ck.info> wrote:
>>>>
>>>>>On 01/28/2016 11:11 AM, Tignor, Tom wrote:
>>>>>>
>>>>>>  Output below. They seem to be replicating normally, except for the
>>>>>>sl_log
>>>>>> growth.
>>>>>
>>>>>Indeed. Is there anything in the slon logs for those nodes that says
>>>>>why
>>>>>it doesn't finish the log switch?
>>>>>
>>>>>Connect to the database as the slony user.
>>>>>
>>>>>To check if a log switch is indeed in progress, do
>>>>>
>>>>> SELECT last_value FROM _ams_cluster.sl_log_status;
>>>>>
>>>>>It should be either 2 or 3. If it is 0 or 1, no log switch is in
>>>>>progress and you can start one with
>>>>>

Re: [Slony1-general] Slony-I: log switch to sl_log_2 still in progress - sl_log_1 not truncated

2016-02-01 Thread Jan Wieck
On 02/01/2016 01:24 PM, Tignor, Tom wrote:
> 
>   Quick update: a couple hours after deleting entries from both sl_log
> tables with txids > 630M, it appears the cleanup thread has taken care of
> business. sl_log_1 is down from 54GB to 24KB.

I still wonder how that happened in the first place.

Was there a sequence of dropping and re-creating a node recently? I
think I have seen cases like this where a node is re-created with the
same node ID before the cleanup of the data belonging to the dropped node
has happened everywhere. The "cleanup" I am talking about is basically 2
complete log switches on all nodes after the DROP NODE has replicated
everywhere. That takes at least 20-30 minutes and can in some cases take
hours.


Regards, Jan


> 
>   Tom:-)
> 
> 
> On 2/1/16, 9:06 AM, "Tignor, Tom" <ttig...@akamai.com> wrote:
> 
>>
>>  Jan,
>>  Thanks much for all the help. I’ve been looking over logswitch finish 
>> and
>>the event and changelog tables. Selecting logswitch_finish() on one of my
>>replicas simply returned ""Slony-I: log switch to sl_log_2 still in
>>progress - sl_log_1 not truncated” All three seem to be in that state with
>>either sl_log_1 or sl_log_2. Looking closer at sl_event and sl_log_2 I’m
>>seeing some strangeness.
>>
>>ams=# select min(pg_catalog.txid_snapshot_xmin(ev_snapshot)),
>>max(pg_catalog.txid_snapshot_xmin(ev_snapshot)) from _ams_cluster.sl_event
>>where ev_origin = 1;
>>min|max
>>---+---
>> 139136948 | 139204299
>>(1 row)
>>
>>
>>ams=# select min(log_txid), max(log_txid) from _ams_cluster.sl_log_2 where
>>log_origin = 1;
>>min|max
>>---+---
>> 631532717 | 631661386
>>(1 row)
>>
>>
>>ams=#
>>
>>
>>  So I understand all the txids referenced in sl_event are in the 139M
>>range while all those in sl_log_2 are in the 631M range. Normally, the
>>sl_log txids should be older, shouldn’t they? Do you think I’ve hit a
>>txid-wraparound problem?
>>
>>  Tom:-)
>>
>>
>>
>>On 1/28/16, 1:09 PM, "Jan Wieck" <j...@wi3ck.info> wrote:
>>
>>>On 01/28/2016 11:11 AM, Tignor, Tom wrote:
>>>>
>>>>Output below. They seem to be replicating normally, except for the
>>>>sl_log
>>>> growth.
>>>
>>>Indeed. Is there anything in the slon logs for those nodes that says why
>>>it doesn't finish the log switch?
>>>
>>>Connect to the database as the slony user.
>>>
>>>To check if a log switch is indeed in progress, do
>>>
>>> SELECT last_value FROM _ams_cluster.sl_log_status;
>>>
>>>It should be either 2 or 3. If it is 0 or 1, no log switch is in
>>>progress and you can start one with
>>>
>>> SELECT _ams_cluster.logswitch_start();
>>>
>>>If it is 2 or 3, then you can do
>>>
>>> SELECT _ams_cluster.logswitch_finish();
>>>
>>>All these operations are harmless and will only do what is safely
>>>possible. Look at the code of logswitch_finish() to find out how it
>>>determines if the current log switch can be finished. In short, the
>>>cleanup thread is removing events from sl_event that have been confirmed
>>>by all nodes in the cluster. The function logswitch_finish() looks if
>>>there is anything left in sl_event, that belonged to that old log. If so
>>>it will not finish. Running those queries manually you can find out what
>>>that event is that is preventing the switch to finish.
>>>
>>>
>>>
>>>>
>>>>
>>>> a...@ams-repl2.ams.netmgmt:~$ /a/third-party/postgresql/bin/psql -U
>>>> ams_slony -d ams -c 'select * from _ams_cluster.sl_status'
>>>>  st_origin | st_received | st_last_event |       st_last_event_ts        | st_last_received |      st_last_received_ts      |   st_last_received_event_ts   | st_lag_num_events |   st_lag_time
>>>> -----------+-------------+---------------+-------------------------------+------------------+-------------------------------+-------------------------------+-------------------+----------------
>>>>          2 |           1 |    5000611610 | 2016-01-28 16:06:37.343826+00 |       5000611610 | 201

Re: [Slony1-general] Replication inexplicably stops

2016-01-29 Thread Jan Wieck
'f'
> 2016-01-28 17:41:00 AmÚr. du Sud occid. WARN   remoteWorker_wakeup: node
> 1 - no
> worker thread
> 2016-01-28 17:41:00 AmÚr. du Sud occid. CONFIG enableSubscription: sub_set=1
> 2016-01-28 17:41:00 AmÚr. du Sud occid. WARN   remoteWorker_wakeup: node
> 1 - no
> worker thread
> 2016-01-28 17:41:00 AmÚr. du Sud occid. CONFIG main: last local event
> sequence =
>   5000462590
> 2016-01-28 17:41:00 AmÚr. du Sud occid. CONFIG main: configuration
> complete - st
> arting threads
> 2016-01-28 17:41:00 AmÚr. du Sud occid. INFO   localListenThread: thread
> starts
> 2016-01-28 17:41:00 AmÚr. du Sud occid. CONFIG version for "dbname =
> Securithor2
>user = slonyuser password = securiTHOR971 port = 6234" is 90310
> NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=5188
> 2016-01-28 17:41:00 AmÚr. du Sud occid. CONFIG enableNode: no_id=1
> 2016-01-28 17:41:00 AmÚr. du Sud occid. INFO   remoteWorkerThread_1:
> thread star
> ts
> 2016-01-28 17:41:00 AmÚr. du Sud occid. INFO   remoteListenThread_1:
> thread star
> ts
> 2016-01-28 17:41:00 AmÚr. du Sud occid. INFO   main: running scheduler
> mainloop
> 2016-01-28 17:41:00 AmÚr. du Sud occid. CONFIG cleanupThread: thread starts
> 2016-01-28 17:41:00 AmÚr. du Sud occid. INFO   syncThread: thread starts
> 2016-01-28 17:41:00 AmÚr. du Sud occid. INFO   monitorThread: thread starts
> 2016-01-28 17:41:00 AmÚr. du Sud occid. CONFIG version for "dbname =
> Securithor2
>user = slonyuser password = securiTHOR971 port = 6234" is 90310
> 2016-01-28 17:41:00 AmÚr. du Sud occid. CONFIG remoteWorkerThread_1:
> update prov
> ider configuration
> 2016-01-28 17:41:00 AmÚr. du Sud occid. CONFIG remoteWorkerThread_1:
> added activ
> e set 1 to provider 1
> 2016-01-28 17:41:00 AmÚr. du Sud occid. CONFIG version for
> "dbname=Securithor2 h
> ost=192.168.1.50 user=slonyuser password = securiTHOR971  port = 6234"
> is 90306
> 2016-01-28 17:41:00 AmÚr. du Sud occid. CONFIG version for "dbname =
> Securithor2
>user = slonyuser password = securiTHOR971 port = 6234" is 90310
> 2016-01-28 17:41:00 AmÚr. du Sud occid. CONFIG cleanupThread: bias = 60
> 2016-01-28 17:41:00 AmÚr. du Sud occid. CONFIG version for "dbname =
> Securithor2
>user = slonyuser password = securiTHOR971 port = 6234" is 90310
> 2016-01-28 17:41:00 AmÚr. du Sud occid. CONFIG version for "dbname =
> Securithor2
>user = slonyuser password = securiTHOR971 port = 6234" is 90310
> 2016-01-28 17:41:00 AmÚr. du Sud occid. CONFIG version for
> "dbname=Securithor2 h
> ost=192.168.1.50 user=slonyuser password = securiTHOR971  port = 6234"
> is 90306
> 2016-01-28 17:41:00 AmÚr. du Sud occid. INFO   remoteWorkerThread_1:
> syncing set
>   1 with 59 table(s) from provider 1
>
>
>
>
> It gets stuck at "syncing set 1 with 59 table(s) from provider 1" (the
> last line) forever with the occasional messages that says something
> about cleaning(threadcleaning I thing).
>
>
> Checking the postgres logs, I see lots of:
>
> 2016-01-28 17:33:07 AST LOG:  n'a pas pu recevoir les données du client
> : unrecognized winsock error 10061
>
> Which translates to:
>
> 2016-01-28 17:33:07 AST LOG:  was not able to receive the data from the
> client : unrecognized winsock error 10061
>
> I'm able to connect to the main db from the replicated machine no
> problem. I have no idea how this error 10061 is caused.

Winsock error 10061 is WSAECONNREFUSED

 Connection refused.

 No connection could be made because the target computer actively
 refused it. This usually results from trying to connect to a
 service that is inactive on the foreign host—that is, one with no
 server application running.

This might be a firewall issue. Can you use some network sniffer to find 
out what is happening on the TCP/IP level between the two machines?


Regards, Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Slony-I: log switch to sl_log_2 still in progress - sl_log_1 not truncated

2016-01-28 Thread Jan Wieck
-28 13:18:34.487192+00 | f   | 
> DLE>
>
>   16393 | ams |5041 |   213867 | ams_slony  |
> slon.origin_2_provider_2  | 80.67.75.105   | |
> 36402 | 2016-01-22 01:55:22.964462+00 |   |
> 2016-01-28 13:18:34.519066+00 | f   | 
> DLE>
>
>   16393 | ams |6694 |   213867 | ams_slony  |
> slon.node_2_listen| 60.254.150.133 | |
> 61795 | 2016-01-22 01:59:12.095052+00 |   |
> 2016-01-28 13:18:27.928384+00 | f   | 
> DLE>
>
>   16393 | ams |4456 |   213867 | ams_slony  |
> slon.node_2_listen| 72.246.50.22   | |
> 51238 | 2016-01-22 01:54:21.481355+00 |   |
> 2016-01-28 13:18:36.766973+00 | f   | 
> DLE>
>
>   16393 | ams |4457 |   213867 | ams_slony  |
> slon.node_2_listen| 80.67.75.105   | |
> 36333 | 2016-01-22 01:54:21.500456+00 |   |
> 2016-01-28 13:18:36.204482+00 | f   | 
> DLE>
>
>   16393 | ams |4428 |   213867 | ams_slony  |
> slon.local_monitor|| |
> -1 | 2016-01-22 01:54:18.977015+00 |   |
> 2016-01-28 13:18:36.652567+00 | f   | 
> DLE>
>
>   16393 | ams |4427 |   213867 | ams_slony  | slon.local_sync
>  || |  -1 | 2016-01-22
> 01:54:18.976932+00 |   | 2016-01-28
> 13:18:36.151998+00 | f   | 
> DLE>
>
>   16393 | ams |4426 |   213867 | ams_slony  |
> slon.local_cleanup|| |
> -1 | 2016-01-22 01:54:18.976842+00 |   |
> 2016-01-28 13:12:12.582921+00 | f   | 
> DLE>
>
>   16393 | ams |4425 |   213867 | ams_slony  |
> slon.remoteWorkerThread_4 || |
> -1 | 2016-01-22 01:54:18.976783+00 |   |
> 2016-01-28 13:18:33.99715+00  | f   | 
> DLE>
>
>   16393 | ams |4420 |   213867 | ams_slony  |
> slon.remoteWorkerThread_1 || |
> -1 | 2016-01-22 01:54:18.976548+00 |   |
> 2016-01-28 13:18:33.561531+00 | f   | 
> DLE>
>
>   16393 | ams |4419 |   213867 | ams_slony  |
> slon.remoteWorkerThread_3 || |
> -1 | 2016-01-22 01:54:18.97647+00  |   |
> 2016-01-28 13:18:34.808907+00 | f   | 
> DLE>
>
>   16393 | ams |4413 |   213867 | ams_slony  | slon.local_listen
>  || |  -1 | 2016-01-22
> 01:54:18.965568+00 |   | 2016-01-28
> 13:18:37.096159+00 | f   | 
> DLE>
>
> (21 rows)
>
>
> ams=#
>
>
>
> Tom:-)
>
>
>
>
> ___
> Slony1-general mailing list
> Slony1-general@lists.slony.info
> http://lists.slony.info/mailman/listinfo/slony1-general
>


-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Slony-I: log switch to sl_log_2 still in progress - sl_log_1 not truncated

2016-01-28 Thread Jan Wieck
On 01/28/2016 11:11 AM, Tignor, Tom wrote:
>
>   Output below. They seem to be replicating normally, except for the 
> sl_log
> growth.

Indeed. Is there anything in the slon logs for those nodes that says why 
it doesn't finish the log switch?

Connect to the database as the slony user.

To check if a log switch is indeed in progress, do

 SELECT last_value FROM _ams_cluster.sl_log_status;

It should be either 2 or 3. If it is 0 or 1, no log switch is in 
progress and you can start one with

 SELECT _ams_cluster.logswitch_start();

If it is 2 or 3, then you can do

 SELECT _ams_cluster.logswitch_finish();

All these operations are harmless and will only do what is safely 
possible. Look at the code of logswitch_finish() to find out how it 
determines whether the current log switch can be finished. In short, the 
cleanup thread removes events from sl_event that have been confirmed by 
all nodes in the cluster. The function logswitch_finish() checks whether 
anything belonging to the old log is still left in sl_event; if so, it 
will not finish. By running those queries manually you can find out which 
event is preventing the switch from finishing.
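
As a small companion sketch, this shows the current switch state together 
with how much is still sitting in the two log tables (again using the 
_ams_cluster schema from this thread):

    SELECT (SELECT last_value FROM _ams_cluster.sl_log_status) AS log_status,
           (SELECT count(*) FROM _ams_cluster.sl_log_1)        AS sl_log_1_rows,
           (SELECT count(*) FROM _ams_cluster.sl_log_2)        AS sl_log_2_rows;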



>
>
> a...@ams-repl2.ams.netmgmt:~$ /a/third-party/postgresql/bin/psql -U
> ams_slony -d ams -c 'select * from _ams_cluster.sl_status'
>  st_origin | st_received | st_last_event |       st_last_event_ts        | st_last_received |      st_last_received_ts      |   st_last_received_event_ts   | st_lag_num_events |   st_lag_time
> -----------+-------------+---------------+-------------------------------+------------------+-------------------------------+-------------------------------+-------------------+-----------------
>          2 |           1 |    5000611610 | 2016-01-28 16:06:37.343826+00 |       5000611610 | 2016-01-28 16:06:38.843562+00 | 2016-01-28 16:06:37.343826+00 |                 0 | 00:00:09.201996
>          2 |           3 |    5000611610 | 2016-01-28 16:06:37.343826+00 |       5000611609 | 2016-01-28 16:06:29.851545+00 | 2016-01-28 16:06:27.341894+00 |                 1 | 00:00:19.203928
>          2 |           4 |    5000611610 | 2016-01-28 16:06:37.343826+00 |       5000611610 | 2016-01-28 16:06:38.710974+00 | 2016-01-28 16:06:37.343826+00 |                 0 | 00:00:09.201996
> (3 rows)
>
>
> a...@ams-repl3.lga.netmgmt:~$  /a/third-party/postgresql/bin/psql -U
> ams_slony -d ams -c 'select * from _ams_cluster.sl_status'
>  st_origin | st_received | st_last_event |       st_last_event_ts        | st_last_received |      st_last_received_ts      |   st_last_received_event_ts   | st_lag_num_events |   st_lag_time
> -----------+-------------+---------------+-------------------------------+------------------+-------------------------------+-------------------------------+-------------------+-----------------
>          3 |           4 |    5000654642 | 2016-01-28 16:07:05.493455+00 |       5000654642 | 2016-01-28 16:07:06.486539+00 | 2016-01-28 16:07:05.493455+00 |                 0 | 00:00:08.522529
>          3 |           1 |    5000654642 | 2016-01-28 16:07:05.493455+00 |       5000654642 | 2016-01-28 16:07:08.040292+00 | 2016-01-28 16:07:05.493455+00 |                 0 | 00:00:08.522529
>          3 |           2 |    5000654642 | 2016-01-28 16:07:05.493455+00 |       5000654642 | 2016-01-28 16:07:08.472049+00 | 2016-01-28 16:07:05.493455+00 |                 0 | 00:00:08.522529
> (3 rows)
>
>
> a...@ams-repl4.blr.netmgmt:~$  /a/third-party/postgresql/bin/psql -U
> ams_slony -d ams -c 'select * from _ams_cluster.sl_status'
>  st_origin | st_received | st_last_event |       st_last_event_ts        | st_last_received |      st_last_received_ts      |   st_last_received_event_ts   | st_lag_num_events |   st_lag_time
> -----------+-------------+---------------+-------------------------------+------------------+-------------------------------+-------------------------------+-------------------+-----------------
>          4 |           3 |    5000637483 | 2016-01-28 16:07:32.698809+00 |       5000637482 | 2016-01-28 16:07:28.731404+00 | 2016-01-28 16:07:22.695826+00 |                 1 | 00:00:19.077657
>          4 |           1 |    5000637483 | 2016-01-28 16:07:32.698809+00 |       5000637482 | 2016-01-28 16:07:24.839978+00 | 2016-01-28 16:07:22.695826+00 |                 1 | 00:00:19.077657
>          4 |           2 |    5000637483 | 2016-01-28 16:07:32.698809+00 |       5000637482 | 2016-01-28 16:07:22.926411+00 | 2016-01-28 16:07:22.695826+00 |                 1 | 00:00:19.077657
> (3 rows)
>
>
>
>   Tom:-)
>
>
>
> On 1/28/16, 10:38 AM, "Jan Wieck" <j...@wi3ck.info> wrote:
>
>>On 01/28/2016

Re: [Slony1-general] Slony with Amazon's RDS service?

2015-12-21 Thread Jan Wieck
On 12/21/2015 05:44 PM, David Fetter wrote:
> On Fri, Dec 18, 2015 at 11:45:04AM -0300, "Gonzalo Vásquez @ Waypoint" wrote:
>> Is it actually possible to somehow use Slony with Amazon’s RDS
>> service? I’m aware that there’s no such thing as a host that you can
>> log into to make Slony’s configurations, but perhaps it can be done
>> remotely from a side server or something like that?
>
> You would need to get access to RDS's catalog that you don't have, so
> my best guess is, "no." That's the technical side.
>
> On the business side, Amazon has very strong incentives to block
> anything that might let you migrate off their services easily, so my
> best guess is that any attempt to do this would be treated by Amazon
> as an attack.

On the technical side it would make a lot of sense to actually support 
the strongest replication system, one that has been through the test of 
time for cross-platform, cross-version upgrades for a decade now.

But then again, that would mean investing in people, and that means 
becoming dependent on individuals. So on the business side it doesn't 
make that much sense. Hire-and-fire businesses like Amazon cannot depend 
on skilled individuals.


Regards, Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] remote listener serializability

2015-11-20 Thread Jan Wieck
On 11/20/2015 08:45 AM, Steve Singer wrote:
> On 11/19/2015 01:09 PM, Tignor, Tom wrote:
>>  A general question for the group: if we would consider a change like 
>> this
>> (as a runtime option or otherwise), what’s the correct way to move it
>> forward? Should I file a bug? Are there specific tests or analysis which
>> should be performed?
> 
> I would use bug 336 for this.
> My gut is to make this a runtime option since I am not confident that
> downgrading the isolation level won't impact someones workload, having
> said that I am open to being convinced otherwise.  I would also be
> inclined to leave the default value unchanged for 2.2.x and then maybe
> consider changing the default in 2.3.   If we actually any concrete
> plans for a 2.3 version I would say just leave the change for 2.3 but I
> don't see a 2.3 release happening soon and I don't want to change the
> default behaviour in a minor release.
> 
> 
> Generally when we make a change like this I try to do lots of runs
> through the disorder/clustertest suite that doesn't cover all types of
> workloads.  Have you tried this change on your workload? Does it
> actually help things?
> 
> It would be great if Jan and Chris were to comment on this thread

Without taking SYNC snapshots in a SERIALIZABLE transaction I believe
that a Slony-I replica could suffer the same inconsistency dangers that
a pg_dump without --serializable-deferrable can suffer. Namely a replica
would not be usable as a source for reporting. From the 9.4 pg_dump docs:

"This option is not beneficial for a dump which is intended only for
disaster recovery. It could be useful for a dump used to load a copy of
the database for reporting or other read-only load sharing while the
original database continues to be updated. Without it the dump may
reflect a state which is not consistent with any serial execution of the
transactions eventually committed. For example, if batch processing
techniques are used, a batch may show as closed in the dump without all
of the items which are in the batch appearing."

Changing the default isolation level(s) may therefore change what a
replica can safely be used for, and I believe that creating reports is
one of the major use cases. Using options with big, bold, red, flashing
warnings in the documentation would be the only way to go.


Regards, Jan



> 
>>
>>  Tom:-)
>>
>>
>> On 11/18/15, 10:35 AM, "Greg Sabino Mullane" <g...@endpoint.com> wrote:
>>
>>> On Wed, Nov 18, 2015 at 02:26:15PM +, Tom Tignor wrote:
>>> ...
>>>> Sorry for the delay getting back. Inspired by your questions, I¹ve been
>>>> reading up on SSI, the Cahill paper and slony1 and postgres code.
>>> ...
>>>
>>> It should be pointed out that 9.1 goes EOL (End of Life) in less than
>>> a year (Sep 2016), and transaction handling has changed a *lot* since
>>> then,
>>> so any changes that core Slony makes may not even work for you.
>>>
>>> (FWIW, I think dropping the isolation level in this particular
>>> instance seems safe, however.)
>>>
>>> --
>>> Greg Sabino Mullane g...@endpoint.com
>>> End Point Corporation
>>> PGP Key: 0x14964AC8
>>
>> _______
>> Slony1-general mailing list
>> Slony1-general@lists.slony.info
>> http://lists.slony.info/mailman/listinfo/slony1-general
>>
> 
> ___
> Slony1-general mailing list
> Slony1-general@lists.slony.info
> http://lists.slony.info/mailman/listinfo/slony1-general
> 


-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Network connection from slaves to the master

2015-11-10 Thread Jan Wieck
On 11/10/2015 08:03 AM, TOINEL, Ludovic wrote:
> Thanks Andrew,
>
> We are not allowed to have network connection from the slaves to the master 
> (for security constraints). Only master can communicate with slaves.
> We need database on slaves with mix replicates tables and read/write tables.

I presume that this is a restriction on your routers or gateways and 
that the initiating SYN packet for a new TCP connection cannot come from 
what you envision to be the Slave.

If that is correct, you can run all the slon daemons on what you 
consider the Master. All TCP connections are then initiated from the 
Master. This has little to no effect on how Slony functions.
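
For illustration, a sketch of what that can look like, with both daemons 
started on the master host (cluster name and conninfo strings are 
placeholders):

    # slon for node 1, the master database local to this host
    slon my_cluster "dbname=mydb host=localhost user=slony" &
    # slon for node 2, the slave database reached over the network from here
    slon my_cluster "dbname=mydb host=slave.example.com user=slony" &

Since the STORE PATH conninfo entries are then also used from this host, 
every TCP connection originates on the master side.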


Regards, Jan


>
> Maybe the solution could be for the Slony master to be a hot standby of a
> master protected somewhere?
>
> [slony slaves] <-> [slony master - Standby node] <(log 
> shipping)--|firewall|-- [master protected somewhere]
>
> Do you think this solution can work with slony ?
>
> Regards,
>
> Ludovic Toinel
>
> -Original Message-
> From: slony1-general-boun...@lists.slony.info
> [mailto:slony1-general-boun...@lists.slony.info] On behalf of Andrew Sullivan
> Sent: Tuesday, 10 November 2015 12:26
> To: slony1-general@lists.slony.info
> Subject: Re: [Slony1-general] Network connection from slaves to the master
>
> On Tue, Nov 10, 2015 at 09:51:29AM +, TOINEL, Ludovic wrote:
>> The network allows only flows from master to slaves.
>>
>> Is there any option that I missed to do that ?
>
> Not really.  In principle you could do this with the log shipping mode, but I 
> don't recall whether doing that on the master was not possible or just a 
> really bad idea.  (You could do this with the built-in standby mechanisms of 
> Postgres, though.)
>
> I do wonder why you have it set up this way, however.  Why do you control the 
> flows this way?
>
> A
>
> --
> Andrew Sullivan
> a...@crankycanuck.ca
> ___
> Slony1-general mailing list
> Slony1-general@lists.slony.info
> http://lists.slony.info/mailman/listinfo/slony1-general
> _______
> Slony1-general mailing list
> Slony1-general@lists.slony.info
> http://lists.slony.info/mailman/listinfo/slony1-general
>


-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Slony Replication Startup Speed

2015-11-03 Thread Jan Wieck
On 10/29/2015 09:49 PM, Sung Hsin Lei wrote:
> Actually, node 2 is correct. I have 2 different clusters. Each cluster
> has node 1 and node 2. Each cluster replicates to 1 db.

What is the reason for this setup? If the two slaves are subscribed to 
different sets of tables, then the same can be achieved by creating 
multiple sets and subscribing the slaves to individual combinations of 
them.
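
As a rough slonik sketch of that single-cluster layout (set, table and 
node IDs and the table names are placeholders):

    CREATE SET (ID = 1, ORIGIN = 1, COMMENT = 'tables needed by slave A');
    SET ADD TABLE (SET ID = 1, ORIGIN = 1, ID = 1,
                   FULLY QUALIFIED NAME = 'public.table_a');
    CREATE SET (ID = 2, ORIGIN = 1, COMMENT = 'tables needed by slave B');
    SET ADD TABLE (SET ID = 2, ORIGIN = 1, ID = 2,
                   FULLY QUALIFIED NAME = 'public.table_b');
    SUBSCRIBE SET (ID = 1, PROVIDER = 1, RECEIVER = 2, FORWARD = NO);
    SUBSCRIBE SET (ID = 2, PROVIDER = 1, RECEIVER = 3, FORWARD = NO);

A slave that needs both groups of tables would simply subscribe to both 
sets.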


Regards, Jan


>
> On Thu, Oct 29, 2015 at 8:39 PM, Scott Marlowe <scott.marl...@gmail.com
> <mailto:scott.marl...@gmail.com>> wrote:
>
> On Thu, Oct 29, 2015 at 6:06 PM, Sung Hsin Lei <sungh@gmail.com
> <mailto:sungh@gmail.com>> wrote:
> > Hello,
> >
> > I successfully update the main db and one replicated db with the 
> following:
> >
> > cluster name = slony_rep1
> >
> > node 1 admin conninfo = 'dbname = MyDB host = localhost user = slony1
> > password = Ejhfg33EdddsufhErR76 port = 6234';
> > node 2 admin conninfo = 'dbname = MyDB host = 86.88.5.4 user = slony1
> > password = Ejhfg33EdddsufhErR76 port = 6234';
> >
> > EXECUTE SCRIPT
> > (
> > SQL = 'ALTER TABLE operators RENAME COLUMN firstname TO lastname;',
> > EVENT NODE = 1
> > );
> >
> >
> >
> >
> > However, I have 1 main db to 2 replicated db. The second replicated db 
> has
> > the following cluster name and node information:
> >
> > cluster name = slony_rep2
> >
> > node 1 admin conninfo = 'dbname = MyDB host = localhost user = slony2
> > password = Ejhfg33EdddsufhErR76 port = 6234';
> > node 2 admin conninfo = 'dbname = MyDB host = 86.88.5.17 user = slony2
> > password = Ejhfg33EdddsufhErR76 port = 6234';
>
> Shouldn't that be node 3 not node 2?
>
> > Do I need to run slonik twice with different cluster and node indo? That
> > does not seem right. After running the first time, the main db and the 
> first
> > replicated db will be consistent but not the second replicated db. 
> Also, if
> > I run it a second time, wouldn't the main db already be updated hence 
> the
> > sql statements used for the original update will surely fail?
>
> You can just run it on all three at once.
>
> node 1 admin conninfo = 'dbname = MyDB host = localhost user = slony1
> password = Ejhfg33EdddsufhErR76 port = 6234';
> node 2 admin conninfo = 'dbname = MyDB host = 86.88.5.4 user = slony1
> password = Ejhfg33EdddsufhErR76 port = 6234';
> node 3 admin conninfo = 'dbname = MyDB host = 86.88.5.17 user = slony2
> password = Ejhfg33EdddsufhErR76 port = 6234';
>
> EXECUTE SCRIPT
> (
>  SQL = 'ALTER TABLE operators RENAME COLUMN firstname TO lastname;',
>  EVENT NODE = 1
> );
>
>
>
>
> ___
> Slony1-general mailing list
> Slony1-general@lists.slony.info
> http://lists.slony.info/mailman/listinfo/slony1-general
>


-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] upgrade slony master / slave using pg_upgrade

2015-09-29 Thread Jan Wieck
On 09/29/2015 05:23 PM, Mark Steben wrote:
> Good evening
>
> Fairly simple question (I think)
> Can I use the pg_upgrade contrib module to upgrade from postgres 9.2 to
> 9.4 without having to
> re-define and re-subscribe all of the slony replicated data?  My initial
> pg_upgrade tests are yelling at me that some slony libraries are not
> found in the new 9.4 binaries:
>
> Could not load library "$libdir/slony1_funcs"
> ERROR:  could not access file "$libdir/slony1_funcs": No such file or
> directory
>
> Hopefully it is just an issue of copying the slony binaries from 9.2 to
> the same libraries in 9.4 on master and slave, or is a re-creation
> necessary?  (We are running slony 2.2.3)

Not that simple. Slony is quite dependent on version numbers.

All nodes in a Slony cluster must run the exact same Slony version.

Each node can run against a different PostgreSQL version, as long as the 
cluster's Slony version supports that PostgreSQL version. However, the 
shared libs and such on each node must be of that Slony version compiled 
against that PostgreSQL version.

So just copying the 9.2 slony $lib and other content over to the 9.4 lib 
and shared directories isn't going to do it. You need those files to be 
built against the 9.4 tree. This is because of code shift inside the 
PostgreSQL backend. One and the same Slony shared lib function is using 
different backend code to accomplish the same task. There is quite a bit 
of #ifdef stuff dependent on PostgreSQL versions in the Slony shared 
object code.
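
In practice that means rebuilding Slony from source against the new 
server. A rough sketch, with placeholder paths and assuming the usual 
Slony configure options:

    ./configure --with-pgconfigdir=/usr/pgsql-9.4/bin
    make
    make install

Once the 9.4 builds of slony1_funcs and the slon/slonik binaries are 
installed on every node, pg_upgrade should be able to find the library 
again.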


Regards, Jan


-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Cloning an origin?

2015-07-29 Thread Jan Wieck
On 07/29/2015 09:34 AM, David Fetter wrote:
 On Tue, Jul 28, 2015 at 06:51:44PM -0400, Jan Wieck wrote:
 On 07/25/2015 12:31 AM, David Fetter wrote:
 Folks,
 
 While in the best of all possible worlds, we'd have planned out a
 replication strategy before we get tables whose initial sync via
 SUBSCRIBE SET will never finish, we aren't always given that much
 ability to plan that soon.

 One thing I forgot to ask initially. What do you mean by will never
 finish? Is that just being facetious about it will take a long
 time, is it that you don't have the disk storage to swallow the
 back log that will accumulate or is it outdated knowledge from the
 times when Slony did actually slow down with accumulation of
 backlog?

 As far as I could tell, backlog was accumulating faster than the
 initial sync was happening.

If that is the case with 2.2, then there is a good chance that the slave 
won't be able to keep up anyway.

Note that the log switching on the master is severely hindered until the 
slave is actually caught up. So you will see a sawtooth pattern in the 
total amount of sl_log. I can explain that in more detail if needed.

Assuming a more or less steady rate of update activity on the master, 
the rate at which SYNC events get replicated is a better indicator. If 
master-SYNC events appear on the slave faster than they are generated on 
the master, it is catching up.
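
One quick way to watch that over time is the sl_status view (a sketch; 
_cluster stands in for your cluster schema):

    SELECT st_origin, st_received, st_lag_num_events, st_lag_time
      FROM _cluster.sl_status;

If st_lag_num_events keeps shrinking between samples, the subscriber is 
catching up; if it keeps growing, it is falling further behind.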


Regards, Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Cloning an origin?

2015-07-28 Thread Jan Wieck
On 07/25/2015 12:31 AM, David Fetter wrote:
 Folks,

 While in the best of all possible worlds, we'd have planned out a
 replication strategy before we get tables whose initial sync via
 SUBSCRIBE SET will never finish, we aren't always given that much
 ability to plan that soon.

One thing I forgot to ask initially. What do you mean by "will never 
finish"? Is that just being facetious about it taking a long time, is it 
that you don't have the disk storage to swallow the backlog that will 
accumulate, or is it outdated knowledge from the times when Slony did 
actually slow down with accumulation of backlog?


Regards, Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Cloning an origin?

2015-07-28 Thread Jan Wieck
On 07/27/2015 07:49 PM, Steve Singer wrote:
 Or am I still missing something?

No, I missed the part that the transactions still in progress at the 
next SYNC after backup start will be handled by precisely that.

So the logic then is to take the last master-SYNC found on the restored 
slave, increase maxxid in it to the highest txid found in sl_log_*, 
remove all txids found in sl_log_* from the xip vector and use that as 
sl_setsync. Add a sl_confirm entry for that master-SYNC reflecting that 
the slave is caught up to/beyond that point and that the next SYNC to 
process is the one following the backup start.

There is only one detail left then. While we are doing all that we need 
the postmaster for the restored slave up and running. But we need to 
make sure that the slon of the master (or any node for that matter) does 
not connect to the slave and get confused by seeing the wrong data. That 
may be accomplished by changing the local node ID last so that every 
slon would throw an error because they are connecting to the wrong node.


Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Cloning an origin?

2015-07-27 Thread Jan Wieck
On 07/26/2015 10:48 PM, Steve Singer wrote:
 On 07/26/2015 08:01 PM, Jan Wieck wrote:
 On 07/26/2015 07:35 PM, Jan Wieck wrote:
 That said, pg_dump isn't that much faster than Slony's copy_set() so
 we'd need to find a way to reconstruct an sl_setsync entry from
 something like a binary base backup. That is not trivial.

 Here is a wild idea that I am going to test, but someone may throw a
 wrench into it while I'm doing that:

 Let us assume that the application can be stopped and the database
 brought into so called single user mode for a short period of time.
 During that maintenance window we do the following:

 * Shutdown the application
 * Create the cluster with two nodes and paths via slonik script.
 * Shutdown the slon for the slave.
 * Create a SYNC event on the master.
 * Execute a SUBSCRIBE SET ... OMIT COPY.
 * Create an LVM snapshot of $PGDATA (+ pg_xlog ...)
 * Resume the application

 That should take no longer than a few minutes. Properly scripted it
 could be in the second range. We now could

 * Use LVM snapshot(s) to plow over the slave's $PGDATA and so on.
 * Modify the slave's DB to correct the local node ID and fake an
   sl_setsync entry reflecting the SYNC created on the master.
 * Fire up the slave's slon.


 Comments?


 I think that would work.

 Is there  a way we could do this without an application outage.

 It seems to me that if you take a backup of the master wih pg_basebackup
 or through any other consistent method of a binary backup you could look
 at the restored instance and say

 * Any transactions that show as committed in the last SYNC that shows up
 in the restored sl_event table are committed
 * Any transactions that were not included in the above, but have actions
 in sl_log_1 or sl_log_2 on the restored instance were committed at the
 time of the backup (otherwise the rows inserted by those transactions
 couldn't be in sl_log)
 * Any transactions that were committed by the next SYNC on the origin
 but don't show up in the above list can be pulled as part of the first
 real SYNC to be processed, ie you would show those as 'in progress' (or
 not yet started) for fake sl_setsync value you create.

I think this is insufficient. Transactions that were in progress when 
the backup started are not guaranteed to be committed by the next SYNC. 
In fact they could still be in progress many SYNCs later.

What probably could work to produce that snapshot is to wait for the 
sl_log table that was active (being inserted into) at backup time to be 
closed - IOW, for a log switch to complete. All xids found in that log 
on the master EXCEPT all xids in that log on the restored slave are to 
be considered in progress at backup time.

This should build an artificial snapshot that tells us where to start. 
Any SYNC after the log switch completed would make for a suitable next 
SYNC to leap to. So we would record this artificial snapshot as a SYNC 
whose ev_seqno is one less than that of the next SYNC.


Regards, Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Cloning an origin?

2015-07-26 Thread Jan Wieck
On 07/26/2015 07:35 PM, Jan Wieck wrote:
 That said, pg_dump isn't that much faster than Slony's copy_set() so
 we'd need to find a way to reconstruct an sl_setsync entry from
 something like a binary base backup. That is not trivial.

Here is a wild idea that I am going to test, but someone may throw a 
wrench into it while I'm doing that:

Let us assume that the application can be stopped and the database 
brought into so called single user mode for a short period of time. 
During that maintenance window we do the following:

   * Shutdown the application
   * Create the cluster with two nodes and paths via slonik script.
   * Shutdown the slon for the slave.
   * Create a SYNC event on the master.
   * Execute a SUBSCRIBE SET ... OMIT COPY.
   * Create an LVM snapshot of $PGDATA (+ pg_xlog ...)
   * Resume the application

That should take no longer than a few minutes. Properly scripted it 
could be in the second range. We now could

   * Use LVM snapshot(s) to plow over the slave's $PGDATA and so on.
   * Modify the slave's DB to correct the local node ID and fake an
 sl_setsync entry reflecting the SYNC created on the master.
   * Fire up the slave's slon.


Comments?


-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] too much work !

2015-01-15 Thread Jan Wieck
On 01/13/2015 09:45 AM, Sebastien Marchand wrote:
 Yeah, 17 cluster name if you prefer.

 I start one slon by one cluster.
 Samples :

 slon -s 200 -t 6 -g 50 -o 6 -c 0 -d 1 -f /home/scripts/slon.conf 
 repli_nat dbname=DB2 host=127.0.0.1 port=5432 user=slony password=123
 slon -s 200 -t 6 -g 50 -o 6 -c 0 -d 1 -f /home/scripts/slon.conf 
 repli_lc3 dbname=DB2 host=127.0.0.1 port=5432 user=slony password=123
 slon -s 200 -t 6 -g 50 -o 6 -c 0 -d 1 -f /home/scripts/slon.conf 
 repli_lc2 dbname=DB2 host=127.0.0.1 port=5432 user=slony password=123

This looks as if you created 17 Slony clusters instead of one cluster 
with 18 nodes.


Jan




 -Message d'origine-
 De : Stéphane Schildknecht [mailto:stephane.schildkne...@postgres.fr]
 Envoyé : mardi 13 janvier 2015 15:22
 À : Sebastien Marchand; 'Glyn Astill'; 'Vick Khera'
 Cc : 'slony'
 Objet : Re: [Slony1-general] too much work !

 On 12/01/2015 11:38, Sebastien Marchand wrote:
 Hi,

 18 nodes, wan network.
 2 replications ( one with 1 set and other with 2 sets ( 2 directions )
 ) First replication from server 1 to n servers ( same schema, 37
 tables )
 1 slon on server 1 and 1 slon by remote server Second replication :
 set 1 : server A - server 1 ( 90 replicated tables ) set 2 : server A
 - server 1 ( 1 replicated tables ) do the same with 17 other servers
 ( one schema by server )
 17 slons on server 1 and 1 slon by  remote server database 5 gB.

 I don't know if that will help...

 Seems to me some information is missing. Distinction on databases, for 
 instance.

 I guess, the 2 replications are not correlated.They just share same servers.

 In fact, the second replication is not 1 replication, it is 17 different 
 replications. But, do they all end in the same database on server A ?

 --
 Stéphane Schildknecht
 Contact régional PostgreSQL pour l'Europe francophone Loxodata - Conseil, 
 expertise et formations


 ___
 Slony1-general mailing list
 Slony1-general@lists.slony.info
 http://lists.slony.info/mailman/listinfo/slony1-general



-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] PGRES_FATAL_ERROR ERROR: could not access file $libdir/slony1_funcs.2.2.3: No such file or directory

2014-12-17 Thread Jan Wieck
On Dec 16, 2014 1:35 AM, Tory M Blue tmb...@gmail.com wrote:

 Trying a production upgrade, after upgrade 3 different environments and
I'm getting this error.

 PGRES_FATAL_ERROR ERROR:  could not access file
$libdir/slony1_funcs.2.2.3: No such file or directory

 The files are there, anyway to force this? even tried to create a export
libdir with no success.

Are the files there on all the involved nodes?

Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Duplicate key while merging temporary to main set

2014-12-16 Thread Jan Wieck
On 12/16/2014 05:32 AM, Carlos Henrique Reimer wrote:

 8357_isarq.wm_bco_pla;  PGRES_FATAL_ERROR ERROR:  could not create
 unique index pk_wm_bco_pla
 DETAIL:  Key (bco_cod, pla_cod)=(399, 5056110) is duplicated.
 CONTEXT:  SQL statement reindex table 8357_isarq.wm_bco_pla
 PL/pgSQL function finishtableaftercopy(integer) line 26 at EXECUTE statement
 2014-12-16 07:08:25 BRST WARN   remoteWorkerThread_1: data copy for set
 999 failed 108 times - sleep 60 seconds

 The error indicates 8357_isarq.wm_bco_pla has duplicated rows for
 primary key  (bco_cod, pla_cod)=(399, 5056110) but when I query the
 table using this pk I got only one row:

 select * from 8357_isarq.wm_bco_pla where bco_cod=399 and pla_cod =
 '5056110';
   bco_cod | pla_cod |pla_des| ativo |
 conta_pai | pla_itlistserv | pla_ctacosif
 -+-+---+---+---++--
   399 | 5056110 | RENDAS TRANSACOES VISA ELETRON - HBBR | S |
 5056004   |   1501 | 71799003
 (1 row)

This means that a sequential scan (done by the COPY) produces multiple 
rows with that primary key, while only one of them has an index entry. 
As Andrew pointed out, you have index corruption on the source database. 
I bet a REINDEX of that table on the source will fail too.

 Slony is trying to merge the temporary set to the main set every 60
 seconds and getting this error. I do not want to loose three days of
 replication processing.

No, Slony is still processing the SUBSCRIBE SET for the small temporary 
set (and repeatedly failing on that). The MERGE SET is queued behind and 
not touched yet.

 Is there anything I can do to fix this error or at least can I remove
 this table from the temporary set and work on this issue afterwards?

Reading your follow up, you only got a single duplicate on this one. 
However, there is a high chance that your DELETE actually deleted the 
wrong row version. Since you are running a really outdated version of 
PostgreSQL I would not be surprised if some bug made an outdated version 
of that row visible again. The index entry for that PK would be pointing 
at the last row version, which you now deleted. The replica would now 
contain the old row version again.

Check if you have a row with that PK in the replica and what that 
actually looks like.


Regards, Jan


-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Slony-I 2.2.3 fails on execute script

2014-12-05 Thread Jan Wieck
On 12/05/2014 03:33 PM, Waldo Nell wrote:
 I have a working Slony-I setup with one master and one slave.  I tried to run 
 a simple alter table add column statement via execute script.  This always 
 worked for me, however today it failed.

Please stop using terms like master and slave in a Slony context.

There is no master or slave. There are origins of sets and 
subscribers of sets. The latter are sometimes referred to as replicas.


Thanks,
Jan


-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] RDS PostgreSQL?

2014-12-02 Thread Jan Wieck
On 12/02/2014 05:04 PM, David Fetter wrote:
 On Tue, Dec 02, 2014 at 02:07:08PM -0500, Jim Mlodgenski wrote:
 On Tue, Dec 2, 2014 at 1:18 PM, David Fetter da...@fetter.org wrote:
  Folks,
 
  Is there any way to make an Amazon RDS PostgreSQL database a Slony
  node?
 

 I don't believe they've deployed the Slony libraries on RDS yet, but
 you can use Londiste or Bucardo. They just recently announced
 support for both of them.

 It's great to know there are logical replication systems available.

It appears that the people working for Amazon in India don't have what 
it takes to master Slony. They do well on MySQL, but that's about it.


Jan


-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Lag time increasing but there are no events

2014-10-17 Thread Jan Wieck
On 10/16/2014 11:48 AM, Glyn Astill wrote:
 From: Glyn Astill glynast...@yahoo.co.uk

 To: Dave Cramer davecra...@gmail.com
 Cc: slony slony1-general@lists.slony.info
 Sent: Thursday, 16 October 2014, 16:26
 Subject: Re: [Slony1-general] Lag time increasing but there are no events


  From: Dave Cramer davecra...@gmail.com
 To: Glyn Astill glynast...@yahoo.co.uk
 Cc: slony slony1-general@lists.slony.info
 Sent: Thursday, 16 October 2014, 15:41
 Subject: Re: [Slony1-general] Lag time increasing but there are no events



 Actually I think it is because a pg_dump of the db is going on.


 Can you dump a slave ?


 Do you have to exclude the slony clusters (which would make sense)?


 Yeah from past experience the locks taken by pg_dump cause replication to 
 lag. I
 normally use the -N switch in pg_dump to exclude the schema.  For pg_restore 
 I
 usually create a TOC with the -l option and pipe it to grep to filter out the
 references to the slony schema that get dumped for the triggers, and then -L
 option to use it with my restore.

 Also I assumed you were using 2.0+ there, with 1.2 there's some slight 
 fiddling done in pg_catalog to disable triggers, from memory 
 pg_trigger.tgrelid is pointed to an index rather than the table or something 
 like that.  I used to dump out with the slony schema, and run uninstallnode() 
 and drop schema cascade against the restored database.

Dumping a replica won't work with pre-2.0 versions because the dump 
won't contain all the information. With recent PG versions it will 
actually error out. With 2.0+ it does work, so with 2.0+ it is possible 
to take the backup from a replica.

Dave: A pg_dump can very well stop the creation of new SYNC events since 
slon needs a brief exclusive lock on the sl_event table. If that is the 
case, the st_last_event in sl_status will not be advancing, just as if 
no slon were running, which would explain the increase of st_lag_time 
with zero st_lag_num_events.

As suggested, dumping the database with -N to exclude the Slony-I schema 
should do the trick; dumping that schema has very little value anyway. 
If you ever have to restore from that dump, you'd either have to restore 
all replicas from it as well and rebuild the cluster with OMIT COPY, or 
rebuild the cluster with Slony copying the data. You would need to do 
the latter if the replicas aren't full copies of the entire master.
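
For example (database and cluster names are placeholders; the schema to 
exclude is the cluster name with a leading underscore):

    pg_dump -Fc -N '_mycluster' mydb > mydb.dump

For an existing dump, the TOC filtering Glyn described looks roughly 
like this:

    pg_restore -l mydb.dump | grep -v '_mycluster' > mydb.toc
    pg_restore -L mydb.toc -d newdb mydb.dump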


Regards,
Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Lag time increasing but there are no events

2014-10-17 Thread Jan Wieck
On 10/17/2014 09:00 AM, Dave Cramer wrote:
 So what's the best way to deal with very large databases. pg_dump just
 doesn't cut it. By the time we are finished dumping the data is stale
 anyway !

 I'm thinking of using PITR instead. Rolling over once a week

pg_basebackup or some combination of pg_start_backup() and rsync will 
work. You can take that backup from the master or a replica, and do the 
UNINSTALL NODE if you ever have to restore from it.
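
A rough sketch of that idea (host names, cluster name and node ID are 
made up):

    # weekly binary base backup, taken from a replica
    pg_basebackup -h replica-host -U postgres -X stream \
        -D /backups/base-$(date +%F)

    # only if that backup ever has to come up as a standalone database:
    slonik <<_EOF_
    cluster name = mycluster;
    node 2 admin conninfo = 'dbname=mydb host=restored-host user=slony';
    uninstall node (id = 2);
    _EOF_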


Jan


 Dave Cramer

 On 17 October 2014 08:51, Jan Wieck j...@wi3ck.info
 mailto:j...@wi3ck.info wrote:

 On 10/16/2014 11:48 AM, Glyn Astill wrote:

 From: Glyn Astill glynast...@yahoo.co.uk
 mailto:glynast...@yahoo.co.uk


 To: Dave Cramer davecra...@gmail.com
 mailto:davecra...@gmail.com
 Cc: slony slony1-general@lists.slony.__info
 mailto:slony1-general@lists.slony.info
 Sent: Thursday, 16 October 2014, 16:26
 Subject: Re: [Slony1-general] Lag time increasing but there
 are no events


   From: Dave Cramer davecra...@gmail.com
 mailto:davecra...@gmail.com
 To: Glyn Astill glynast...@yahoo.co.uk
 mailto:glynast...@yahoo.co.uk
 Cc: slony slony1-general@lists.slony.__info
 mailto:slony1-general@lists.slony.info
 Sent: Thursday, 16 October 2014, 15:41
 Subject: Re: [Slony1-general] Lag time increasing but
 there are no events



 Actually I think it is because a pg_dump of the db is
 going on.


 Can you dump a slave ?


 Do you have to exclude the slony clusters (which would
 make sense)?



 Yeah from past experience the locks taken by pg_dump cause
 replication to lag. I
 normally use the -N switch in pg_dump to exclude the
 schema.  For pg_restore I
 usually create a TOC with the -l option and pipe it to grep
 to filter out the
 references to the slony schema that get dumped for the
 triggers, and then -L
 option to use it with my restore.


 Also I assumed you were using 2.0+ there, with 1.2 there's some
 slight fiddling done in pg_catalog to disable triggers, from
 memory pg_trigger.tgrelid is pointed to an index rather than the
 table or something like that.  I used to dump out with the slony
 schema, and run uninstallnode() and drop schema cascade against
 the restored database.


 Dumping a replica won't work 2.0 because the dump won't contain all
 the information. With recent PG versions it will actually error out.
 With 2.0+ it does work, so with 2.0+ it is possible to take the
 backup from a replica.

 Dave: A pg_dump can very well stop the creation of new SYNC events
 since slon needs a brief exclusive lock on the sl_event table. If
 that is the case, the st_last_event in sl_status will not be
 advancing as if no slon would be running, which would explain the
 increase of st_lag_time with zero st_lag_num_events.

 As suggested, dumping the database with -N to exclude the Slony-I
 schema should do the trick and dumping it has very little value
 anyway. If you ever have to restore from that dump, you'd either
 have to restore all replicas from it as well and rebuild the cluster
 with OMIT COPY, or rebuild the cluster with Slony copying the data.
 You would need to do the latter if replicas aren't full copies of
 the entire master.


 Regards,
 Jan

 --
 Jan Wieck
 Senior Software Engineer
 http://slony.info




-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Changing master node's IP port

2014-10-14 Thread Jan Wieck
On 10/14/2014 11:57 AM, Glyn Astill wrote:
 From: Granthana Biswas granthana.bis...@gmail.com
To: Glyn Astill glynast...@yahoo.co.uk
Cc: slony1-general@lists.slony.info slony1-general@lists.slony.info
Sent: Tuesday, 14 October 2014, 16:48
Subject: [Slony1-general] Changing master node's IP  port



Hi Glyn,


Yes I had stopped all the slons for every node but I did not DELETE FROM 
_Cluster1.sl_nodelock WHERE nl_nodeid = 1 AND nl_conncnt = 0;



 Well you shouldn't have had to run the delete, that's just to get you going.  
 I assume you're up and running now?

The cleanup procedure for the nodelock checks if the backend process 
holding the lock (if any) is still alive. If I had to guess, I'd say 
that someone/something changed the IP address of the server without 
stopping the slon processes first, and that IP address change led to a 
connection loss without the TCP connections getting RST or FIN packets. 
In that situation it is likely that the database backend from the old 
slon connection is waiting on a blocking read and will only notice that 
the connection is gone after a full TCP keepalive timeout, which 
defaults to several hours.

Terminating the slony related database connections via 
pg_terminate_backend() will make the nodelock cleanup succeed.
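
Something along these lines, assuming the replication connections run 
as a role named 'slony' (on PostgreSQL before 9.2 the column is called 
procpid instead of pid):

    select pg_terminate_backend(pid)
      from pg_stat_activity
     where usename = 'slony';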

In any case it is a good practice to have TCP keepalive settings a lot 
more aggressive on both sides, PostgreSQL and Slony. At my former work 
place we used to set them to 60 seconds idle, then 9 keepalive packets 
in 7 second interval. That will let the connections time out in about 2 
minutes.
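
On the PostgreSQL side those settings translate to something like this 
in postgresql.conf (values matching the example above):

    tcp_keepalives_idle = 60
    tcp_keepalives_interval = 7
    tcp_keepalives_count = 9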


Regards, Jan


-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Error with Slony replication from another slony cluster slave

2014-10-09 Thread Jan Wieck
On 10/09/2014 08:18 AM, Granthana Biswas wrote:
 Hi All,

 I am trying to replicate from another Slony cluster's slave node.

 Cluster1 - replicating from DB1 to - DB2

 Cluster2 - replicating from DB2 to - DB3

 The initial sync up went fine without any errors. There are no errors in
 logs of both the clusters. Also no st_lag_num_events in DB3 or DB2 for
 Cluster2.

 But the data added in DB2 since I started slony Cluster2 is not
 reflecting in DB3.

 Does slony allow replication from another cluster's slony slave? Or did
 I miss something?

This does not work because the logTrigger will only record information 
when firing in session_replication_role 'origin'. When Slony is applying 
changes from DB1 to DB2, the session_replication_role is 'replica', so 
the logTrigger action is suppressed inside the trigger function.

You will have to make DB3 a member of Cluster1 and use DB2 as the data 
provider to achieve this. This has the added benefit that you can easily 
change the data provider of DB3 to DB1 in case you need to perform any 
maintenance on DB2.
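
In slonik terms the subscription itself would be roughly this (set and 
node IDs are whatever your Cluster1 setup actually uses; the STORE NODE 
and STORE PATH statements for DB3 are omitted here):

    subscribe set (id = 1, provider = 2, receiver = 3, forward = yes);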


Regards,
Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Error with Slony replication from another slony cluster slave

2014-10-09 Thread Jan Wieck
On 10/09/2014 05:26 PM, Glyn Astill wrote:
 From: Granthana Biswas granthana.bis...@gmail.com
To: Glyn Astill glynast...@yahoo.co.uk
Sent: Thursday, 9 October 2014, 16:34
Subject: Re: [Slony1-general] Error with Slony replication from another slony 
cluster slave



Hi Glyn,


In my case I have two clusters:


Cluster1 -  replicating from DB1 - DB2
Cluster2 -  replicating from DB1 - DB3


Can I stop Cluster2 and add DB3 to Cluster1 with DB2 as its master? Or do I 
have to delete the data first in DB3?



 You'll want to run DROP NODE against each node in Cluster2, (or if on 2.0+ 
 you can get away with just DROP SCHEMA _Cluster2 CASCADE) and stop the slons 
 for Cluster2.


Cascading, forwarding and all that is one of the core concepts of Slony. 
If you create one single setup looking like this:

 DB1 - DB2 - DB3

then you have a lot more flexibility and functionality than is obvious 
at first, aside from being able to fail over to DB2.

Slony will let you MOVE the master role from DB1 to DB2. That 
operation will not just make DB2 the master, but at the same time DB1 
becomes a replica that doesn't need an initial sync, so your setup now 
would look like this:

 DB1 - DB2 - DB3

This is useful if you need to perform some maintenance on DB1. At this 
point DB2 is your master and you would just stop the slon process for 
DB1, do whatever you need to do, and start the slon process again. Once 
DB1 has caught up, you just transfer the master role back and everything 
is as it was.
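
A minimal slonik sketch of that role swap (set ID 1 and node IDs 1/2 
are assumed):

    lock set (id = 1, origin = 1);
    move set (id = 1, old origin = 1, new origin = 2);

    # after the maintenance on DB1 is finished, swap the roles back
    lock set (id = 1, origin = 2);
    move set (id = 1, old origin = 2, new origin = 1);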

What also works is that if you need to perform maintenance on DB2, it 
doesn't mean that DB3 has to fall behind too. You can easily change the 
data provider for DB3 to be DB1, so your configuration changes to

 DB1 - DB2
  |
  V
 DB3

At this point you stop the slon process for DB2, perform what you need 
to do on DB2, start the slon process and after DB2 has caught up, make 
it the data provider for DB3 again.


Regards,
Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Error with Slony replication from another slony cluster slave

2014-10-09 Thread Jan Wieck
On 10/09/2014 06:34 PM, Dave Cramer wrote:
 Jan,

 But as you said they all have to be in the same cluster, correct ?

Yes.


Jan



 Dave Cramer

 On 9 October 2014 18:31, Jan Wieck j...@wi3ck.info
 mailto:j...@wi3ck.info wrote:

 On 10/09/2014 05:26 PM, Glyn Astill wrote:
   From: Granthana Biswas granthana.bis...@gmail.com
 mailto:granthana.bis...@gmail.com
  To: Glyn Astill glynast...@yahoo.co.uk
 mailto:glynast...@yahoo.co.uk
  Sent: Thursday, 9 October 2014, 16:34
  Subject: Re: [Slony1-general] Error with Slony replication from
 another slony cluster slave
  
  
  
  Hi Glyn,
  
  
  In my case I have two clusters:
  
  
  Cluster1 -  replicating from DB1 - DB2
  Cluster2 -  replicating from DB1 - DB3
  
  
  Can I stop Cluster2 and add DB3 to Cluster1 with DB2 as its
 master? Or do I have to delete the data first in DB3?
  
  
  
   You'll want to run DROP NODE against each node in Cluster2, (or
 if on 2.0+ you can get away with just DROP SCHEMA _Cluster2 CASCADE)
 and stop the slons for Cluster2.


 Cascading, forwarding and all that is one of the core concepts of Slony.
 If you create one single setup looking like this:

   DB1 - DB2 - DB3

 then you have a lot more flexibility and functionality than is obvious
 at first. Aside from being able to fail over to DB2.

 Slony will let you MOVE the master role from DB1 to DB2. That
 operation will not just make DB2 the master, but at the same time DB1
 becomes a replica that doesn't need an initial sync, so your setup now
 would look like this:

   DB1 - DB2 - DB3

 This is useful if you need to perform some maintenance on DB1. At this
 point DB2 is your master an you would just stop the slon process for
 DB1, do whatever you need to do, and start the slon process again. Once
 DB1 has caught up, you just transfer the master role back and everything
 is as it was.

 What also works is that if you need to perform maintenance on DB2, it
 doesn't mean that DB3 has to fall behind too. You can easily change the
 data provider for DB3 to be DB1, so your configuration changes to

   DB1 - DB2
|
V
   DB3

 At this point you stop the slon process for DB2, perform what you need
 to do on DB2, start the slon process and after DB2 has caught up, make
 it the data provider for DB3 again.


 Regards,
 Jan

 --
 Jan Wieck
 Senior Software Engineer
 http://slony.info
 ___
 Slony1-general mailing list
 Slony1-general@lists.slony.info mailto:Slony1-general@lists.slony.info
 http://lists.slony.info/mailman/listinfo/slony1-general




-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Slony ignoring cleanup_interval?

2014-09-24 Thread Jan Wieck
On 09/24/2014 01:00 PM, Steve Singer wrote:
 On 09/24/2014 12:29 PM, Christopher Browne wrote:
 On Fri, Sep 19, 2014 at 1:50 PM, Kristopher kristopherwil...@gmail.com
 mailto:kristopherwil...@gmail.com wrote:

 I have the following setup in my conf file:

 cleanup_interval=5 seconds

 ...

 However, it only actually runs cleanup about every 10 minutes (the
 default):


 We have, in fact, three parameters controlling cleanup:

 a) cleanup_interval, expressed as a Postgres interval, that, according
 to the docs, indicates:

 gettext_noop(A PostgreSQL value compatible with ::interval 
 which indicates what aging interval should be used 
 for deleting old events, and hence for purging sl_log_* tables.),

 (see src/slon/confoptions.c for that; I expect that there's a slon
 option that will make it print out the documentation strings).

 b) SLON_CLEANUP_SLEEP, in src/slon/slon.h, which is hardcoded to 600,
 indicating that every 600 seconds, the cleanup thread is called

 c) SLON_VACUUM_FREQUENCY, also in src/slon/slon.h, hardcoded to 3,
 indicating how often cleanup thread should VACUUM tables.

 We haven't exposed SLON_CLEANUP_SLEEP as a configuration option, and, in
 effect, that's a value you'd want to shorten a lot during this process.

 It wouldn't be a great deal of trouble to expose SLON_CLEANUP_SLEEP, and
 it's probably somewhat handy to do so, particularly for situations such
 as what you describe, where we want to avidly empty out sl_log_*.

 I'll see about coming up with a patch, with a view to applying this to
 the various major releases.


 My preference would be that we have 1 parameter in the config for
 controlling how often the cleanup thread does it's stuff.

 Ie make SLON_CLEANUP_SLEEP be controlled by the existing
 cleanup_interval field in the config.   I don't understand why/when
 someone would want these two values to be different.

I think we introduced the cleanup_interval (very easily misunderstood as 
something different from what it actually does) as a safeguard against 
race conditions, where we feared removing replication log data too soon.

Unless we still fear such a race condition, we should get rid of that 
parameter entirely, rely on "confirmed by everyone" meaning the data is 
obsolete, and just purge it.

The thing that really matters in this context is the cleanup sleep time, 
which determines how often the cleanup is actually done. Calling it 
often enough compared to checkpoints can actually lead to a situation 
where sl_log heap and index data never gets written to disk. And that is 
a goal well worth aiming for.


Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Slony ignoring cleanup_interval?

2014-09-19 Thread Jan Wieck
On 09/19/2014 04:55 PM, Steve Singer wrote:
 On 09/19/2014 01:50 PM, Kristopher wrote:
 I have the following setup in my conf file:

 cleanup_interval=5 seconds
 sync_interval=1000

 cluster_name='dbcluster01'
 conn_info='...'

 And, via the log, it looks like slony is picking up on the 5 second
 cleanup interval:

 2014-09-15 09:31:40 EDT CONFIG main: String option pid_file = [NULL]
 2014-09-15 09:31:40 EDT CONFIG main: String option archive_dir = [NULL]
 2014-09-15 09:31:40 EDT CONFIG main: String option sql_on_connection =
 [NULL]
 2014-09-15 09:31:40 EDT CONFIG main: String option lag_interval = [NULL]
 2014-09-15 09:31:40 EDT CONFIG main: String option command_on_logarchive
 = [NULL]
 2014-09-15 09:31:40 EDT CONFIG main: String option syslog_facility = LOCAL0
 2014-09-15 09:31:40 EDT CONFIG main: String option syslog_ident = slon
 2014-09-15 09:31:40 EDT CONFIG main: String option cleanup_interval = 5
 seconds

 However, it only actually runs cleanup about every 10 minutes (the default):


 We seem to be using
 #define SLON_CLEANUP_SLEEP600 /* sleep 10 minutes between 
 */

 as basis to a random time to sleep between calling the cleanup process.
We then pass the cleanup_interval from  config file to the stored
 function.

 Maybe Jan or Chris can comment if they remember how this was intended to
 work with a cleanup_interval that is specified in the config file.

There are two possible variables here. First the cleanup_interval that 
is passed to the cleanup stored procedure, controlling how old an event 
must be at minimum to be purged from sl_event. Second the frequency at 
which the cleanup stored procedure is invoked by the slon daemon.

All or none of this could currently be screwed up because of being 
miscommunicated between Chris and me.


Jan






 For purposes of your upgrade you want to make sure that sl_log_1 and
 sl_log_2 have no rows in them then you can upgrade.



 NOTICE:  Slony-I: log switch to sl_log_1 complete - truncate sl_log_2
 CONTEXT:  PL/pgSQL function _dbcluster01.cleanupevent(interval) line 94
 at assignment
 2014-09-18 06:28:40 EDT INFO   cleanupThread:0.368 seconds for
 cleanupEvent()
 NOTICE:  Slony-I: Logswitch to sl_log_2 initiated
 CONTEXT:  SQL statement SELECT _dbcluster01.logswitch_start()
 PL/pgSQL function _dbcluster01.cleanupevent(interval) line 96 at PERFORM
 2014-09-18 06:39:27 EDT INFO   cleanupThread:0.018 seconds for
 cleanupEvent()
 NOTICE:  Slony-I: log switch to sl_log_2 complete - truncate sl_log_1
 CONTEXT:  PL/pgSQL function _dbcluster01.cleanupevent(interval) line 94
 at assignment
 2014-09-18 06:51:10 EDT INFO   cleanupThread:0.106 seconds for
 cleanupEvent()
 2014-09-18 06:51:10 EDT INFO   cleanupThread:0.006 seconds for vacuuming
 NOTICE:  Slony-I: Logswitch to sl_log_1 initiated
 CONTEXT:  SQL statement SELECT _dbcluster01.logswitch_start()
 PL/pgSQL function _dbcluster01.cleanupevent(interval) line 96 at PERFORM
 2014-09-18 07:01:52 EDT INFO   cleanupThread:0.016 seconds for
 cleanupEvent()
 NOTICE:  Slony-I: log switch to sl_log_1 complete - truncate sl_log_2
 CONTEXT:  PL/pgSQL function _dbcluster01.cleanupevent(interval) line 94
 at assignment
 2014-09-18 07:13:33 EDT INFO   cleanupThread:0.110 seconds for
 cleanupEvent()
 NOTICE:  Slony-I: Logswitch to sl_log_2 initiated
 CONTEXT:  SQL statement SELECT _dbcluster01.logswitch_start()
 PL/pgSQL function _dbcluster01.cleanupevent(interval) line 96 at PERFORM
 2014-09-18 07:24:46 EDT INFO   cleanupThread:0.014 seconds for
 cleanupEvent()
 2014-09-18 07:24:46 EDT INFO   cleanupThread:0.006 seconds for vacuuming
 NOTICE:  Slony-I: log switch to sl_log_2 complete - truncate sl_log_1
 CONTEXT:  PL/pgSQL function _dbcluster01.cleanupevent(interval) line 94
 at assignment
 2014-09-18 07:36:17 EDT INFO   cleanupThread:0.114 seconds for
 cleanupEvent()
 NOTICE:  Slony-I: Logswitch to sl_log_1 initiated
 CONTEXT:  SQL statement SELECT _dbcluster01.logswitch_start()
 PL/pgSQL function _dbcluster01.cleanupevent(interval) line 96 at PERFORM
 2014-09-18 07:47:31 EDT INFO   cleanupThread:0.018 seconds for
 cleanupEvent()

 In order to perform the upgrade to the latest version, it says I need to
 set this interval low to clear up any data in the log tables, but I'm
 stuck trying to get that to happen.

 Any ideas?


 Kristopher



 ___
 Slony1-general mailing list
 Slony1-general@lists.slony.info
 http://lists.slony.info/mailman/listinfo/slony1-general


 ___
 Slony1-general mailing list
 Slony1-general@lists.slony.info
 http://lists.slony.info/mailman/listinfo/slony1-general



-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Slony-I Upgrade

2014-08-25 Thread Jan Wieck
On 08/25/2014 02:58 AM, Venkata Balaji N wrote:

 Upgrading from 2.0.3 to 2.2.3 works fine.

Yes, there should be no difference between upgrading from 2.0 or 2.1 to 
2.2. The thing to keep in mind is that with 2.2 the structure of the 
sl_log tables changes and that it is absolutely necessary that the 
application is stopped and all subscribers are fully caught up before 
starting the upgrade process. Outstanding backlog is not being converted 
and those updates would be lost otherwise.
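
One way to verify that on the origin before shutting things down 
(cluster name is a placeholder):

    -- st_lag_num_events should be 0 for every receiver
    select st_received, st_lag_num_events, st_lag_time
      from _mycluster.sl_status;

    -- both log tables should be empty right before the upgrade
    select (select count(*) from _mycluster.sl_log_1) as log_1,
           (select count(*) from _mycluster.sl_log_2) as log_2;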


Regards,
Jan



 Thanks,
 VBN

 On Mon, Aug 25, 2014 at 11:56 AM, Soni M diptat...@gmail.com
 mailto:diptat...@gmail.com wrote:

 I never try the way You proposed. I think You better upgrade it with
 smaller steps, (i.e 2.0 to 2.1 then 2.1 to 2.2) as described here :
 http://slony.info/documentation/2.2/slonyupgrade.html


 On Sat, Aug 23, 2014 at 4:30 PM, Venkata Balaji N nag1...@gmail.com
 mailto:nag1...@gmail.com wrote:

 Hello All,

 We are upgrading our existing Slony-I installations across our
 production servers.

 We are upgrading from version 2.0.3 to version 2.2.3.

 We will have our Applications down for an other scheduled activity

 Below is our upgrade procedure -

   * Install Slony-I-2.2.3 on all the nodes
   * Issue SLONIK_UPDATE_NODES -c config file  from any of the
 nodes (we do it from our master node)
   * Start the slon processes for all the nodes

 Will this be OK to upgrade from 2.0.3 to 2.2.3 ?

 Please help us know if we have to do anything else.

 Regards,
 VBN



 ___
 Slony1-general mailing list
 Slony1-general@lists.slony.info
 mailto:Slony1-general@lists.slony.info
 http://lists.slony.info/mailman/listinfo/slony1-general




 --
 Regards,

 Soni Maula Harriz




 ___
 Slony1-general mailing list
 Slony1-general@lists.slony.info
 http://lists.slony.info/mailman/listinfo/slony1-general



-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] slony 2.2.3 experiences

2014-07-17 Thread Jan Wieck
On 07/17/14 17:45, Ger Timmens wrote:
 Hi all,

 After upgrading our environment from slony 2.1.4 to slony 2.2.3
 we see 'a lot' more slony connections both on master, forwarders
 as subscribers.

 E.g. on the master we see approx 120 more 'idle' slony connections.
 This connections start around 50, groing to 120 over time
 (restarting slons, will bring it down and connections will increase again).

The number of connections should not go up like that.

What does your cluster configuration look like? 50 connections to start 
with already sounds high.

Are there any error messages in the slon logs? What I can imagine is a 
connection leak in some unusual error situation. Slon reconnects but 
never closes the old connection.




 This is probably related to the changed failover logic in slony 2.2.x
 over 2.1.x where each forwarding note should know about each
 other node ?

Not sure about that yet.


Regards,
Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] sl_log_x very large due to lagging subscription

2014-06-27 Thread Jan Wieck
I guess you mean either UNSUBSCRIBE SET or DROP NODE.

What should happen in the UNSUBSCRIBE case is that the node quickly
confirms all the outstanding events since they no longer require
replicating any data, at which point all providers can perform/finish log
switching again.

In the case of DROP NODE the behavior is similar. Make sure that ALL
remaining nodes in the cluster have forgotten everything about the dropped
node before attempting to rebuild it. Or rebuild it with a different node
ID.

If this is your only node, then you could as well uninstall the whole
cluster and start from scratch.
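
In slonik those two alternatives would look something like this (set, 
node and event node IDs are made up for the example):

    # alternative 1: just drop the lagging subscription
    unsubscribe set (id = 1, receiver = 3);

    # alternative 2: drop the lagging node entirely
    drop node (id = 3, event node = 1);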


--
Jan Wieck
Senior Software Engineer
http://slony.info
On Jun 27, 2014 4:04 PM, Dave Cramer davecra...@gmail.com wrote:

 If I drop the subscription through drop set will the log file be
 automagically cleaned up ?

 If not how can I fix it ?

 Dave Cramer

 ___
 Slony1-general mailing list
 Slony1-general@lists.slony.info
 http://lists.slony.info/mailman/listinfo/slony1-general


___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] sl_log_x very large due to lagging subscription

2014-06-27 Thread Jan Wieck
On Jun 27, 2014 7:44 PM, Dave Cramer davecra...@gmail.com wrote:


 On 27 June 2014 16:55, Jan Wieck j...@wi3ck.info wrote:

 I guess you mean either UNSUBSCRIBE SET or DROP NODE.

 What should happen in the UNSUBSCRIBE case is that the node quickly
confirms all the outstanding events since they no longer require
replicating any data, at which point all providers can perform/finish log
switching again.

 In the case of DROP NODE the behavior is similar. Make sure that ALL
remaining nodes in the cluster have forgotten everything about the dropped
node before attempting to rebuild it. Or rebuild it with a different node
ID.

 If this is your only node, then you could as well uninstall the whole
cluster and start from scratch.


 Yes I did mean UNSUBSCRIBE SET. And I forgot to mention this is slony 1.2

I forgive you.


 Also as is usually the case it is not as simple as I have first posed.
For reasons that I won't elaborate the lagging subscription is a duplicate
of one that is not lagging.

Since you (apparently) already gave up on that lagging node, dropping it
should work just as well, or even better. That said ...


 One of the sl_log_x files is 16G in size.

 Suggestions are welcome

... would it be possible to take a brief application outage of 10 minutes
or so?

If yes, then you could stop the application and wait for the healthy node
to catch up 100% (slonik script doing a SYNC and WAIT FOR EVENT). Now you
stop all slon processes and remove Slony via UNINSTALL NODE commands.
Install Slony 2.2 and rebuild the cluster. The formerly healthy node is
getting subscribed with OMIT COPY, so it doesn't copy any data. Since you
now have a replica you can start the application and let the third node
rebuild with the regular copy.
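
The interesting parts of that procedure, in slonik terms (IDs and 
conninfo are made up for the sketch):

    # with the application stopped, make sure node 2 is 100% caught up
    sync (id = 1);
    wait for event (origin = 1, confirmed = 2, wait on = 1);

    # after UNINSTALL NODE everywhere and recreating the cluster on 2.2,
    # resubscribe the already identical node without copying any data
    subscribe set (id = 1, provider = 1, receiver = 2, forward = yes,
                   omit copy = true);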

I know that this doesn't sound too cozy at first, but we have done this 
UNINSTALL and rebuild with OMIT COPY several times in our production 
environment (for reasons I now won't elaborate on). Bottom line is that it
works reliably.

We can work through this in more detail on the phone and through other
media if you like.

Jan


 Dave Cramer

 www.credativ.ca


 On Jun 27, 2014 4:04 PM, Dave Cramer davecra...@gmail.com wrote:

 If I drop the subscription through drop set will the log file be
automagically cleaned up ?

 If not how can I fix it ?

 Dave Cramer

 ___
 Slony1-general mailing list
 Slony1-general@lists.slony.info
 http://lists.slony.info/mailman/listinfo/slony1-general



--
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] upgrading from 1.x to 2.x

2014-06-18 Thread Jan Wieck
On 06/18/14 15:59, Dave Cramer wrote:
 What does this ominous message mean

 When Slony-I uninstalls itself, catalog corruptions are fixed back up.

Slony 1.x was designed to run on PostgreSQL versions prior to 8.3. Since 
the session_replication_role and trigger configuration features were 
new in 8.3, Slony 1.x disabled triggers and rewrite rules by hacking 
pg_class on replicas. When you use UNINSTALL NODE, those hacks are 
reverted and you end up with a clean system catalog again on that replica.


Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Replication fails on Corrupted PK index

2014-04-28 Thread Jan Wieck
On 04/28/14 08:01, Steve Singer wrote:
 On 04/28/2014 05:19 AM, Raghav wrote:
 postgres=# insert into dtest values (1,'D');
 INSERT 0 1
 postgres=# insert into dtest values (1,'D');

 But your indexes will also prevent something like:

 * Stop your slons

insert into dtest values (1,'D');
delete from dtest;
insert into dtest values (1,'D');

 which should be allowed.

That is problem number 1).

2) It will still allow something like

 insert into dtest values (1, 'D', 'foo');
 insert into dtest values (1, 'D', 'bar');

since the index isn't restricted to the primary key columns of the
table. And since it is shared by all user tables, that's impossible.

3) It will also not prevent something like this:

 insert into dtest values (1, 'D');
 -- wait for 30 minutes in which slony cleanup switches logs and
 -- the index corruption happens
 insert into dtest values (1, 'D');

In summary, attempting to guard against possible PostgreSQL bugs that 
corrupt data is futile. Even if we could figure this one out, the next 
problem would be to guard against a corruption of the sl_log_N tables.


Regards,
Jan


-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Slony setup error

2014-04-17 Thread Jan Wieck
On 04/17/14 05:20, Glyn Astill wrote:
 Well have you checked to see if the _replication schema and getlocalnodeid 
 function are present on your new subscriber?

  From the error message I can only assume you've not run STORE NODE against 
 the node you're trying to subscribe,

That is what is most likely.

 p.s. I'll use this as a test to see if I can still post to the list from my 
 yahoo.com address too; I've not much hope it will :-(

Did @yahoo.co.uk receive the same ill-advised changes? Not sure, because 
so far I don't see any bounce messages.


Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Slony setup error

2014-04-17 Thread Jan Wieck
On 04/17/14 12:50, Samir Parikh wrote:
 Thanks for the quick response. Yes, you are right function getLocalNodeId 
 does not exist on subscriber node but it does exists on primary node. On 
 further check I could see on primary node under replication schema total 127 
 functions but in subscriber node I could see only 112. Not sure why there are 
 missing functions which include getLocalNodeId.

 This is the very first time I am trying to set up slony replication. I am 
 following documentation on slony.info.

That sounds very odd.

I would suggest doing an UNINSTALL NODE, DROP NODE (both for the 
non-functioning one), waiting until all traces of the dropped node are 
gone from the system (or using a different node ID to be sure), and then 
starting over from STORE NODE, STORE PATH all the way.
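
Roughly, in slonik (node IDs and conninfo are placeholders; the rebuilt 
subscriber gets a fresh ID here):

    uninstall node (id = 2);
    drop node (id = 2, event node = 1);

    # then start over for the new subscriber
    store node (id = 3, comment = 'rebuilt subscriber', event node = 1);
    store path (server = 1, client = 3,
                conninfo = 'dbname=db1 host=host1 user=slony');
    store path (server = 3, client = 1,
                conninfo = 'dbname=db3 host=host3 user=slony');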

And make sure you use 2.2 or at least 2.1. 2.0 is marked end of life and 
earlier versions have not received any updates/fixes for quite a while now.


Regards,
Jan



 Thanks again.
 Samir

 -Original Message-
 From: Glyn Astill [mailto:glynast...@yahoo.co.uk]
 Sent: Thursday, April 17, 2014 2:20 AM
 To: Samir Parikh; slony1-general@lists.slony.info
 Subject: Re: [Slony1-general] Slony setup error

 From: Samir Parikh spar...@carcharging.com

To: slony1-general@lists.slony.info slony1-general@lists.slony.info
Sent: Thursday, 17 April 2014, 2:06
Subject: [Slony1-general] Slony setup error


I am getting following error while trying to set up replication ? Can 
somebody help ?

./slonik_subscribe_set 1 2 | ./slonik
stdin:5: PGRES_FATAL_ERROR select max(ev_seqno) FROM
_replication.sl_event , _replication.sl_node  where 
ev_origin=_replication.getLocalNodeId('_replication')  AND ev_type <> 
'SYNC'  AND sl_node.no_id= ev_origin - ERROR:  function 
_replication.getlocalnodeid(unknown) does not exist LINE 1: ...l_event , 
_replication.sl_node  where ev_origin=_replicat...
 ^
HINT:  No function matches the given name and argument types. You might need 
to add explicit type casts.
error: unable to query event history on node 2


 Well have you checked to see if the _replication schema and getlocalnodeid 
 function are present on your new subscriber?

  From the error message I can only assume you've not run STORE NODE against 
 the node you're trying to subscribe,


waiting for events  (2,221473472232) only at (2,0) to be confirmed on
node 1


 This reinforces my thoughts, as well as running STORE NODE/PATH for your new 
 subscriber you need to make sure you have a slon running against it too.

 Perhaps you could fill us in on the steps you've taken so far so we can see 
 where you've gone off course.


 Glyn.

 p.s. I'll use this as a test to see if I can still post to the list from my 
 yahoo.com address too; I've not much hope it will :-(

 ___
 Slony1-general mailing list
 Slony1-general@lists.slony.info
 http://lists.slony.info/mailman/listinfo/slony1-general



-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Slony setup error

2014-04-17 Thread Jan Wieck
On 04/17/14 14:51, Samir Parikh wrote:
 Some success. I  could now set up the replication after dropping/recreating 
 the replication schema on both nodes.
 Now I am trying to add tables to set using slonik script and stuck with this 
 syntax error. I googled it and found many instances of this error but could 
 not find any solution that work for me.

 stdin:1: ERROR: syntax error at or near replication
[root@vmbndbdev01 ~]# slonik <<_EOF_
 cluster name = 'replication'
 _EOF_
 stdin:1: ERROR: syntax error at or near replication

The bison parser for this isn't the best in the world.

It expects at least one admin conninfo line and one actual command too. 
Otherwise it is not considered a complete script, and with bison, 
"syntax error" is pretty much all you get.
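
A minimal script that the parser will accept looks like this (the 
conninfo is of course a placeholder):

    cluster name = replication;
    node 1 admin conninfo = 'dbname=mydb host=primary user=slony';
    echo 'parsed fine';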


Jan



 I tried with single,double and no quotes, but not luck with either of those.

 Thanks much.
 Samir




 -Original Message-
 From: Jan Wieck [mailto:j...@wi3ck.info]
 Sent: Thursday, April 17, 2014 10:30 AM
 To: Samir Parikh; Glyn Astill; slony1-general@lists.slony.info
 Subject: Re: [Slony1-general] Slony setup error

 On 04/17/14 12:50, Samir Parikh wrote:
 Thanks for the quick response. Yes, you are right function getLocalNodeId 
 does not exist on subscriber node but it does exists on primary node. On 
 further check I could see on primary node under replication schema total 127 
 functions but in subscriber node I could see only 112. Not sure why there 
 are missing functions which include getLocalNodeId.

 This is the very first time I am trying to set up slony replication. I am 
 following documentation on slony.info.

 That sounds very odd.

 I would suggest doing an UNINSTALL NODE, DROP NODE (both for the 
 non-functioning one), waiting until all traces of the dropped node are gone 
 from the system (or using a different node ID to be sure) and then start over 
 from STORE NODE, STORE PATH all the way.

 And make sure you use 2.2 or at least 2.1. 2.0 is marked end of life and 
 earlier versions have not received any updates/fixes for quite a while now.


 Regards,
 Jan



 Thanks again.
 Samir

 -Original Message-
 From: Glyn Astill [mailto:glynast...@yahoo.co.uk]
 Sent: Thursday, April 17, 2014 2:20 AM
 To: Samir Parikh; slony1-general@lists.slony.info
 Subject: Re: [Slony1-general] Slony setup error

 From: Samir Parikh spar...@carcharging.com

To: slony1-general@lists.slony.info slony1-general@lists.slony.info
Sent: Thursday, 17 April 2014, 2:06
Subject: [Slony1-general] Slony setup error


I am getting following error while trying to set up replication ? Can 
somebody help ?

./slonik_subscribe_set 1 2 | ./slonik
stdin:5: PGRES_FATAL_ERROR select max(ev_seqno) FROM
_replication.sl_event , _replication.sl_node  where 
ev_origin=_replication.getLocalNodeId('_replication')  AND ev_type  
'SYNC'  AND sl_node.no_id= ev_origin - ERROR:  function 
_replication.getlocalnodeid(unknown) does not exist LINE 1: ...l_event , 
_replication.sl_node  where ev_origin=_replicat...
 ^
HINT:  No function matches the given name and argument types. You might need 
to add explicit type casts.
error: unable to query event history on node 2


 Well have you checked to see if the _replication schema and getlocalnodeid 
 function are present on your new subscriber?

  From the error message I can only assume you've not run STORE NODE 
 against the node you're trying to subscribe,


waiting for events  (2,221473472232) only at (2,0) to be confirmed on
node 1


 This reinforces my thoughts, as well as running STORE NODE/PATH for your new 
 subscriber you need to make sure you have a slon running against it too.

 Perhaps you could fill us in on the steps you've taken so far so we can see 
 where you've gone off course.


 Glyn.

 p.s. I'll use this as a test to see if I can still post to the list from my 
 yahoo.com address too; I've not much hope it will :-(

 ___
 Slony1-general mailing list
 Slony1-general@lists.slony.info
 http://lists.slony.info/mailman/listinfo/slony1-general





-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Slony setup error

2014-04-17 Thread Jan Wieck
On 04/17/14 14:56, Jan Wieck wrote:
 On 04/17/14 14:51, Samir Parikh wrote:
 Some success. I  could now set up the replication after dropping/recreating 
 the replication schema on both nodes.
 Now I am trying to add tables to set using slonik script and stuck with this 
 syntax error. I googled it and found many instances of this error but could 
 not find any solution that work for me.

 stdin:1: ERROR: syntax error at or near replication
[root@vmbndbdev01 ~]# slonik <<_EOF_
 cluster name = 'replication'
 _EOF_
 stdin:1: ERROR: syntax error at or near replication

 The bison parser for this isn't the best in the world.

 It expects at least one admin conninfo and one actual command too.
 Otherwise it is not considered a complete script and in bison syntax
 error is pretty much all you get.

Plus you need a semicolon at the end of course.


Jan



 Jan



 I tried with single,double and no quotes, but not luck with either of those.

 Thanks much.
 Samir




 -Original Message-
 From: Jan Wieck [mailto:j...@wi3ck.info]
 Sent: Thursday, April 17, 2014 10:30 AM
 To: Samir Parikh; Glyn Astill; slony1-general@lists.slony.info
 Subject: Re: [Slony1-general] Slony setup error

 On 04/17/14 12:50, Samir Parikh wrote:
 Thanks for the quick response. Yes, you are right function getLocalNodeId 
 does not exist on subscriber node but it does exists on primary node. On 
 further check I could see on primary node under replication schema total 
 127 functions but in subscriber node I could see only 112. Not sure why 
 there are missing functions which include getLocalNodeId.

 This is the very first time I am trying to set up slony replication. I am 
 following documentation on slony.info.

 That sounds very odd.

 I would suggest doing an UNINSTALL NODE, DROP NODE (both for the 
 non-functioning one), waiting until all traces of the dropped node are gone 
 from the system (or using a different node ID to be sure) and then start 
 over from STORE NODE, STORE PATH all the way.

 And make sure you use 2.2 or at least 2.1. 2.0 is marked end of life and 
 earlier versions have not received any updates/fixes for quite a while now.


 Regards,
 Jan



 Thanks again.
 Samir

 -Original Message-
 From: Glyn Astill [mailto:glynast...@yahoo.co.uk]
 Sent: Thursday, April 17, 2014 2:20 AM
 To: Samir Parikh; slony1-general@lists.slony.info
 Subject: Re: [Slony1-general] Slony setup error

 From: Samir Parikh spar...@carcharging.com

To: slony1-general@lists.slony.info slony1-general@lists.slony.info
Sent: Thursday, 17 April 2014, 2:06
Subject: [Slony1-general] Slony setup error


I am getting following error while trying to set up replication ? Can 
somebody help ?

./slonik_subscribe_set 1 2 | ./slonik
stdin:5: PGRES_FATAL_ERROR select max(ev_seqno) FROM
_replication.sl_event , _replication.sl_node  where 
ev_origin=_replication.getLocalNodeId('_replication')  AND ev_type <> 
'SYNC'  AND sl_node.no_id= ev_origin - ERROR:  function 
_replication.getlocalnodeid(unknown) does not exist LINE 1: ...l_event , 
_replication.sl_node  where ev_origin=_replicat...
 ^
HINT:  No function matches the given name and argument types. You might 
need to add explicit type casts.
error: unable to query event history on node 2


 Well have you checked to see if the _replication schema and 
 getlocalnodeid function are present on your new subscriber?

  From the error message I can only assume you've not run STORE NODE 
 against the node you're trying to subscribe,


waiting for events  (2,221473472232) only at (2,0) to be confirmed on
node 1


 This reinforces my thoughts, as well as running STORE NODE/PATH for your 
 new subscriber you need to make sure you have a slon running against it too.

 Perhaps you could fill us in on the steps you've taken so far so we can see 
 where you've gone off course.


 Glyn.

 p.s. I'll use this as a test to see if I can still post to the list from my 
 yahoo.com address too; I've not much hope it will :-(

 ___
 Slony1-general mailing list
 Slony1-general@lists.slony.info
 http://lists.slony.info/mailman/listinfo/slony1-general







-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Wide Area Replication, Add Set, without index?

2014-04-17 Thread Jan Wieck
On 02/14/14 15:26, Tory M Blue wrote:



 On Fri, Feb 14, 2014 at 10:35 AM, Vick Khera vi...@khera.org
 mailto:vi...@khera.org wrote:


 On Thu, Feb 13, 2014 at 5:15 PM, Tory M Blue tmb...@gmail.com
 mailto:tmb...@gmail.com wrote:

 so I'm wondering how do I change my scripts to create the
 index's at the end? I obviously can't drop the index on the
 master or everything will go to heck in a hand basket, but is
 there an instruction set when I'm adding a node and adding sets
 to that node, to tell it to ignore the indexes?


 I'm pretty sure slony disables indexes while copying, then re-builds
 them after it is done.



 Actually I believe this is beyond slony. The schema tells postgres what
 to do, so if it's creating indexes when the table is created, seems like
 everything pauses / waits for the index to be done, before moving on to
 the next table copy.

 So I think editing the schema to do the indexes at the end or not at all
 (make it manual), one should be able to use that schema on a remote slon
 host, that host will not pause to create indexes, and allow slony to
 push the data without a significant pause (some of our indexes have
 taken 4+ hours to create).

If you do a SUBSCRIBE SET without specifying OMIT COPY = yes, then Slony 
is doing:

 truncate the table
 turn off indexes (mucking with system catalog here)
 copy table
 turn indexes back on
 reindex table

All this is done within one transaction for all tables in the set, so in 
this case mucking with system catalog is safe, because it will never 
be visible to any concurrent session.

You could drop all indexes (and the primary key) from all the tables on 
the subscriber, do the subscribe, then kill the slon (because without 
the PK it won't be able to do much in terms of catching up), recreate PK 
and indexes, then let it catch up.
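
In SQL terms, on the subscriber, that would be something like this 
(table, constraint and index names are purely illustrative):

    -- before the SUBSCRIBE SET
    alter table my_schema.my_big_table drop constraint my_big_table_pkey;
    drop index my_schema.my_big_table_foo_idx;

    -- after the copy finished and before restarting the slon
    alter table my_schema.my_big_table
        add constraint my_big_table_pkey primary key (id);
    create index my_big_table_foo_idx on my_schema.my_big_table (foo);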

What I would rather suggest is that you enable TCP keepalives and 
configure them in the slon config so that your firewall/NAT-gateway or 
whatever is reaping the idle connections is kept from doing so.
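
With libpq that can go straight into the conninfo the slon uses, for 
example (values are just an illustration):

    slon mycluster 'dbname=mydb host=origin-host user=slony
        keepalives=1 keepalives_idle=60
        keepalives_interval=7 keepalives_count=9'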


 I'm going to test this anyways, I don't see anything particular to slon
 to stop this behaviour

TCP keepalives should do that, as a side effect.


Regards,
Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Administrivia: DMARC, Yahoo

2014-04-16 Thread Jan Wieck
On 04/15/14 11:58, Andrew Sullivan wrote:
 On Tue, Apr 15, 2014 at 10:57:37AM -0400, Christopher Browne wrote:
 Here is an announcement of a block Yahoo users policy just implemented by
 another free software project (
 http://lists.roaringpenguin.com/pipermail/remind-fans/2014/003025.html)
 which seems to sum the issue up briefly but accurately.

 One alternative to this is to rewrite the From: address of Yahoo users
 when they enter the list, adding .invalid to the end of the mail
 address.  The resulting lookup for the DMARC record then looks in
 (say) yahoo.com.invalid.  You can't respond to such users, of course,
 and it may cause a lot of bounces, but fortunately .invalid is
 guaranteed not to work anywhere so at least the DNS lookups will be
 quick.

I am not in favor of shifting bounces around by deliberately creating 
invalid header content in order to make posting from Yahoo! addresses 
possible again. I am in favor of following exactly the same route and 
blocking all mail from @yahoo.com (and other DMARC participating ISPs) on 
our lists.

 I do encourage people who are using yahoo and who want to use mailing
 lists to ditch the yahoo account, however.  I also urge everyone to
 avoid yahoo as much as possible, because I think this behaviour was
 irresponsible and hostile to the Internet, and deserves to be shunned.

Ditto.

One aspect of the whole thing is that the DMARC proposal is two years 
old and it was well known that this (breaking mailing lists) would be a 
side effect of it. The powers that be at Yahoo! went ahead with it 
anyway. This can mean only one thing: Yahoo! does not want users 
with a @yahoo.com address to participate in third party mailing lists. 
If that is what they want, then that is what they should get.


Regards,
Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


[Slony1-general] Yahoo DMARC test

2014-04-10 Thread Jan Wieck
Let's see

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Still having issues with wide area replication. large table , copy set 2 failed

2014-02-18 Thread Jan Wieck
On 02/16/14 20:06, Jeff Frost wrote:
 
 On Feb 16, 2014, at 5:00 PM, Tory M Blue tmb...@gmail.com wrote:
 
 
 
 
 As can be seen the connection is reaped, slon/postgres continue on their 
 way, it's not until the next data copy is required that it finds it's 
 connection is no longer there. Why it can't recreate a conneciton as one 
 would do if they stopped and started slon is kind of beyond me.  Just not 
 100% sure where it's being killed.
 
 
 Because the initial sync must be done as a single transaction.
 
 
 
 2014-02-16 16:40:46 PST CONFIG remoteWorkerThread_1: 7183.069 seconds to 
 copy table tracking.spotlightimp
 2014-02-16 16:40:46 PST CONFIG remoteWorkerThread_1: copy table 
 tracking.adimp
 2014-02-16 16:40:46 PST CONFIG remoteWorkerThread_1: Begin COPY of table 
 tracking.adimp
 2014-02-16 16:40:46 PST ERROR  remoteWorkerThread_1: select 
 _cls.copyFields(19); 
 2014-02-16 16:40:46 PST WARN   remoteWorkerThread_1: data copy for set 2 
 failed 1 times - sleep 15 seconds
 NOTICE:  Slony-I: Logswitch to sl_log_2 initiated
 CONTEXT:  SQL statement SELECT _cls.logswitch_start()
 PL/pgSQL function _cls.cleanupevent(interval) line 96 at PERFORM
 2014-02-16 16:40:49 PST INFO   cleanupThread: 6541.365 seconds for 
 cleanupEvent()
  
 
 Am I doing this wrong? I figured that since I've seen connections at 15 
 minutes of processing complete fine, I thought that 30 minutes is more than 
 enough. So send the first hey are you still there at 15 minutes then 
 continue with them every 5 minutes, for a count of 30.
 
 But the above seems to have been reaped in the 20 minute area..
 
 net.ipv4.tcp_keepalive_time = 600
 net.ipv4.tcp_keepalive_probes = 30
 net.ipv4.tcp_keepalive_intvl = 300
 
 
 Set it so that it's sending keepalives every 30 seconds.
 
 Something like this:
 
 net.ipv4.tcp_keepalive_time = 30
 net.ipv4.tcp_keepalive_probes = 10
 net.ipv4.tcp_keepalive_intvl = 30
 

Jeff is right.

Really understanding these values may help too.

Since Slony allows you to set them specifically for Slony in its config
file, that is where it really should be done, rather than setting the
global values in the kernel. Those kernel values should be adjusted to
something more appropriate than what is suitable for a link to the Moon too,
but that's another story.

tcp_keepalive_time is the number of seconds the kernel waits after the
last transmission on a socket before starting to probe. In this case,
the end of the COPY, when slony is going into the long IDLE in
transaction while building the indexes, is the start of that timer.

tcp_keepalive_intvl is the interval between keepalive packets. But that
interval only kicks in once the tcp_keepalive_time has elapsed.

tcp_keepalive_probes is the number of keepalive packets that need to be
missing in a row before the connection is considered lost, resulting
in a connection reset by peer. One single keepalive received resets
the whole thing.

I actually go a lot more aggressive at this. Something like

tcp_keepalive_time = 5
tcp_keepalive_probes = 24
tcp_keepalive_intvl = 5
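
On Linux those can be applied on the fly with sysctl, for example (values as 
above; add them to /etc/sysctl.conf to make them stick across reboots):

    sysctl -w net.ipv4.tcp_keepalive_time=5
    sysctl -w net.ipv4.tcp_keepalive_probes=24
    sysctl -w net.ipv4.tcp_keepalive_intvl=5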

Pretty much all remote slony connections do something every couple of
seconds. And a link that cannot deal with a few keepalive packets per
minute is not suitable for running slony over anyway. And who cares
about wasting bandwidth on an idle link? Seriously! It's not like we are
paying by the kilobyte of used bandwidth these days, are we?

What is also important here is that my above settings lead to a timeout and
canceling of the PostgreSQL backend within 2 minutes. There is something
really bad about hanging backends from lost Slony connections when you
actually need to fail over. You really want them to time out in a
timely fashion. Trust me.


Regards,
Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Sync stopped with slony-2.2.1 on data type mismatch

2014-01-29 Thread Jan Wieck
On 01/29/14 17:26, Hanselman, Matthew wrote:
 One follow-up: Do you think 2.1.4 is stable in this regard?

I have actually found the bug in 2.2.STABLE that causes this. As Steve
assumed, it is memory corruption inside the log apply trigger.

2.1.x does not have this bug since it doesn't have an apply trigger at
all, but issues individual SQL statements against the updated tables
directly.

The bug can be avoided by setting the apply_cache_size in the slon
config file higher than the number of possible individual apply query
plans during a single SYNC event. This may bloat the memory size of the
backend, used by slon to apply changes, but better that than anything
else. Note that the apply cache size defaults to 50, which is a VERY low
number. Using 1,000 or even 10,000 should not cause a modern DB server
to run out of memory. There can only ever be one DB connection per node
that creates these plans (the local slon's connection). You should be
good bumping that number instead of downgrading to 2.1.
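
As a sketch, that is a single line in the slon config file (the value here is
only an example; size it above the number of distinct apply plans you expect
in one SYNC):

    apply_cache_size=1000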

We will have a Slony 2.2.2 release out really soon.

Sorry for the inconvenience.


Regards,
Jan

 
 Thanks!
 
 - Matt
 
 
 -Original Message-
 From: Jan Wieck [mailto:janwi...@yahoo.com] 
 Sent: Friday, January 24, 2014 11:36 AM
 To: Hanselman, Matthew; Steve Singer
 Cc: slony1-general@lists.slony.info
 Subject: Re: [Slony1-general] Sync stopped with slony-2.2.1 on data type 
 mismatch
 
 On 01/24/14 10:03, Hanselman, Matthew wrote:
 This looks very similar. I'll leave a comment to this effect on the bug. 
 Thanks!
 
 For what it's worth, I tried restarting the slon daemon on both the master & 
 slave. I got a very similar error, but this time it was complaining that 
 Lost Angeles wasn't an integer.
 
 That does indeed look like memory corruption. I will take a very close look 
 at the apply trigger code.
 
 In the meantime, increasing the size of the apply query cache in the slon 
 conf may mask this error.
 
 
 Thanks,
 Jan
 
 
 
 - Matt
 
 
 -Original Message-
 From: Steve Singer [mailto:ssin...@ca.afilias.info]
 Sent: Friday, January 24, 2014 9:59 AM
 To: Jan Wieck
 Cc: Hanselman, Matthew; slony1-general@lists.slony.info
 Subject: Re: [Slony1-general] Sync stopped with slony-2.2.1 on data 
 type mismatch
 
 On 01/23/2014 03:42 PM, Jan Wieck wrote:
 On 01/23/14 15:31, Hanselman, Matthew wrote:
 Running Slony 2.2.1 with Postgres 9.2.6. The slave is getting errors 
 trying to apply a change with this error:



 2014-01-23 14:55:58 EST ERROR  remoteWorkerThread_1_1: error at end 
 of COPY IN: ERROR:  invalid input syntax for integer: 33.77269118

 
 
 Could this be the same thing as that is causing
 http://bugs.slony.info/bugzilla/show_bug.cgi?id=327
 
 (We suspect memory corruption in the apply trigger)
 
 
 
 


 The column in question is a double precision on both master & slave.
 No schema change has happened since Slony was deployed.

 What did happen between slony was deployed and this breakage?


 Jan






 Is this a bug in Slony, or is there something else I can do to 
 troubleshoot?



 - Matt





 ___
 Slony1-general mailing list
 Slony1-general@lists.slony.info
 http://lists.slony.info/mailman/listinfo/slony1-general



 
 
 
 --
 Anyone who trades liberty for security deserves neither liberty nor security. 
 -- Benjamin Franklin
 


-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Sync stopped with slony-2.2.1 on data type mismatch

2014-01-24 Thread Jan Wieck
On 01/24/14 10:03, Hanselman, Matthew wrote:
 This looks very similar. I'll leave a comment to this effect on the bug. 
 Thanks!
 
 For what it's worth, I tried restarting the slon daemon on both the master & 
 slave. I got a very similar error, but this time it was complaining that 
 Lost Angeles wasn't an integer.

That does indeed look like memory corruption. I will take a very close
look at the apply trigger code.

In the meantime, increasing the size of the apply query cache in the slon
conf may mask this error.


Thanks,
Jan


 
 - Matt
 
 
 -Original Message-
 From: Steve Singer [mailto:ssin...@ca.afilias.info] 
 Sent: Friday, January 24, 2014 9:59 AM
 To: Jan Wieck
 Cc: Hanselman, Matthew; slony1-general@lists.slony.info
 Subject: Re: [Slony1-general] Sync stopped with slony-2.2.1 on data type 
 mismatch
 
 On 01/23/2014 03:42 PM, Jan Wieck wrote:
 On 01/23/14 15:31, Hanselman, Matthew wrote:
 Running Slony 2.2.1 with Postgres 9.2.6. The slave is getting errors 
 trying to apply a change with this error:



 2014-01-23 14:55:58 EST ERROR  remoteWorkerThread_1_1: error at end 
 of COPY IN: ERROR:  invalid input syntax for integer: 33.77269118

 
 
 Could this be the same thing as that is causing
 http://bugs.slony.info/bugzilla/show_bug.cgi?id=327
 
 (We suspect memory corruption in the apply trigger)
 
 
 
 


 The column in question is a double precision on both master & slave.
 No schema change has happened since Slony was deployed.

 What did happen between slony was deployed and this breakage?


 Jan






 Is this a bug in Slony, or is there something else I can do to troubleshoot?



 - Matt





 ___
 Slony1-general mailing list
 Slony1-general@lists.slony.info
 http://lists.slony.info/mailman/listinfo/slony1-general



 


-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Sync stopped with slony-2.2.1 on data type mismatch

2014-01-23 Thread Jan Wieck
On 01/23/14 15:31, Hanselman, Matthew wrote:
 Running Slony 2.2.1 with Postgres 9.2.6. The slave is getting errors
 trying to apply a change with this error:
 
  
 
 2014-01-23 14:55:58 EST ERROR  remoteWorkerThread_1_1: error at end of
 COPY IN: ERROR:  invalid input syntax for integer: 33.77269118
 
  
 
 The column in question is a “double precision” on both master & slave.
 No schema change has happened since Slony was deployed.

What did happen between slony was deployed and this breakage?


Jan



 
  
 
 Is this a bug in Slony, or is there something else I can do to troubleshoot?
 
  
 
 - Matt
 
  
 
 
 
 ___
 Slony1-general mailing list
 Slony1-general@lists.slony.info
 http://lists.slony.info/mailman/listinfo/slony1-general
 


-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Slony getting 'hung up' on an event?

2014-01-06 Thread Jan Wieck
On 01/06/14 13:51, Brian Fehrle wrote:
 ...
 
 2014-01-06 11:43:22 MST DEBUG1 remoteHelperThread_1_1: 112.859 seconds 
 delay for first row
 2014-01-06 11:43:22 MST DEBUG1 remoteHelperThread_1_1: 112.906 seconds 
 until close cursor
 2014-01-06 11:43:22 MST DEBUG1 remoteHelperThread_1_1: inserts=61 
 updates=300 deletes=55 truncates=0

Almost 2 minutes to select 416 log rows? That looks wrong.

Can you change the slon configuration to have an explain_interval? With
that slon will emit EXPLAIN output for the log selection query in that
interval. Maybe something is screwing up the optimizer.
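
For illustration, that is a single line in the slon config file (assuming the
parameter takes a number of seconds; the value is just an example):

    explain_interval=300    # assumed: seconds between EXPLAIN outputs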


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] query running slow in slony replication cluster environment

2014-01-04 Thread Jan Wieck
On 01/03/14 23:33, ammu narasimham wrote:
 Hello,
 
 I have configured the replication cluster using slony on Postgres-9.2
 successfully. It is a 2-node cluster (master-slave). When I tried to insert
 100k rows in a table, the execution time nearly doubled in the new
 environment.

Since each INSERT causes a second INSERT from the Slony log trigger into
the replication log, this is normal. Is this (inserting 100,000 rows at
once) something your application does on a regular basis, or is this
some artificial, unrealistic test you thought would give you useful
information?


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] 6 hours to replicate single table, 12 hours to replicate DB

2014-01-03 Thread Jan Wieck
On 01/02/14 16:12, Tory M Blue wrote:
 We are getting outside of slon and into postgres and i'll try to keep
 this light, here are my postgresql custom config aspects

Postgres and its architecture very much affect the behavior and
performance of Slony. I'm afraid that they cannot be viewed completely
isolated from each other.

 The hardware is being refreshed but right now it's an 8 core, 32GB
 CentOS system, being replaced with some pretty big hardware
 256GB/32cores many SSD's. But tuning this now and knowing where I need
 to tweak when I add the added capacity would be great.

The slon process uses one single database connection to the data
provider for the copy operation and another single database connection
to feed the data into the new replica. A single Postgres session only
utilizes one single core. It therefore makes zero difference if you are
upgrading from a dual core to 64 cores or more.

The suggestions made about work mem and maintenance work mem are valid.
If you are using a dedicated DB user for all your slon database
connections you can tweak these per server via ALTER USER.


Regards,
Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] 6 hours to replicate single table, 12 hours to replicate DB

2014-01-03 Thread Jan Wieck
On 01/03/14 14:42, Tory M Blue wrote:
 
 
 
 On Fri, Jan 3, 2014 at 11:33 AM, Jan Wieck janwi...@yahoo.com
 mailto:janwi...@yahoo.com wrote:
 
 On 01/02/14 16:12, Tory M Blue wrote:
  We are getting outside of slon and into postgres and i'll try to keep
  this light, here are my postgresql custom config aspects
 
 Postgres and its architecture very much affect the behavior and
 performance of Slony. I'm afraid that they cannot be viewed completely
 isolate from each other.
 
  The hardware is being refreshed but right now it's an 8 core, 32GB
  CentOS system, being replaced with some pretty big hardware
  256GB/32cores many SSD's. But tuning this now and knowing where I need
  to tweak when I add the added capacity would be great.
 
 The slon process uses one single database connection to the data
 provider for the copy operation and another single database connection
 to feed the data into the new replica. A single Postgres session only
 utilizes one single core. It therefore makes zero difference if you are
 upgrading from a dual core to 64 cores or more.
 
 The suggestions made about work mem and maintenance work mem are valid.
 If you are using a dedicated DB user for all your slon database
 connections you can tweak these per server via ALTER USER.
 
 
 Regards,
 Jan
 
 
 
 Okay that is interesting.. If you are using a dedicated DB user for all
 your slon database connections, you can tweak these per server via Alter
 User, what will that do for me? We do use the same user across the
 board and i figured I would just modify my postgresql.conf. So i'm a tad
 confused :)

Assuming you have a user slon and all your slon processes use that DB
user to do their work, you connect with psql to a Postgres server as a
superuser (postgres for example) and issue the command

ALTER USER slon SET work_mem TO '2048MB';

From now on, whenever user slon is connecting to a database under that
postmaster, it will have 2GB of work_mem. No matter what the default
setting in your postgresql.conf file is. This will not affect any other
user.

Since these settings (work_mem and maintenance_work_mem) are limits per
database backend, you don't want to set them too high in the global
postgresql.conf file. You'd risk running into swapping. But certain
maintenance jobs as well as slon (when doing subscriptions) do need
rather unusual amounts of resources.
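
A sketch of the same approach for the other setting (value illustrative):

    ALTER USER slon SET maintenance_work_mem TO '2048MB';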


Regards,
Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Problem slony 2.2 and pg 9.3 double replication

2013-10-16 Thread Jan Wieck
On 10/16/2013 09:55 AM, Sebastien Marchand wrote:
 Sorry I forgot to tell you that I have 2 differents replications .

 SERVER 1 SERVER 2
 SCHEMA A ---SCHEMA A  (CLUSTER NAME =
 repli_general  )
 SCHEMA B ---SCHEMA B   (CLUSTER NAME =
 repli_local110 )

This does not require two clusters with 4 slon daemons. Like Steve 
already wrote, you only need two sets.


Jan



 Now one script i use (slony-ctl ) :

 #!/bin/bash
 ## EN: Slonik script for initialising replication
 /usr/pgsql-9.3/bin/slonik _EOF_
  ## EN: Preamble file contains all useful connexions information
   
 CLUSTER NAME = repli_local110 ;
 NODE 101 ADMIN CONNINFO = 'dbname=RPP2 host=192.168.0.101 port=5432
 user=slony password=password';
 NODE 110 ADMIN CONNINFO = 'dbname=RPP2 host=192.168.0.110 port=5432
 user=slony password=password';

  #--
  # Init the first node.  This creates the schema _repli_local110
  # containing all replication system specific database objects.
  #--
  try{
   init cluster ( id=101, comment = 'repli_local110 : 101 - DATABASE -
 192.168.0.101');
  }
  on error {
   echo 'Cluster 101 : ko' ;
   exit 1;
   }
  on success { echo 'Cluster 101 : OK' ; }

  # The other nodes.

 #--
 # Create the node 110 (slave) and tell the 2 nodes how to connect to
 # each other and how they should listen for events.
 #--

 try{
 store node (id = 110,
  EVENT NODE = 101,
  comment = 'DB DATABASE - Host 192.168.0.110 - Cluster
 repli_local110');
  }
  on error {
   echo 'Store node 110 ko';
   exit 1;
   }
  on success {
   echo 'Store node 110 OK';
   }

  store path (server = 101, client = 110, conninfo='dbname=DATABASE
 host=192.168.0.101 port=5432 user=slony password=123');

  store path (server = 110, client = 101, conninfo='dbname=DATABASE
 host=192.168.0.110 port=5432 user=slony password=123');

   
  #--
  # Slony-I organizes tables into sets.  The smallest unit a node can
  # subscribe is a set. The master or origin of the set is node 110.
  #--
  try{
  create set (id=1, origin=110, comment='Tables in DATABASE');
   # Tables

   set add table (set id=1, origin=101, id=67, fully qualified
 name = 'local110.test_tata2', comment='table local110.test_tata2');

   # Sequences
  }
  on error {
   echo 'Creation Set : ko';
   exit 1;
   }
  on success {
   echo 'Creation Set : OK';
   }
  exit 0;
 _EOF_

 if [ $? -ne 0 ]; then
  error CREATE SET 3
  exit 1
 fi

 ## EN: We start daemons. All on the same machine, which is
 # the one on which that script is executed.
 /usr/pgsql-9.3/bin/slon -f /home/slony1-ctl/slony-ctl/etc/slon.cfg
 repli_local110 dbname=DATABASE host=192.168.0.101 port=5432 user=slony
 password=123
>/home/slony1-ctl/slony-ctl/logs/repli_local110_DATABASE_101.log 2>&1 &
 echo daemon slon pour DATABASE - 192.168.0.101 demarré sur national101
 ssh/usr/pgsql-9.3/bin/slon -f /home/slony1-ctl/slony-ctl/etc/slon.cfg
 repli_local110 dbname=DATABASE host=192.168.0.110 port=5432 user=slony
 password=123 >/home/slony/logs/repli_local110_DATABASE_110.log 2>&1 &
 echo daemon slon pour DATABASE - 192.168.0.110 demarré sur national101

 ## EN: And now, subscription

 echo Souscription Set 1 par 101
 /usr/pgsql-9.3/bin/slonik _EOF_
  # 
  # preamble file
  # 
  INCLUDE /home/slony1-ctl/slony-ctl/etc/repli_local110.preamble

  # 
  # Node 101 subscribes set 1
  # 
  try{
   subscribe set ( id = 1, provider = 110, receiver = 101, forward =
 yes);
  }
  on error {
   echo 'Souscription ko';
   exit 1;
   }
  on success {
   echo 'Souscription OK';
   }
 _EOF_


 if [ $? -ne 0 ]; then
  error SUBSCRIBE SET 3
  exit 1
 fi
 exit 0




 -Message d'origine-
 De : Steve Singer [mailto:ssin...@ca.afilias.info]
 Envoyé : mercredi 16 octobre 2013 15:29
 À : Sebastien Marchand
 Cc : slony1-general@lists.slony.info
 Objet : Re: [Slony1-general] Problem slony 2.2 and pg 9.3 double replication

 On 10/16/2013 09:17 AM, Sebastien Marchand wrote:
 Hi,

 I want do this :

 SERVER 1 SERVER 2
 SCHEMA A ---SCHEMA A
 SCHEMA B ---SCHEMA B

 That works in slony 2.0.7/pg 9.0

 But with  slony 2.2 and postgresql 9.3

 SERVER 1 SERVER 2
 SCHEMA A ---SCHEMA A= OK
 SCHEMA B ---SCHEMA B= NOK

 I've got this error

 stdin:55: waiting for event (101,54) to be  confirmed on
 node 110

 Pg error :

 2013-10-15 10:55:10.950 CEST ERROR:  relation
 _sloncluster.sl_local_node_id does not exist at character 30
 2013-10-15 10:55:10.950 CEST STATEMENT:  select last_value::int4 from
 _sloncluster.sl_local_node_id

 

Re: [Slony1-general] Problem slony 2.2 and pg 9.3 double replication

2013-10-16 Thread Jan Wieck
On 10/16/13 13:33, Steve Singer wrote:
 On 10/16/2013 12:05 PM, Sebastien Marchand wrote:
 No slon running because script launch slon after create set.

 So i launch before and IT WORKS !!! thx master ! thanks you so much

 I don't know what it change in slony 2.0.7 and 2.2 but before i could do
 this

 But now it's ok :)

 
 
 What SHOULD happen with 2.2 when the slon isn't running is that slonik 
 should keep printing:
 
 waiting for event (1,54) ...
 
 messages, that isn't an error that is an information message saying that 
 it is waiting for the event to be propagated, and that won't happen 
 unless slonik is running.
 
 What I don't understand is how you were getting
 
 2013-10-15 10:55:10.950 CEST ERROR:  relation
 _sloncluster.sl_local_node_id does not exist at character 30
 
 
 type messages.

That is not a slonik message, but is coming from slon. And sloncluster
is the cluster name used in share/slon.conf-sample.


Jan


 
 -Message d'origine-
 De : Steve Singer [mailto:ssin...@ca.afilias.info]
 Envoyé : mercredi 16 octobre 2013 17:56
 À : Sebastien Marchand
 Objet : Re: [Slony1-general] Problem slony 2.2 and pg 9.3 double replication

 On 10/16/2013 11:52 AM, Sebastien Marchand wrote:
 Yes that's why i will delete all replication 2.0 and migrate
 postgresql after i rebuild new replication.

 when i've my  error   waiting for event (1,54) to be
 confirmed on node 110  in postgresql if i do :

 select  * from _repli_local110.sl_path; = 1;110;dbname=RPP2
 host=192.168.0.101 port=5432 user=slony
 password=123;10

 Why just one line ?? i need two lines !

 In my sl_event :

 110;51;2013-10-16
 17:50:54.697374+02;695958:695958:;STORE_PATH;1;110;dbname=RP
 P2
 host=192.168.0.101 port=5432 user=slony password=123;10

 I think there is a problem here... or not

 Slonik when it executes the STORE PATH , it will insert a row in sl_path on
 each node.

 When slon is running it will then replicate the STORE_PATH event from one
 node to the other to create the second sl_path row in each node.

 Is your slon running?



 -Message d'origine-
 De : Steve Singer [mailto:ssin...@ca.afilias.info] Envoyé : mercredi
 16 octobre 2013 17:02 À : Sebastien Marchand Cc :
 slony1-general@lists.slony.info Objet : Re: [Slony1-general] Problem
 slony 2.2 and pg 9.3 double replication

 On 10/16/2013 10:57 AM, Sebastien Marchand wrote:
 I already tried to put in script but no change.

 Can you tell me where is the line of code that is generating this error ?
 it's a sql function ?

 I continue to search i need this to migrate.



 slon_mkquery(query, select max(ev_seqno) FROM \_%s\.sl_event
   , \_%s\.sl_node 
   where ev_origin= 
   (select last_value from
 \_%s\.sl_local_node_id) 
   AND %s AND sl_node.no_id=
   ev_origin
  ,script-clustername, script-clustername,
  script-clustername, event_filter);


 In slonik.c


 slonik_get_last_event_id(SlonikStmt * stmt,
  SlonikScript * script,
  const char *event_filter,
  int64 **events)


 Also, if you are planning on migrating from 2.0 to 2.2 with UPDATE
 FUNCTIONS you should be aware of bug #320



 -Message d'origine-
 De : Steve Singer [mailto:ssin...@ca.afilias.info] Envoyé : mercredi
 16 octobre 2013 16:53 À : Sebastien Marchand Cc :
 slony1-general@lists.slony.info Objet : Re: [Slony1-general] Problem
 slony 2.2 and pg 9.3 double
 replication

 On 10/16/2013 10:48 AM, Steve Singer wrote:
 On 10/16/2013 10:06 AM, Sebastien Marchand wrote:
 CLUSTER NAME = repli_local110 ;
 NODE 101 ADMIN CONNINFO = 'dbname=RPP2 host=192.168.0.101
 port=5432 user=slony password=password'; NODE 110 ADMIN CONNINFO =
 'dbname=RPP2 host=192.168.0.110 port=5432 user=slony
 password=password';

 It's the file repli_local110.preamble but for email i copy this
 file in my script :)


 Actually IGNORE what I said below,
 I hit send way too quickly before reviewing my output.

 I am NOT yet able to reproduce this, but try pasting your preamble
 inline and see if it works.

 I think I see the line of code that is generating the error you are
 seeing but I don't understand how/why it is substituting using
 _sloncluster instead of repli_local10;




 Sometimes details are important.

 Copy the preamble file inline into your script and it will work
 (like you did in your email)

 I can reproduce your issue with the preamble moved to a different file.
  This is a bug, it shouldn't be hard to fix, thanks for reporting it.
  I suspect the fix will be included in 2.2.1 due in the next few
 weeks (since this has been a busy week for 2.2.0 bugs - ie #318 and
 #320).





 
 ___
 

Re: [Slony1-general] Slony replication before db is up

2013-08-31 Thread Jan Wieck
On 08/31/13 05:51, Neha Sharma wrote:
 Hi,
  
 We are using master and slave postgres and slony for replcating the
 database.
 The master and slave are on different servers in the same LAN. The one
 containg master db was rebooted. When the machine came up, the slony
 started replication but the db state is still unavailable. The postgres
 logs shows the following errors:
 
 UTCFATAL:  no pg_hba.conf entry for host 172.168.113.4, user
 localclient, database msw, SSL off
  
 where 172.168.113.4 is the IP address of the server which was rebooted.
  
 Could it be possible that before the database is up, slony starts
 replication. If not then why above error is being shown in postgres logs.
 Please note that slon logs are showing messages corresponding to copying
 of tables at the same time.

That error message only indicates that some client, which may or may
not be a slon process, is being denied a connection because host based
authentication is failing. A Slony cluster always consists of multiple slon
processes, each establishing multiple database connections. One of them
copying or replicating data is no guarantee that every other one can
also establish all the connections they need.

Is Slony actually using the user localclient for its connections?
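
If it is, a pg_hba.conf entry along these lines (host, user and database taken
from the error message above, the auth method is just an example) would admit
that connection after a configuration reload:

    host    msw    localclient    172.168.113.4/32    md5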


Jan


-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] init cluster when the other node is not reachable ?

2013-07-30 Thread Jan Wieck
On 07/30/13 10:18, Nicolas Frossard wrote:
 Hello dear Slony-1 users,
 
 I have a simple one master one slave scenario but an annoying
 requirement: the master and the slave should be able to be configured at
 different time, without the other node being necessarily reachable...
 
 For example, the following should be possible:
 
 1- start the master DB (slave DB is not started yet)
 2- init cluster/create set (only on the master)
 3- start slon process on the master
 4- add subscription of slave to the set (only on the master)
 
 ... some time later ...
 
 5- start the slave DB
 6- init cluster/create set (only on the slave)
 7- start slon process on the slave
 9- add subscription of slave to the set (only on the slave)
 
 ... now the replication is working !
 
 unfortunately, it seems that it is impossible to correctly
 initialize/configure one node (the master) and then later the other (the
 slave). I have to do it at the same time, and both nodes need to be
 reachable.
 
 Is there way to solve this ? I'm using PostgreSQL 9.2 and Slony-1 2.1.3.

The one thing that is impossible in your above description is the STORE
NODE command for the second node (slave). For that one, both nodes must
be reachable because slonik needs to connect to both databases at the
same time.

Then again, if the nodes can never reach each other, how are they going
to replicate at all? Something in your problem doesn't make sense.


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Slony Watchdog failed starting up the child process

2013-07-23 Thread Jan Wieck
On 07/23/13 15:21, Rose Nancy wrote:
 Hi Chris,
 
 On 13-07-23 03:07 PM, Christopher Browne wrote:
 My intuition from seeing it say FATAL is that that's indicating 
 death of process, and that there's not much coming back from it.

 This behaviour is pretty consistent with what happens with a Postgres 
 postmaster; if the attempt to start up fails due to seeming already to 
 have a postmaster, it doesn't retry, pg_ctl immediately gives up.
 
 In my case the slons are running in a separate server which I rebooted.

 By the way, is this possibly because of a zombied old connection that 
 got disconnected due to firewall glitch or such?  If so, you should 
 probably see about lowering the TCP keepalive parameters both in the 
 slon.conf file and in postgresql.conf
 You're right, the duplicated key error was caused by a zombie old 
 connection that got disconnected due to the slony server reboot.

Which is something that TCP keepalives should clean out after a while.

My recollection of the discussion, that led to the current behavior, is
that we had several options.

A) have it retry no matter what. In the case of a zombie that doesn't
time out, that doesn't help, but at least it creates a constant stream
of error messages.

B) Treat the first connection attempt different, so that the system
startup script or a later manual invocation will signal a problem.

C) Kick the existing (assumed to be dead) slon out (by killing its
backend). But if you really have two slons, that are alive and possibly
running on different systems, they will keep stabbing each other in the
back and eventually never get anything accomplished.


We chose B) at that time.


I am open for new ideas.


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] DML count in sl_apply_stats

2013-07-09 Thread Jan Wieck
Slony operates row based. It has no idea how many client side statements have 
created those rows. Or if any row was inserted, updated or deleted as a direct 
result of a client query, a stored procedure, a trigger or a cascaded foreign 
key operation.


Jan

--
Anyone who trades liberty for security deserves neither 
liberty nor security. -- Benjamin Franklin

Raghav ragavendra@gmail.com wrote:

Hi,

insert into stest values(generate_series(1,100));
or
insert into stest values (101),(102),(103);

How are the above two SQL statements treated and counted in the
sl_apply_stats.as_num_insert column?

I feel it should be counted as one INSERT statement, no?
But it has multiplied the count even with generate_series.

Test case:

postgres=# select as_num_insert from _rep220.sl_apply_stats ; //before
insert
 as_num_insert
---
   0
(1 row)

postgres=# insert into stest values (generate_series(1,100));
INSERT 0 100

postgres=# select as_num_insert from _rep220.sl_apply_stats ;
 as_num_insert
---
   100
(1 row)

postgres=# insert into stest values (101),(102),(103);
INSERT 0 3

postgres=# select as_num_insert from _rep220.sl_apply_stats ;
 as_num_insert
---
   103
(1 row)


-- 
Regards
Raghav
Blog: htt://raghavt.blogspot.com/

___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Slony replication has stopped replicating

2013-06-23 Thread Jan Wieck
On 06/22/13 19:48, Scott Marlowe wrote:
 On Sat, Jun 22, 2013 at 4:45 PM, Jan Wieck janwi...@yahoo.com wrote:
 There is nothing laughable about your situation, so I don't know what you
 LOL all the time.
 
 It might be one of those laugh or cry things.
 
 Whoever installed that database and Slony system determined where the logs
 go. Check the system startup scripts ans syslog files/configuration for any
 hints. Unfortunately every Linux packager has his/her own mind where things
 are supposed to go. Not like in half organized FreeBSD territory, where you
 at least can guess where the logs may be.
 
 To be fair, most distros put it all in a few common places. I.e.
 Redhat / rpm based or Debian / deb based.  And there's nothing
 stopping a BSD sysadmin from building postgresql ala ./configure
 --prefix=/home/mydir/are/you/kidding/me
 

Fair enough. But then again, everyone can do that when compiling from sources
(unless you mean to imply that Linux sysadmins no longer know how to
do that).

The points several people have made remain. Bambi has to follow the
system startup breadcrumbs to figure out where the log is, if he/she
intends to find out what is currently broken in order to fix the current
system.

On the other hand, fixing that system properly means upgrading
everything to the latest versions, which should start with something
like apt-get update or yum update instead of attempting to just get
past the current problem. I still have the feeling that Slony stopped
working because of a far greater underlying problem.


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Slony replication has stopped replicating

2013-06-22 Thread Jan Wieck
On 06/22/13 14:48, Bambi Bellows wrote:
 Hi Gang!
 
 Slony replication has stopped replicating, and I'm not sure why, or how
 to get it replicating again.  I have bounced Slony on the master and all
 the remote nodes, but no luck.  Is there a Slony expert out there who
 can help point me in the right direction?
 
 Version info:
 
 slonik version 2.1.0.b2
 
 psql (9.0.4)

Slony 2.1 is on 2.1.3 and PostgreSQL 9.0 is on 9.0.13. Is there a
particular reason why you keep running software that is known to have
plenty of bugs?


Jan


-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Slony replication has stopped replicating

2013-06-22 Thread Jan Wieck
On 06/22/13 16:56, Bambi Bellows wrote:
 LOL.  Thanks Jan.  No, I inherited these systems, and upgrading is not really 
 an option for the database...  I am, however, trying to find where the log 
 files are.  They aren't in the /opt/logs/postgres directory, and they aren't 
 in the postgres logs directory.  I have two sudo finds running, but, so far, 
 no luck.  Gotta say, haven't been having the best of luck so far on anything 
 slony-related.

Slony related?

I would say the whole thing you inherited is a complete mess and Slony
is a casualty of that. There is nothing so far that I can see that could be
blamed on Slony, only a grossly outdated and unmanaged server setup.

Sorry to be so blunt, but I just don't tolerate it when Slony or
PostgreSQL are blamed for server admin negligence and incompetence.

This is apparently not your fault, but your predecessor's fault. So
don't take it personally.


Jan

 
 -Original Message-
 From: Jan Wieck [mailto:janwi...@yahoo.com] 
 Sent: Saturday, June 22, 2013 3:53 PM
 To: Bambi Bellows
 Cc: slony1-general@lists.slony.info
 Subject: Re: [Slony1-general] Slony replication has stopped replicating
 
 On 06/22/13 14:48, Bambi Bellows wrote:
 Hi Gang!
 
 Slony replication has stopped replicating, and I'm not sure why, or how
 to get it replicating again.  I have bounced Slony on the master and all
 the remote nodes, but no luck.  Is there a Slony expert out there who
 can help point me in the right direction?
 
 Version info:
 
 slonik version 2.1.0.b2
 
 psql (9.0.4)
 
 Slony 2.1 is on 2.1.3 and PostgreSQL 9.0 is on 9.0.13. Is there a
 particular reason why you keep running software that is known to have
 plenty of bugs?
 
 
 Jan
 
 


-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Slony replication has stopped replicating

2013-06-22 Thread Jan Wieck
There is nothing laughable about your situation, so I don't know why you LOL 
all the time.

Whoever installed that database and Slony system determined where the logs go. 
Check the system startup scripts and syslog files/configuration for any hints. 
Unfortunately every Linux packager has his/her own mind about where things are 
supposed to go. Not like in half-organized FreeBSD territory, where you at 
least can guess where the logs may be.


Jan

--
Anyone who trades liberty for security deserves neither 
liberty nor security. -- Benjamin Franklin

Bambi Bellows bbell...@dotomi.com wrote:

LOL.  Thank you for your advice.  I will talk to my boss about this on Monday.

-Original Message-
From: Scott Marlowe [mailto:scott.marl...@gmail.com] 
Sent: Saturday, June 22, 2013 5:19 PM
To: Bambi Bellows
Cc: Jan Wieck; slony1-general@lists.slony.info
Subject: Re: [Slony1-general] Slony replication has stopped replicating

On Sat, Jun 22, 2013 at 3:16 PM, Bambi Bellows bbell...@dotomi.com wrote:
 LOL.  You guys are killin me.  Meantime, can someone point me to where the 
 log file destination might be hidden?  Because I can't even find out what 
 broke and why.  And as much as it's great to blame my predecessors, they 
 aren't here anymore, and I'm out of my depth on Slony... ob.vi.ous.ly.  And 
 telling my boss to suck it is not really an option for me.


I'm not telling you to tell them to suck it, I'm telling you to tell
them you need to fix it right. Take the system offline for as long as
it takes to upgrade postgresql (a few minutes typically). Drop the
current slony, install the latest 2.1.x, and restart replication from
scratch. Running around with your hair on fire at this point isn't
likely to result in a long term sustainable fix, but a slowly
repeating cycle of endless horror as the system gets fixed, breaks,
and gets fixed over and over.




___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Slony cleanupEvent erroring out with server closed the connection unexpectedly - Soln version: 2.1.2

2013-06-15 Thread Jan Wieck
On 06/15/13 04:46, Sridevi R wrote:
 Jan,
 
 You are right. Tweaking the tcp keep alive parameters helped.
 
 Now slon.conf contains:
 
 tcp_keepalive=true
 tcp_keepalive_interval=45
 tcp_keepalive_count=20
 tcp_keepalive_idle=30
 cleanup_interval=30
 
 Thanks a lot for the timely response.

You're welcome.


Jan


 
 - Sridevi
 
 
 
 On Thu, Jun 13, 2013 at 11:46 PM, Jan Wieck janwi...@yahoo.com
 mailto:janwi...@yahoo.com wrote:
 
 On 06/13/13 06:25, Sridevi R wrote:
  Hello Jan,
 
  The Master and Slave DBs talk through a firewall.
  VIP IPs and SNAT IPs are used in pg_hba.conf.
 
  The corresponding messages in the postgres server log:
 
  2013-06-13 09:46:21.224 GMT,,,6630,10.4.2.2:42031
 http://10.4.2.2:42031
  http://10.4.2.2:42031,51b994ed.19e6,1,,2013-06-13 09:46:21
  GMT,,0,LOG,08P01,incomplete startup packet,
  2013-06-13 09:57:38.596 GMT,postgres,db01,6634,ip address
 printed
  here:53924,51b994f7.19ea,1,idle,2013-06-13 09:46:31
  GMT,28/0,0,LOG,08006,could not receive data from client: Connection
  reset by peer,slon.node_1_listen
  2013-06-13 09:57:38.596 GMT,postgres,db01,6634,ip address
 printed
  here:53924,51b994f7.19ea,2,idle,2013-06-13 09:46:31
  GMT,28/0,0,LOG,08P01,unexpected EOF on client
  connection,slon.node_1_listen
  2013-06-13 09:57:38.607 GMT,postgres,db01,6637,ip address
 printed
  here:53926,51b994f9.19ed,1,idle,2013-06-13 09:46:33
  GMT,32/0,0,LOG,08006,could not receive data from client: Connection
  reset by peer,slon.subscriber_1_provider_1
  2013-06-13 09:57:38.607 GMT,postgres,db01,6637,ip address
 printed
  here:53926,51b994f9.19ed,2,idle,2013-06-13 09:46:33
  GMT,32/0,0,LOG,08P01,unexpected EOF on client
  connection,slon.subscriber_1_provider_1
  2013-06-13 09:57:38.608 GMT,postgres,db01,6635,ip address
 printed
  here:53925,51b994f7.19eb,1,idle,2013-06-13 09:46:31
  GMT,31/0,0,LOG,08006,could not receive data from client: Connection
  reset by peer,slon.node_1_listen
  2013-06-13 09:57:38.608 GMT,postgres,db01,6635,ip address
 printed
  here:53925,51b994f7.19eb,2,idle,2013-06-13 09:46:31
  GMT,31/0,0,LOG,08P01,unexpected EOF on client
  connection,slon.node_1_listen
 
  The client slon log contains:
  2013-06-13 09:57:38 GMT FATAL  cleanupThread: begin;lock table
  _xx_cluster.sl_config_lock;select _xx_cluster.cleanupEvent('10
  minutes'::interval);commit; - server closed the connection
 unexpectedly
  This probably means the server terminated abnormally
  before or while processing the request.
 
 This all can very well be a slightly too eager firewall dropping idle
 connections. Have you tried to enable TCP keep alive options that kick
 in after something like 30 seconds? If not, enable them on both, the PG
 server and the Slony side. That usually prevents those firewall issues.
 
 
 Jan
 
 
 
 
  Thanks,
  Sridevi
 
 
 
 
 
  On Thu, Jun 13, 2013 at 12:02 AM, Jan Wieck janwi...@yahoo.com
 mailto:janwi...@yahoo.com
  mailto:janwi...@yahoo.com mailto:janwi...@yahoo.com wrote:
 
  On 06/12/13 10:17, Sridevi R wrote:
   Jan,
  
   Thanks for the reply.
  
   The only errors in the slon log are failure of cleanupThread.
   child process is restarting right after the cleanupThread
 Failure.
   This occurs approximately every 10 minutes since
 cleanup_interval
  is set
   to 10 minutes.
  
   Here is a sample from the log again:
  
   2013-06-06 14:23:27 GMT FATAL  cleanupThread: begin;lock table
   _xx_cluster.sl_config_lock;select
 _xx_cluster.cleanupEvent('10
   minutes'::interval);commit; - server closed the connection
  unexpectedly
   This probably means the server terminated abnormally
   before or while processing the request.
   2013-06-06 14:23:27 GMT CONFIG slon: child terminated
 signal: 9; pid:
   16135, current worker pid: 16135
   2013-06-06 14:23:27 GMT CONFIG slon: restart of worker in 10
 seconds
 
  server closed the connection unexpectedly ...
 
  Is this connection by any chance through some firewall or NAT
 gateway
  that will drop idle connections?
 
  What are the corresponding postmaster server log entries?
 Since slony
  reports an unexpected connection drop from the server, the
 server must
  have some message in its log too, because the client never
 sent the 'X'
  libpq protocol message.
 
 
  Jan
 
 
  
   Thanks ,
   Sridevi
  
  
   On Wed, Jun 12

Re: [Slony1-general] Slony cleanupEvent erroring out with server closed the connection unexpectedly - Soln version: 2.1.2

2013-06-13 Thread Jan Wieck
On 06/13/13 06:25, Sridevi R wrote:
 Hello Jan,
 
 The Master and Slave DBs talk through a firewall.
 VIP IPs and SNAT IPs are used in pg_hba.conf.
 
 The corresponding messages in the postgres server log:
 
 2013-06-13 09:46:21.224 GMT,,,6630,10.4.2.2:42031
 http://10.4.2.2:42031,51b994ed.19e6,1,,2013-06-13 09:46:21
 GMT,,0,LOG,08P01,incomplete startup packet,
 2013-06-13 09:57:38.596 GMT,postgres,db01,6634,ip address printed
 here:53924,51b994f7.19ea,1,idle,2013-06-13 09:46:31
 GMT,28/0,0,LOG,08006,could not receive data from client: Connection
 reset by peer,slon.node_1_listen
 2013-06-13 09:57:38.596 GMT,postgres,db01,6634,ip address printed
 here:53924,51b994f7.19ea,2,idle,2013-06-13 09:46:31
 GMT,28/0,0,LOG,08P01,unexpected EOF on client
 connection,slon.node_1_listen
 2013-06-13 09:57:38.607 GMT,postgres,db01,6637,ip address printed
 here:53926,51b994f9.19ed,1,idle,2013-06-13 09:46:33
 GMT,32/0,0,LOG,08006,could not receive data from client: Connection
 reset by peer,slon.subscriber_1_provider_1
 2013-06-13 09:57:38.607 GMT,postgres,db01,6637,ip address printed
 here:53926,51b994f9.19ed,2,idle,2013-06-13 09:46:33
 GMT,32/0,0,LOG,08P01,unexpected EOF on client
 connection,slon.subscriber_1_provider_1
 2013-06-13 09:57:38.608 GMT,postgres,db01,6635,ip address printed
 here:53925,51b994f7.19eb,1,idle,2013-06-13 09:46:31
 GMT,31/0,0,LOG,08006,could not receive data from client: Connection
 reset by peer,slon.node_1_listen
 2013-06-13 09:57:38.608 GMT,postgres,db01,6635,ip address printed
 here:53925,51b994f7.19eb,2,idle,2013-06-13 09:46:31
 GMT,31/0,0,LOG,08P01,unexpected EOF on client
 connection,slon.node_1_listen
 
 The client slon log contains:
 2013-06-13 09:57:38 GMT FATAL  cleanupThread: begin;lock table
 _xx_cluster.sl_config_lock;select _xx_cluster.cleanupEvent('10
 minutes'::interval);commit; - server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.

This all can very well be a slightly too eager firewall dropping idle
connections. Have you tried to enable TCP keep alive options that kick
in after something like 30 seconds? If not, enable them on both, the PG
server and the Slony side. That usually prevents those firewall issues.


Jan


 
 
 Thanks,
 Sridevi
 
 
 
 
 
 On Thu, Jun 13, 2013 at 12:02 AM, Jan Wieck janwi...@yahoo.com
 mailto:janwi...@yahoo.com wrote:
 
 On 06/12/13 10:17, Sridevi R wrote:
  Jan,
 
  Thanks for the reply.
 
  The only errors in the slon log are failure of cleanupThread.
  child process is restarting right after the cleanupThread Failure.
  This occurs approximately every 10 minutes since cleanup_interval
 is set
  to 10 minutes.
 
  Here is a sample from the log again:
 
  2013-06-06 14:23:27 GMT FATAL  cleanupThread: begin;lock table
  _xx_cluster.sl_config_lock;select _xx_cluster.cleanupEvent('10
  minutes'::interval);commit; - server closed the connection
 unexpectedly
  This probably means the server terminated abnormally
  before or while processing the request.
  2013-06-06 14:23:27 GMT CONFIG slon: child terminated signal: 9; pid:
  16135, current worker pid: 16135
  2013-06-06 14:23:27 GMT CONFIG slon: restart of worker in 10 seconds
 
 server closed the connection unexpectedly ...
 
 Is this connection by any chance through some firewall or NAT gateway
 that will drop idle connections?
 
 What are the corresponding postmaster server log entries? Since slony
 reports an unexpected connection drop from the server, the server must
 have some message in its log too, because the client never sent the 'X'
 libpq protocol message.
 
 
 Jan
 
 
 
  Thanks ,
  Sridevi
 
 
  On Wed, Jun 12, 2013 at 7:33 PM, Jan Wieck janwi...@yahoo.com
 mailto:janwi...@yahoo.com
  mailto:janwi...@yahoo.com mailto:janwi...@yahoo.com wrote:
 
  On 06/12/13 07:14, Sridevi R wrote:
   Hello,
  
   The slony logs are consistently posting this error:
  
   2013-06-12 10:01:05 GMT FATAL  cleanupThread: begin;lock table
   _xx_cluster.sl_config_lock;select
 _xx_cluster.cleanupEvent('10
   minutes'::interval);commit; - server closed the connection
  unexpectedly
   2013-06-12 10:12:24 GMT FATAL  cleanupThread: begin;lock table
   _xx_cluster.sl_config_lock;select
 _xx_cluster.cleanupEvent('10
   minutes'::interval);commit; - server closed the connection
  unexpectedly
  
   checked and found that sl_confirm table is not cleaned up.
 cleanup
  event
   never succeeds.
   Additionally, the child processes terminates and restarts
 after each
   such cleanup failure.
  
   2013-06-11 11:20:04 GMT

Re: [Slony1-general] Slony cleanupEvent erroring out with server closed the connection unexpectedly - Soln version: 2.1.2

2013-06-12 Thread Jan Wieck
On 06/12/13 10:17, Sridevi R wrote:
 Jan,
 
 Thanks for the reply.
 
 The only errors in the slon log are failure of cleanupThread.
 child process is restarting right after the cleanupThread Failure.
 This occurs approximately every 10 minutes since cleanup_interval is set
 to 10 minutes.
 
 Here is a sample from the log again:
 
 2013-06-06 14:23:27 GMT FATAL  cleanupThread: begin;lock table
 _xx_cluster.sl_config_lock;select _xx_cluster.cleanupEvent('10
 minutes'::interval);commit; - server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.
 2013-06-06 14:23:27 GMT CONFIG slon: child terminated signal: 9; pid:
 16135, current worker pid: 16135
 2013-06-06 14:23:27 GMT CONFIG slon: restart of worker in 10 seconds

server closed the connection unexpectedly ...

Is this connection by any chance through some firewall or NAT gateway
that will drop idle connections?

What are the corresponding postmaster server log entries? Since slony
reports an unexpected connection drop from the server, the server must
have some message in its log too, because the client never sent the 'X'
libpq protocol message.


Jan


 
 Thanks ,
 Sridevi
 
 
 On Wed, Jun 12, 2013 at 7:33 PM, Jan Wieck janwi...@yahoo.com
 mailto:janwi...@yahoo.com wrote:
 
 On 06/12/13 07:14, Sridevi R wrote:
  Hello,
 
  The slony logs are consistently posting this error:
 
  2013-06-12 10:01:05 GMT FATAL  cleanupThread: begin;lock table
  _xx_cluster.sl_config_lock;select _xx_cluster.cleanupEvent('10
  minutes'::interval);commit; - server closed the connection
 unexpectedly
  2013-06-12 10:12:24 GMT FATAL  cleanupThread: begin;lock table
  _xx_cluster.sl_config_lock;select _xx_cluster.cleanupEvent('10
  minutes'::interval);commit; - server closed the connection
 unexpectedly
 
  checked and found that sl_confirm table is not cleaned up. cleanup
 event
  never succeeds.
  Additionally, the child processes terminates and restarts after each
  such cleanup failure.
 
  2013-06-11 11:20:04 GMT CONFIG slon: child terminated signal: 9; pid:
  20172, current worker pid: 20172
  2013-06-11 11:20:04 GMT CONFIG slon: restart of worker in 10 seconds
 
  When cleanup is run manually, on the psql prompt it runs to completion
  without any issues and cleans up sl_event and sl_confirm tables
  begin;lock table _xx_cluster.sl_config_lock;select
  _xx_cluster.cleanupEvent('10 minutes'::interval);commit;
 
  Soln version: 2.1.2
 
  Any help/insight would be greatly appreciated.
 
 Slon kills its worker(s) with signal 9 (SIGKILL) when it needs to
 restart, like when there are errors in event processing or if it
 receives certain signals. Are there any other errors in the slon log or
 is something on the machine sending signals to slon?
 
 
 Jan
 
 
  Thanks,
  Sridevi
 
 
 
  ___
  Slony1-general mailing list
  Slony1-general@lists.slony.info
 mailto:Slony1-general@lists.slony.info
  http://lists.slony.info/mailman/listinfo/slony1-general
 
 
 
 --
 Anyone who trades liberty for security deserves neither
 liberty nor security. -- Benjamin Franklin
 
 


-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Slony Versions and Postgres Versions and the compatibility.

2013-06-07 Thread Jan Wieck
On 06/06/13 04:16, Raghav wrote:
 
 __
 
 Is there a Hard Rule that the Slony versions must be 100% exact at
 all times, or can a Target Node be at a newer subversion ahead of
 the Master and still run.
 
  
 AFAIK, Slony-I needs the exact same version on both nodes, because
 upgrading a node to a newer version changes the slony_func.so file, which
 must match the version on the other replicating nodes or it will break.
 The link below has a comment after the upgrade steps: If there is any
 mismatch between component versions, the slon
 (http://main.slony.info/documentation/2.1/slon.html) will refuse to start
 up, which provides protection against corruption.

That is correct. Slony requires the same Slony version on all nodes, but
supports running on top of different PostgreSQL versions.


Jan


 
 http://main.slony.info/documentation/2.1/slonyupgrade.html
 
 For your other questions, hopefully you will see some answer's here.
 
 -- 
 Regards
 Raghav
 Blog: htt://raghavt.blogspot.com/ http://raghavt.blogspot.com/
 
 
 ___
 Slony1-general mailing list
 Slony1-general@lists.slony.info
 http://lists.slony.info/mailman/listinfo/slony1-general
 


-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Truncate trigger in slony 2.1.x+ requires superuser?

2013-05-16 Thread Jan Wieck

On 05/15/2013 02:50 PM, Jan Wieck wrote:

I would call it a bug. The truncate trigger should be security
definer, like the log trigger and several other functions are.


I have committed the 2-line fix for this to master and REL_2_1_STABLE.

The diff for 2.1 is attached.


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
diff --git a/src/backend/slony1_funcs.sql b/src/backend/slony1_funcs.sql
index 7c083db..692d3ff 100644
--- a/src/backend/slony1_funcs.sql
+++ b/src/backend/slony1_funcs.sql
@@ -5540,7 +5540,8 @@ $$
 		end if;
 		return NULL;
 end
-$$ language plpgsql;
+$$ language plpgsql
+security definer;
 
 comment on function @NAMESPACE@.log_truncate ()
 is 'trigger function run when a replicated table receives a TRUNCATE request';
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Truncate trigger in slony 2.1.x+ requires superuser?

2013-05-15 Thread Jan Wieck
On 05/15/2013 01:51 PM, Brian Fehrle wrote:
 Hi All,

 I have a slony cluster running on slony 2.1.0 (Have tested this
 situation on 2.1.3 also). This cluster is replicating a drupal database,
 which performs truncates on some of the cache tables periodically. When
 the truncates execute, they fire the log_truncate trigger in slony,
 which immediately errors due to no select permissions on the sl_table
 relation in the slony cluster.

 In normal events (inserts, updates, deletes), the user doesn't need any
 permissions granted in the slony schema in order for the event trigger
 to log the event into sl_log_1, etc.

 So is this a design decision, making truncates on slony tables only be
 able to be executed by superusers in the database?

I would call it a bug. The truncate trigger should be security 
definer, like the log trigger and several other functions are.


Jan


 I've got it working in a testing environment by granting the following:
 grant select on _slony.sl_table to user;
 grant insert on _slony.sl_log_status to user;
 grant insert on _slony.sl_log_1 to user;
 grant insert on _slony.sl_log_2 to user;
 grant usage on _slony.sl_action_seq to user;

 However I don't like the idea of either A. granting these permissions to
 slony tables or B. making my drupal user a 'superuser' to get around it.
 Are those my only two options here to allow truncates to work?

 Thanks,
 - Brian F
 ___
 Slony1-general mailing list
 Slony1-general@lists.slony.info
 http://lists.slony.info/mailman/listinfo/slony1-general



-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] pgsql replication with database schema changes using slonik execute script

2013-05-09 Thread Jan Wieck
On 05/09/13 20:31, Vick Khera wrote:
 I'm pretty sure slonik will just block until it is reachable. Try not to
 do that. This may be different in slony 2.0+, but I haven't tested.

Unless there is an implicit or explicit WAIT FOR EVENT done by slonik,
nothing should block. EXECUTE SCRIPT in all versions prior to 2.2
creates an event, containing the SQL. In 2.2 the SQL is injected into
the data stream processed during SYNC application to make it work
reliably with concurrent table access.
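
For reference, a minimal slonik sketch of such a script execution (the
cluster name, conninfo strings, and set/node IDs below are placeholders,
not taken from the original question):

    cluster name = mycluster;
    node 1 admin conninfo = 'dbname=mydb host=master user=slony';
    node 2 admin conninfo = 'dbname=mydb host=slave user=slony';

    execute script (
        set id = 1,
        filename = '/tmp/schema_change.sql',
        event node = 1
    );

The SQL in the named file is what gets forwarded to the subscribers as
described above.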


Jan

 
 
 
 On Tue, May 7, 2013 at 6:54 AM, kapil bhadke kapilbha...@gmail.com
 mailto:kapilbha...@gmail.com wrote:
 
 I am using slony for replicating my pgsql database. I want to
 propagate the database schema changes(DDL) from master to the slave
 machine with the help of slonik execute script. I am curious to know
 what happens when I run slonik execute script(on master) and slave
 machine is not reachable. How will these schema changes get
 reflected on slave machine?
 
 ___
 Slony1-general mailing list
 Slony1-general@lists.slony.info mailto:Slony1-general@lists.slony.info
 http://lists.slony.info/mailman/listinfo/slony1-general
 
 
 
 
 ___
 Slony1-general mailing list
 Slony1-general@lists.slony.info
 http://lists.slony.info/mailman/listinfo/slony1-general
 


-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Slony multiple connections with database

2013-05-09 Thread Jan Wieck
On 05/01/13 02:41, Neha Sharma wrote:
 Hi,
 
 We are using slony to synchronise two postgres databases of two machines. 
 
 Generally slony creates 6 connections with the database - 4 with local
 db and 2 with remote db.
 
 But some problem is being observed on the machine having master db.
 Various slony remote connections are seen with the database.
 The limit of postgres connections is 100 and slony is consuming maximum
 connections due to which the db fails to respond after sometime. It was
 observed that connection reached to 104 out of which slony connections
 (which remote has created with db) were 74.
 We have to restart slony to clean all the connections.
 
 Ideally there should only be 2 connections from the remote - one created
 by listener and another by worker thread.
 
 There is no connectivity issue between the two machines.

This sounds very much like lost TCP connections that never get the FIN
or RST packets. Have you tried enabling keepalive with settings that
let connections time out within 5 minutes or faster in both the
PostgreSQL and the Slony configurations?
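
As an illustration, the PostgreSQL side could look like the sketch below;
the libpq keepalive parameters for the slon conninfo strings exist in
reasonably recent PostgreSQL versions, so verify the exact spelling for
your setup:

    # postgresql.conf on both nodes -- detect dead peers within a few minutes
    tcp_keepalives_idle = 120       # seconds of idle time before the first probe
    tcp_keepalives_interval = 30    # seconds between probes
    tcp_keepalives_count = 4        # failed probes before the connection is closed

    # and/or in the conninfo strings slon uses:
    #   'host=... dbname=... keepalives=1 keepalives_idle=120
    #    keepalives_interval=30 keepalives_count=4'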


Jan

 
 Could anyone please suggest what could be the reason for this weired
 behaviour of slony?
 Also, please tell if slony creates new connections without removing the
 older ones?
 
 -- 
 */Neha Sharma/*
 
 
 ___
 Slony1-general mailing list
 Slony1-general@lists.slony.info
 http://lists.slony.info/mailman/listinfo/slony1-general
 


-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] session_replication_role 'replica' behavior and referential integrity constraints

2013-04-26 Thread Jan Wieck
On 4/24/2013 4:01 AM, Manos Tsahakis wrote:
 Hello all,

 In our application we are enabling session_replication_role TO 'replica'
 in certain situations so that triggers will not fire in a table during
 DML operations. However, we observed that when setting
 session_replication_role TO 'replica' referential integrity constraints
 will not fire on a table either.

 A simple example is given bellow:

 dynacom=# create table parent (id serial primary key, name text not null);

 dynacom=# create table child (id serial primary key, name text not
 null,pid int NOT NULL REFERENCES parent(id) ON DELETE CASCADE);

 dynacom=# insert into parent (name) values ('test 1');
 INSERT 0 1

 dynacom=# insert into parent (name) values ('test 2');
 INSERT 0 1

 dynacom=# insert into child (name,pid) values ('test kid2',2);
 INSERT 0 1
 dynacom=# begin ;
 BEGIN
 dynacom=# set session_replication_role TO 'replica';
 SET
 dynacom=# delete from parent where id=2;
 DELETE 1
 dynacom=# commit ;
 COMMIT

 dynacom=# select * from child;
   id |   name    | pid
  ----+-----------+-----
    2 | test kid2 |   2
 (1 row)

 dynacom=# select * from parent;
   id | name
  ----+------
 (0 rows)

 So we are left, basically, with an inconsistent database.

 1. 9.2 documentation
 (http://www.postgresql.org/docs/9.2/static/sql-altertable.html) in
 theDISABLE/ENABLE [ REPLICA | ALWAYS ] TRIGGER section, makes a
 distinction between USER (non system-constraint related) and ALL
 triggers, but does not state that simply(??) enabled system (non-user)
 constraint triggers will not fire in case of session_replication_role =
 replica. Shouldn't non-user triggers *not* be affected by
 session_replication_role ?

 2. Is there any way to just find the name of the FK constraint trigger
 and convert it to
 ENABLE ALWAYS?

 For the above test we used postgresql 9.2, currently we are running
 postgresql 9.0 in production.

You want to set the session_replication_role to LOCAL.

The docs should probably be a little clearer and possibly something 
is missing in the docs entirely.

The missing piece is probably that the Slony-I triggers explicitly check 
the session_replication_role. The log trigger does nothing if the 
setting isn't ORIGIN. The deny-access trigger only errors out if the 
setting is ORIGIN.

The poorly documented side effect of that is that if you set the role to 
LOCAL, all Slony triggers are effectively disabled while all user 
defined and integrity triggers will work like on a stand-alone database.
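
Applied to the example from the original post, that is just a matter of
replacing 'replica' with 'local' in the session shown at the top:

    dynacom=# begin;
    dynacom=# set session_replication_role TO 'local';
    dynacom=# delete from parent where id = 2;   -- FK cascade to child fires as usual
    dynacom=# commit;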


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Help - replication has broken, best way to restart it without destroying data.

2013-03-06 Thread Jan Wieck
On 3/6/2013 7:08 AM, Jonathan Soong wrote:
 Hi guys

 Replication has died on my master-slave.

 What i would really like to do is stop replication and then get it to clear 
 out the replicated tables in the Slave and start again (i.e truncate the 
 replicated tables ONLY ).

 The problem I have is that only SOME tables are replicated between the Master 
 and the Slave.

 - i do NOT want the replication to destroy the un-replicated tables on the 
 Slave.

 I have seen all the instructions on how to start replication but most of them 
 have the step of drop the Slave db and re-import the schema from the Master. 
 I can't do this as I don't want to clobber my un-replicated tables.

 Is it possible to do what I want?

Certainly.


 My initial thoughts were to do an slonik unsubscribe set and then do a 
 slonik subscribe set

 The problem I have is that the slon daemon on the Slave is erroring (see 
 below), and from what I understand the unsubscribe set event will just be 
 put in the event queue. I.e. The Slave will never be able to unsubscribe 
 because it is erroring on an earlier event.

 I was thinking there might be a way to delete all events in the queue so it 
 could just continue ... is this possible? I realise that after that the data 
 is not in sync, but im planning on blowing away the replicated tables and 
 starting again anyway...

 Although there might be a way to fix replication, i kind of want to know how 
 to just do a clean replication from this point, as I see it as an important 
 task.

 Thanks for any thoughts/opinions.

 Cheers

 Jon

 



 Error in Slave log is:
 2013-03-06 19:56:41 HKTINFO   remoteWorkerThread_1: syncing set 1 with 47 
 table(s) from provider 1
 2013-03-06 19:56:42 HKTERROR  remoteWorkerThread_1: select 
 _replication.sequenceSetValue(10,1,'5002597503','291048');  
 PGRES_FATAL_ERROR ERROR:  Slony-I: sequenceSetValue(): sequence 10 not found

You did not tell us which version of Slony you are running. That is 
always good to include in a problem report. In this case it is 
especially relevant since the above is probably the result from doing 
some DDL on the replica (slave) directly, without sending it through 
slonik's EXECUTE SCRIPT functionality. My bet is that someone dropped 
and recreated the sequence in question. You can find out what sequence 
that is by doing

 SELECT * FROM _replication.sl_sequence where seq_id = 10;

If you are still running Slony 1.2 then it is important to know what 
other DDL may have been executed on the replica since doing that in 1.2 
can lead to a permanently corrupted system catalog. There is no danger 
of that in 2.0 or 2.1.

Operations harmful on replicated tables on a replica under 1.2 include 
things like create/drop index, add/drop constraint, create/drop trigger 
as well as dropping columns.

If you conclude that your system catalog must still be consistent 
because you are running 2.0 or higher or none of the above has happened, 
then you can try to fix the problem first by running the slonik command 
REPAIR CONFIG against the replica node.

 http://slony.info/documentation/2.1/stmtrepairconfig.html
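
A minimal sketch of that slonik call, assuming set 1 and the replica
being node 2 (the conninfo strings are placeholders; adjust the IDs to
your cluster):

    cluster name = replication;
    node 1 admin conninfo = 'dbname=mydb host=master user=slony';
    node 2 admin conninfo = 'dbname=mydb host=slave user=slony';

    repair config (set id = 1, event node = 2);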

If that does not fix the problem, then you can uninstall Slony using the 
UNINSTALL NODE command for all nodes.

 http://slony.info/documentation/1.2/stmtuninstallnode.html
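
The corresponding slonik sketch, run after stopping both slon daemons
(again with placeholder conninfo strings):

    cluster name = replication;
    node 1 admin conninfo = 'dbname=mydb host=master user=slony';
    node 2 admin conninfo = 'dbname=mydb host=slave user=slony';

    uninstall node (id = 2);
    uninstall node (id = 1);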

This does not touch any data on any of the nodes. It only restores 
things so that both systems at that point are stand alone databases.

After that you simply rebuild your replication cluster like it was done 
originally. Creating nodes, paths, sets, tables and subscribing them.

You do not need to truncate anything on the replica. Slony will take 
care of that.

If you have to go through that it would be a good opportunity to upgrade 
to Slony 2.1 while at it.


Regards,
Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Help - replication has broken, best way to restart it without destroying data.

2013-03-06 Thread Jan Wieck
On 3/6/2013 3:24 PM, Jonathan Soong wrote:
 Hi Jan

 Sorry for not including version, yup im 2.0.7 and pg 8.4.11.

 That's good, at least I know no data has been corrupted.

Indeed, that is a good thing.


 As you say, I think DDL was run against the Slave by itself and i think it is 
 corrupted. I don't think i can repair it because things were done against the 
 Slave since then.

Well, have you tried to run the REPAIR CONFIG slonik command for the set 
against the slave? The damage may not be as severe as you think.

What is the actual name of that sequence 10 according to sl_sequence? 
Does that sequence exist in the slave database?



 I am presuming the UNINSTALL NODE will bypass the slony event queue (which is 
 blocked up with an error) ?

UNINSTALL NODE is a command that actually removes all traces of Slony 
from a database. There should not be a slon daemon running at that time 
and you are left with a standalone database that has no idea what Slony 
is at all.

Both databases would still have all your tables with whatever data is 
currently in them.

So you would stop the slon daemons, run UNINSTALL NODE against both, 
master and slave. Then rebuild your replication from scratch, just 
assuming that you don't have to copy any schema. The tables already 
exist on the slave, so there is not need to do that.


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Help - replication has broken, best way to restart it without destroying data.

2013-03-06 Thread Jan Wieck
On 3/6/2013 7:10 PM, Jonathan Soong wrote:
 Thanks Jan, all fixed.

 :)

Most glad that it all turned out well.

Just for the record, did the REPAIR CONFIG do the trick or did you have 
to rebuild the cluster?


Jan


 Cheers

 Jon

 -Original Message-
 From: Jan Wieck [mailto:janwi...@yahoo.com]
 Sent: Thursday, 7 March 2013 7:11 AM
 To: Jonathan Soong
 Cc: slony1-general@lists.slony.info
 Subject: Re: [Slony1-general] Help - replication has broken, best way to 
 restart it without destroying data.

 On 3/6/2013 3:24 PM, Jonathan Soong wrote:
 Hi Jan

 Sorry for not including version, yup im 2.0.7 and pg 8.4.11.

 That's good, at least I know no data has been corrupted.

 Indeed, that is a good thing.


 As you say, I think DDL was run against the Slave by itself and i think it 
 is corrupted. I don't think i can repair it because things were done against 
 the Slave since then.

 Well, have you tried to run the REPAIR CONFIG slonik command for the set 
 against the slave? The damage may not be as severe as you think.

 What is the actual name of that sequence 10 according to sl_sequence?
 Does that sequence exist in the slave database?



 I am presuming the UNINSTALL NODE will bypass the slony event queue (which 
 is blocked up with an error) ?

 UNINSTALL NODE is a command that actually removes all traces of Slony from a 
 database. There should not be a slon daemon running at that time and you are 
 left with a standalone database, that has no idea what Slony is at all.

 Both databases would still have all your tables with whatever data is 
 currently in them.

 So you would stop the slon daemons, run UNINSTALL NODE against both, master 
 and slave. Then rebuild your replication from scratch, just assuming that you 
 don't have to copy any schema. The tables already exist on the slave, so 
 there is no need to do that.


 Jan

 --
 Anyone who trades liberty for security deserves neither liberty nor security. 
 -- Benjamin Franklin



-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] unsubscribe/drop set failing

2013-01-09 Thread Jan Wieck
On 12/20/2012 8:59 AM, Vick Khera wrote:
 On Thu, Dec 20, 2012 at 7:36 AM, Matthew Vernon
 matthew.ver...@sruc.ac.uk mailto:matthew.ver...@sruc.ac.uk wrote:

 How can I sort this out?


 easy way: uninstall slony and rebuild your cluster.

 hard way: no idea. i never succeeded in fixing a schema I broke this way. :(

The hard way would be to DELETE all tables and sequences from sl_table 
and sl_sequence whose table/sequence OIDs no longer appear in pg_class.
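
For illustration only, something along these lines; the tab_reloid and
seq_reloid column names are quoted from memory and the cluster name is a
placeholder, so verify both (and take a backup) before running anything
like it:

    DELETE FROM _mycluster.sl_table
     WHERE tab_reloid NOT IN (SELECT oid FROM pg_class);
    DELETE FROM _mycluster.sl_sequence
     WHERE seq_reloid NOT IN (SELECT oid FROM pg_class);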


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] data copy for set 1 failed 3 times - sleep 60 seconds

2012-12-10 Thread Jan Wieck
On 12/10/2012 5:28 PM, Tory M Blue wrote:

 I'm back to it with debug of 4 on the source and the destination nodes.

 Still failing

 Destination:

 Slon log:

 2012-12-10 12:04:49 PST CONFIG remoteWorkerThread_1: 6525.754 seconds to
 copy table cls.listings
 2012-12-10 12:04:49 PST CONFIG remoteWorkerThread_1: copy table
 cls.customers
 2012-12-10 12:04:49 PST CONFIG remoteWorkerThread_1: Begin COPY of table
 cls.customers
 2012-12-10 12:04:49 PST ERROR  remoteWorkerThread_1: select
 _admissioncls.copyFields(8);
 2012-12-10 12:04:49 PST WARN   remoteWorkerThread_1: data copy for set 1
 failed 1 times - sleep 15 seconds
 2012-12-10 12:04:51 PST INFO   cleanupThread: 5961.364 seconds for
 cleanupEvent()
 2012-12-10 12:05:06 PST INFO   copy_set 1 - omit=f - bool=0
 2012-12-10 12:05:06 PST INFO   omit is FALSE
 2012-12-10 12:05:06 PST CONFIG version for dbname=clsdb host=server
 user=postgres password=SECURED is 90104
 2012-12-10 12:05:07 PST DEBUG1 copy_set_1 dbname=clsdb host=server
 user=postgres password=SECURED: backend pid = 17092
 2012-12-10 12:05:07 PST CONFIG remoteWorkerThread_1: connected to
 provider DB

 Postgres logs:

 2012-12-10 12:04:51 PST admissionclsdb postgres [local] NOTICE:
 Slony-I: Logswitch to sl_log_2 initiated
 2012-12-10 12:04:51 PST admissionclsdb postgres [local] CONTEXT:  SQL
 statement SELECT _admissioncls.logswitch_start()
 2012-12-10 12:05:12 PST admissionclsdb postgres [local] LOG:  sending
 cancel to blocking autovacuum PID 18620
 2012-12-10 12:05:12 PST admissionclsdb postgres [local] DETAIL:  Process
 18299 waits for AccessExclusiveLock on relation 17410 of database 16385.
 2012-12-10 12:05:12 PST admissionclsdb postgres [local] STATEMENT:  lock
 table cls.listings;
 2012-12-10 12:05:12 PSTERROR:  canceling autovacuum task
 2012-12-10 12:05:12 PSTCONTEXT:  automatic vacuum of table
 admissionclsdb.cls.listings
 2012-12-10 12:05:12 PST admissionclsdb postgres [local] NOTICE:
 truncate of cls.autobodystyle succeeded

What is the output of

 select  _admissioncls.copyFields(8);

on node 1? It is important to do this on node 1 because that is where 
the remote worker, processing the copy_set, is doing this.




 Source:

 There is nothing in the slon log that would tell me it's aware of this
 client side restart or failure. Nothing in the postgres logs about an
 EOF or anything.
 Nothing in postgres log either that says there is an issue.

Please specify which postgres log you are talking about. There are at 
least two in this case, the one on the new node and the one on the data 
provider (node 1). The one from the data provider is where to expect any 
error messages.


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] data copy for set 1 failed 3 times - sleep 60 seconds

2012-12-10 Thread Jan Wieck

On 12/10/2012 7:09 PM, Tory M Blue wrote:



On Mon, Dec 10, 2012 at 3:59 PM, Jan Wieck janwi...@yahoo.com
mailto:janwi...@yahoo.com wrote:

On 12/10/2012 6:26 PM, Tory M Blue wrote:



On Mon, Dec 10, 2012 at 2:55 PM, Jan Wieck janwi...@yahoo.com
mailto:janwi...@yahoo.com
mailto:janwi...@yahoo.com mailto:janwi...@yahoo.com wrote:

 On 12/10/2012 5:28 PM, Tory M Blue wrote:


 I'm back to it with debug of 4 on the source and the
destination
 nodes.

 Still failing

 Destination:

 Slon log:

 2012-12-10 12:04:49 PST CONFIG remoteWorkerThread_1:
6525.754
 seconds to
 copy table cls.listings
 2012-12-10 12:04:49 PST CONFIG remoteWorkerThread_1:
copy table
 cls.customers
 2012-12-10 12:04:49 PST CONFIG remoteWorkerThread_1:
Begin COPY
 of table
 cls.customers
 2012-12-10 12:04:49 PST ERROR  remoteWorkerThread_1:
select
 _admissioncls.copyFields(8);


This is the bit that I don't understand.

Looking at the remote_worker.c code, this error message comes from
line 3032 in 2.1 STABLE. The same code is in line 2911 in HEAD.

That select of copyFields() has apparently failed, because the
result status code is not PGRES_TUPLES_OK, yet there seems to be
nothing returned from PQresultErrorMessage().

I wonder what PQresStatus(PQresultStatus(__res3)) as well as
PQerrorMessage(pro_dbconn) at this point have to tell.


Thanks for not giving up on me
Tory


What version of Slony are you running this on?


-bash-4.1$ slon -v
slon version 2.1.1


Can you run a custom patched slon on the new subscriber so that we
can see that information in the log?


Absolutely!


Attached is a patch for 2.1.1 that adds that information to the error 
message. Since slon remains the same version, you only have to replace 
the slon binary on the new subscriber (node 10).



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
diff --git a/src/slon/remote_worker.c b/src/slon/remote_worker.c
index 7fbf67c..3497008 100644
*** a/src/slon/remote_worker.c
--- b/src/slon/remote_worker.c
*** copy_set(SlonNode *node, SlonConn *local
*** 3029,3037 
  
  		if (PQresultStatus(res3) != PGRES_TUPLES_OK)
  		{
! 			slon_log(SLON_ERROR, remoteWorkerThread_%d: \%s\ %s\n,
  	 node-no_id, dstring_data(query2),
! 	 PQresultErrorMessage(res3));
  			PQclear(res3);
  			PQclear(res1);
  			slon_disconnectdb(pro_conn);
--- 3029,3039 
  
  		if (PQresultStatus(res3) != PGRES_TUPLES_OK)
  		{
! 			slon_log(SLON_ERROR, remoteWorkerThread_%d: \%s\ %s %s - %s\n,
  	 node-no_id, dstring_data(query2),
! 	 PQresStatus(PQresultStatus(res3)),
! 	 PQresultErrorMessage(res3),
! 	 PQerrorMessage(pro_dbconn));
  			PQclear(res3);
  			PQclear(res1);
  			slon_disconnectdb(pro_conn);
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Multiple sets w/same table

2012-12-09 Thread Jan Wieck
On 12/6/2012 6:06 PM, Steve Singer wrote:
 On 12-12-04 11:50 PM, David Rees wrote:
 On Tue, Dec 4, 2012 at 7:34 PM, Steve Singer st...@ssinger.info wrote:



 For example, table 1 is in set 1 and table 2 is in set 2 and table 1
 has a foreign key that refers to table 2.

 Changes that occur on the origin node 1 which will be the same for set
 1/2 will obviously be fine, but is it possible for an event to be
 committed out-of-order on node 2?


 If you subscribe  set 2 first, then set 1 and if both set 1 and your not
 using a different forwarder node, then I think you should be okay.

The subscriptions should work either way since it is done under 
session_replication_role=replica, which suppresses foreign key checks.


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Slony version 2.1.0 switchover

2012-12-03 Thread Jan Wieck
On 11/30/2012 11:04 AM, Steve Singer wrote:
 On 12-11-27 11:59 AM, Frank McGeough wrote:
 I'm writing to ask about using Slony switchover capability. I've never
 actually been able to make this work. I'm curious about whether this is
 a common experience or if, perhaps, there is something that I don't
 understand about how this should function. The environment that I have
 is a database that is in constant use. That is, there is always at least
 a low level of DML (inserts for the most part) occurring 24 x 7, 365
 days a year. I've found that Slony will never be able to get a lock in
 order to perform the switchover and thus I'm left with tearing

 The slony MOVE SET operation needs to have an exclusive lock on all
 tables in the set at the same time.   The recommended way of doing this
 is to have your application stop traffic to the database for long enough
 so that this can be done, and then wait for the event to propagate to
 the new master and then have your application start queries against the
 new master.

 If slony gets its locks on a move set in this order
 Table1
 Table2
 Table3

 but your application access tables in this order
 Table3
 Table2
 Table1

 Your application and slony will deadlock each other.

What Steve said.

Also, if your application is compatible with pgbouncer's transaction 
mode (i.e. not using server side prepared statements), then you can use 
that to PAUSE and RESUME the real database connections without even 
stopping the application. This can be completely scripted and pgbouncer 
can even do the redirection of traffic on RESUME.
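
A rough sketch of the slonik part of such a scripted switchover (issue
PAUSE in the pgbouncer admin console before it, repoint pgbouncer at the
new origin and RESUME afterwards; all IDs and conninfo strings below are
placeholders):

    cluster name = mycluster;
    node 1 admin conninfo = 'dbname=mydb host=oldmaster user=slony';
    node 2 admin conninfo = 'dbname=mydb host=newmaster user=slony';

    lock set (id = 1, origin = 1);
    move set (id = 1, old origin = 1, new origin = 2);
    wait for event (origin = 1, confirmed = all, wait on = 1);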


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] data copy for set 1 failed 3 times - sleep 60 seconds

2012-11-30 Thread Jan Wieck
On 11/29/2012 4:39 PM, Tory M Blue wrote:

 Nothing, the postgres logs are clean, other than the checkpoint
 warnings, nothing, really absolutely nothing. This appears to be slony
 centric.

Are these the Postgres logs on the new node or the data provider?


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] replication fails to due bad DNS at startup, requires slony restart to fix

2012-11-15 Thread Jan Wieck



On 11/13/2012 2:06 PM, Mark Stosberg wrote:


 On 11/12/2012 04:15 PM, Andrew Sullivan wrote:
 On Mon, Nov 12, 2012 at 04:05:37PM -0500, Christopher Browne wrote:
 Is it possible that your libc is doing cacheing?  If *that's* the
 case, and I'm suspicious of it, then you really would need to restart
 the slon to get unstuck from what libc has cached on you...

 More likely, the system resolver is pointing at a caching name server.

 Thanks for prompt answers, everyone.

 I'll pursue this as an issue with our use of Ubuntu on Amazon/EC2.

I don't think that restarting the slon manually is the right solution to 
this problem.

Whatever really causes it (libc probably) doesn't matter. Changing DNS 
entries to redirect services to a different IP address is often used in 
disaster recovery procedures (failover). We should not require an 
outside process or worse an admin to restart things in a case like that.

I don't know what exactly is the right answer. From behind libpq it is 
rather difficult to tell what exactly went wrong when trying to connect 
to a database. A generic answer to that would be to do only X attempts 
to (re)connect, then restart the entire slon process. But that has 
pitfalls. If someone has shut down a replica for maintenance purposes, 
all slons would constantly restart in that interval, which would prevent 
longer running tasks like subscriptions. So this would have to go hand 
in hand with some new configuration option(s), like ignore connection 
failures to this node WRT restarting.

Some input from the user community would certainly help.


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] pg_dump, slony databases, and locking

2012-10-24 Thread Jan Wieck
On 10/24/2012 8:15 PM, Joe Conway wrote:
 On 10/24/2012 04:58 PM, Christopher Browne wrote:
 Arguably, the failure to restore the triggers isn't a real error, if
 you squint at things in a particular way.

 Alternatively, I suppose you could do a separate dump of *just* the
 Slony schema.  That has *some* risk of locking conflict, but it should
 be an entirely smaller risk, as it's not encompassing the likely-large
 backup of your application's schema and data.  Dump the Slony schema,
 and load it first, and that would allow the
 schema-excluding-Slony-bits to work.

 But my preference would be to squint at things sideways and say, It's
 OK that those triggers didn't load in.

 Thanks for the response (and Steve's as well). We are scripting the
 restore and therefore prefer to treat all errors as a failure.
 Unfortunately if we allow the restore trigger errors it will make it
 difficult to disallow other errors. But I guess it is something to consider.

Not that this will be of any help for you, but I consider it a missing 
feature of pg_dump that it allows omitting a schema, but not omitting any 
references to that schema, like triggers, foreign keys and so on.


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Issue when adding node to replication

2012-10-01 Thread Jan Wieck
On 9/28/2012 2:40 PM, Brian Fehrle wrote:
 I'm going to go ahead and delete the offending row in sl_event for the
 node that doesn't exist. This is a production environment so I need to
 get up and running again.

Deleting a SYNC event of a non-origin node can never have any negative 
side effects. They are noise that keeps sl_status looking good, but 
that's really all they do.

SYNC events from a set origin are a little different. A SYNC immediately 
before an EXECUTE_SCRIPT event prior to 2.2 may be important. Other than 
that, they don't matter that much either.

When a node has fallen behind, it deliberately skips a number of SYNC 
events, a feature we call sync-grouping, to catch up in larger leaps. 
There is no difference at all between slon grouping SYNC events and some 
SYNC event rows missing in sl_event. Absolutely zero difference.

The difference for the SYNC event from a set origin in front of an 
EXECUTE_SCRIPT is because that DDL may add a column to a table. That 
SYNC event was supposed to make sure that the column exists on a 
subscriber when the first data containing that column appeared in 
sl_log_N. That all changed in 2.2 as well, where DDL from script 
execution now travels within sl_log_N too, so as of 2.2 you can simply 
delete all SYNC events from sl_event if you feel like. Slon will 
generate another SYNC soon and the processing of that will just leap the 
receivers to that state.


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Issue when adding node to replication

2012-09-28 Thread Jan Wieck
On 9/28/2012 9:15 AM, Steve Singer wrote:
 * we can't ignore SYNC events even though the dropped
 * node is not an origin it might have been an old
 * origin before a FAILOVER. Some behind node still
 * might need to get caught up from its provider.

That assessment is actually wrong. Any data provider as well as the 
origin will keep the data until all subscribers have confirmed it. 
Simply changing the data provider of the behind node will be enough to 
make it catch up from another provider.
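
For illustration, changing the provider is just a re-subscribe naming a
different provider (a sketch with placeholder IDs; here node 4 is the
behind node and node 2 the surviving provider):

    # preceded by the usual cluster name / node admin conninfo preamble
    subscribe set (id = 1, provider = 2, receiver = 4, forward = yes);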


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Issue when adding node to replication

2012-09-27 Thread Jan Wieck
On 9/27/2012 2:34 PM, Brian Fehrle wrote:
 Hi all,

 PostgreSQL v 9.1.5 - 9.1.6
 Slony version 2.1.0

 I'm having an issue that's occurred twice now. I have 4 node slony
 cluster, and one of the operations is to drop a node from replication,
 do maintenance on it, then add it back to replication.

 Node 1 = master
 Node 2 = slave
 Node 3 = slave  - dropped then readded
 Node 4 = slave

First, why is the node actually dropped and readded so fast, instead of 
just doing the maintenance while it falls behind and then letting it catch up?

You apparently have a full blown path network from everyone to everyone. 
This is not good under normal circumstances since the automatic listen 
generation will cause every node to listen on every other node for 
events, from non-origins. Way too many useless database connections.

What seems to happen here are some race conditions. The node is dropped 
and when it is added back again, some third node still didn't process 
the DROP NODE and when node 4 looks for events from node 3, it finds old 
ones somewhere else (like on 1 or 2). When node 3 then comes around to 
use those event IDs again, you get the dupkey error.

What you could do if you really need to drop/readd it, use an explicit 
WAIT FOR EVENT for the DROP NODE to make sure all traces of that node 
are gone from the whole cluster.
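
A sketch of that sequence (placeholder conninfo strings; node 3 is the
one taken out for maintenance):

    cluster name = mycluster;
    node 1 admin conninfo = 'dbname=mydb host=node1 user=slony';
    node 2 admin conninfo = 'dbname=mydb host=node2 user=slony';
    node 4 admin conninfo = 'dbname=mydb host=node4 user=slony';

    drop node (id = 3, event node = 1);
    wait for event (origin = 1, confirmed = all, wait on = 1);
    # only after this returns, STORE NODE / SUBSCRIBE SET for the re-added node 3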


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Issue when adding node to replication

2012-09-27 Thread Jan Wieck
On 9/27/2012 3:58 PM, Brian Fehrle wrote:
 Follow up:

 I executed this on the master:
 mydatabase=# select * from _slony.sl_event where ev_origin not in
 (select no_id from _slony.sl_node);
    ev_origin    | 3
    ev_seqno     | 5000290161
    ev_timestamp | 2012-09-27 09:48:03.749424-04
    ev_snapshot  | 40580084:40580084:
    ev_type      | SYNC
    ev_data1..8  | (empty)
  (1 row)

 There is a row in sl_event that shouldn't be there, because it's
  referencing a node that no longer exists. I need to add this node back to
 replication, but I don't want to run into the same issue as before. I
 ran a cleanupEvent('10 minute') and it did nothing (even did it with 0
 minutes).

 Will this row eventually go away? will it cause issue if we attempt to
 add a new node to replication with node = 3? How can I safely clean this up?

Hmmm,

this actually looks like a more severe race condition or even a bug.

The thing is that processing the DROP NODE and replicating the SYNC are 
different worker threads, since the events originate on different nodes. 
Cleaning out the sl_event is part of dropNode_int(). But the 
remoteWorker for 3 may just have inserted that SYNC concurrently and 
therefore it was left behind.

My guess is that the right solution to this is to clean out everything 
again when a STORE NODE comes along. We had been thinking of making the 
node ID non-reusable to prevent this sort of race condition.


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Issue when adding node to replication

2012-09-27 Thread Jan Wieck
On 9/27/2012 5:30 PM, Christopher Browne wrote:
 On Thu, Sep 27, 2012 at 5:26 PM, Jan Wieck janwi...@yahoo.com wrote:
 My guess is that the right solution to this is to clean out everything
 again when a STORE NODE comes along. We had been thinking of making the
 node ID non-reusable to prevent this sort of race conditions.

 I'm not sure I'm totally comfortable with cleaning it all out
 instantly; as a step towards that, I'd think it a good idea for slonik
 to check all the nodes for existence of a node ID, and refuse if it's
 found anywhere.

 Under that circumstance, you might need to wait, to run the STORE
 NODE, until the cleanup thread has run on all the nodes to expunge the
 last bits of the node on all nodes' databases.

 Smells a bit safer to me...


Check cleanupEvent(). I think it will never remove that stale event.


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Issue when adding node to replication

2012-09-27 Thread Jan Wieck
On 9/27/2012 6:48 PM, Brian Fehrle wrote:
 On 09/27/2012 03:40 PM, Jan Wieck wrote:
 On 9/27/2012 5:30 PM, Christopher Browne wrote:
 On Thu, Sep 27, 2012 at 5:26 PM, Jan Wieck janwi...@yahoo.com wrote:
 My guess is that the right solution to this is to clean out everything
 again when a STORE NODE comes along. We had been thinking of making the
 node ID non-reusable to prevent this sort of race conditions.

 I'm not sure I'm totally comfortable with cleaning it all out
 instantly; as a step towards that, I'd think it a good idea for slonik
 to check all the nodes for existence of a node ID, and refuse if it's
 found anywhere.

 Under that circumstance, you might need to wait, to run the STORE
 NODE, until the cleanup thread has run on all the nodes to expunge the
 last bits of the node on all nodes' databases.

 Smells a bit safer to me...


 Check cleanupEvent(). I think it will never remove that stale event.

 Yeah, it looks like it will only remove confirmed ones.

 --code from cleanupEvent()-
   -- 
   -- Then remove all events that are confirmed by all nodes in the
   -- whole cluster up to the last SYNC
   -- 
   for v_min_row in select con_origin, min(con_seqno) as con_seqno
   from sl_confirm
   group by con_origin
   loop
   select coalesce(max(ev_seqno), 0) into v_max_sync
   from sl_event
   where ev_origin = v_min_row.con_origin
   and ev_seqno <= v_min_row.con_seqno
   and ev_type = 'SYNC';
   if v_max_sync > 0 then
   delete from sl_event
   where ev_origin = v_min_row.con_origin
   and ev_seqno < v_max_sync;
   end if;
   end loop;

 the query that hits sl_confirm for the loop returns the following:
    con_origin | con_seqno
   ------------+------------
             1 | 5000242178
             2 | 5000661718
             4 |     560743

  So it never hits node 3 to do any deletes from sl_event on node three.
  This is the only place in cleanupEvent that I believe will do any deletes
 from sl_event.

 So should I try to delete this row myself, or would that cause major
 issues also? I'm still wrapping my head around how sl_confirm and
 sl_event work together when adding/removing nodes.

Since there isn't even a node 3 in sl_node, it is safe to delete that row.
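
Using the values from the query output above, that is:

    DELETE FROM _slony.sl_event
     WHERE ev_origin = 3 AND ev_seqno = 5000290161;

Run it wherever such an orphaned row shows up.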


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Manually kicking off a logswitch

2012-09-19 Thread Jan Wieck
On 9/19/2012 6:46 PM, Brian Fehrle wrote:
 Hi all,

 Postgres 8.4, slony 1.2.21

 Previously I had reached out on an issue where the sl_log_1 or sl_log_2
 table would get so full that replication would come to a crawl, only
 processing one event at a time. It seems as though HUGE data
 insert,updates,or deletes to replicated tables are to cause, and being
 on slony 1.2 there isn't much we can do to get around it.

 The size of the sl_log table was above 9 million rows where we saw this
 as an issue. We are now going along the path of doing much smaller
 groups of updates so we don't get into the same condition as before. We
 just did 1.2 million rows worth of updates and it only took a few
 minutes to replicate it all to the slave. Good news.

 But now our sl_log_1 table is sitting at 1.2 million rows, and we'd like
 to let it be switched and truncated by slony before kicking off a few
 more million rows worth of updates. From what I can tell via
 documentation, this is not all that often.

 So what is the thoughts on manually kicking off the logswitch via
 select _slony.logswitch_start() on the master? I reviewed the code and
 it won't let a switch occure if it's already in progress, so it seems
 it's being pretty safe in its execution. However it looks like all it
 really does is update a sequence to say we're currently switching and
 then slony does it in the background.

The issue itself is caused by a problem with the log select query that 
was fixed in 2.1 (commit d4118d... from Jan 27, 2011).


 So my questions are. 1. is this a safe practice to do? We may be doing
 it multiple times a day (guestimate, ten or more times?). and 2. what is
 slony doing in the background for this to occur? It looks like it
 actually switches to the new log right away, but takes some time before
  the old log is truncated; does it need to wait until a cleanupEvent can
  run on the data within, i.e. about 10 minutes?  (#2 is more out of
 curiosity).

slon is calling the stored procedure cleanupEvent(interval). You can 
safely call that with an interval of a few minutes. The interval is how 
old events must be at least to be purged from sl_event. Even a zero 
interval should be safe.

However, Slony is normally trying to do this every 10 minutes (if memory 
serves). With that backlog it is very likely that the previous logswitch 
attempt hasn't finished yet because the backlog is in the old log table.

As said, the problem is fixed in 2.1. If upgrading to Slony 2.1 is not 
an option for you, you may consider backpatching the above commit into 
your 1.2.


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Monitoring big table sync

2012-08-22 Thread Jan Wieck
On 8/21/2012 6:16 PM, Christopher Browne wrote:
 The one thing that I'd be a *bit* concerned about is the 57K event
 lag.  At one point, I recall there being a possible troublesome case
 where one of the queries for the first SYNC processing after that
 could time out when trying to read in all of the tens of thousands of
 events.  It would be nicer if, during this laggy period, SYNCs got
 generated somewhat less often, let's say, every 10-20s rather than
 every 0.5s.

That can only happen on slon startup, when the remote_listen thread 
selects all NEW events. During the copy_set(), the remote listener is 
still getting these events and queues them in memory.

Unless something is interrupting the copy, I'm not worried yet.


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] Monitoring big table sync

2012-08-22 Thread Jan Wieck
On 8/22/2012 11:45 AM, Christopher Browne wrote:
 On Wed, Aug 22, 2012 at 11:34 AM, Jan Wieck janwi...@yahoo.com wrote:
 On 8/21/2012 6:16 PM, Christopher Browne wrote:

 The one thing that I'd be a *bit* concerned about is the 57K event
 lag.  At one point, I recall there being a possible troublesome case
 where one of the queries for the first SYNC processing after that
 could time out when trying to read in all of the tens of thousands of
 events.  It would be nicer if, during this laggy period, SYNCs got
 generated somewhat less often, let's say, every 10-20s rather than
 every 0.5s.


 That can only happen on slon startup, when the remote_listen thread selects
 all NEW events. During the copy_set(), the remote listener is still getting
 these events and queues them in memory.

 Unless something is interrupting the copy, I'm not worried yet.

 Agreed.

 I do recall seeing trouble come up *afterwards*, if there were so many
 events to process that it took too long to run the query to draw them
 in.

 Ah, poking at the code is useful.  There is a configuration option for
 this, remote_listen_timeout, which defaults to 300 seconds, but might
 be set higher.

 If, after the subscription completes, there are wildly too many SYNC
 events lingering around, and it takes too long to process them, the
 remote_listen thread will time out, complaining thus:

You are missing the point. However many unprocessed SYNC events there 
will be after the copy, the remote_listener will have shoveled them into 
memory as they appeared already. There is no BIG EVENT SELECT after the 
copy.



-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] slonik_execute_script not working

2012-07-05 Thread Jan Wieck
On 7/4/2012 3:39 PM, Vick Khera wrote:
 On Mon, Jul 2, 2012 at 3:29 PM, Wolf Schwurack w...@uen.org
 mailto:w...@uen.org wrote:

 You cannot run DDL statements using slony; it states that in the slony docs


 Where? That's the only safe way to perform DDL on a replicated table.

In 1.x yes. Since 2.0 DDL can be done directly with psql or by other 
means. If doing so, one needs to be careful which modifications are done 
to which nodes in which order, so that inserts/updates aren't arriving at 
a node that is missing columns, or similar problems. But Slony by itself won't stand 
in the way any more.


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general


Re: [Slony1-general] How to perform updates on a table slave

2012-06-28 Thread Jan Wieck
On 6/19/2012 12:43 PM, Efraín Déctor wrote:
 Hello.
 We want to use Slony to keep our 2 servers replicated, however several
 tables are being accessed by Java processes that perform updates on them,
 Slony doesn't allow this to happen. Is there a way to allow this?
 Thank you.

Several ways.

1) The proper way to do this would be to move those fields that need to be 
updated on the replica into separate tables that are not replicated and 
are local to the replica.

2) You could move the code that modifies those rows into a BEFORE 
trigger and configure it on the replica so that it fires during replication.

3) You could disable the deny_access triggers on those tables.
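
For option 3, a sketch; the trigger name below assumes the usual Slony 2.x
_<cluster>_denyaccess naming, so check the actual trigger name on the
table first:

    -- on the replica only, for each table the Java processes need to update
    ALTER TABLE my_schema.my_table DISABLE TRIGGER "_mycluster_denyaccess";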


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


___
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general

