[Slony1-general] remote listener serializability

2015-11-16 Thread Tignor, Tom
Hello slony1 community, I'm part of a team at Akamai working on a notification service based on postgres. (We call it an Alert Management System.) We're at the point where we need to scale past the single instance DB and so have been working with slony1-2.2.4 (and postgresql-9.1.18) to make

Re: [Slony1-general] remote listener serializability

2015-12-10 Thread Tignor, Tom
cc list, so maybe interested folks can comment through Bugzilla. Thanks, Tom:-) On 12/4/15, 10:11 PM, "Greg Sabino Mullane" <g...@endpoint.com> wrote: >On Thu, Nov 19, 2015 at 06:09:54PM +, Tignor, Tom wrote: >> Thanks for the feedbac

Re: [Slony1-general] remote listener serializability

2015-11-18 Thread Tignor, Tom
. Thanks in advance for your time and consideration. Tom:-) On 11/16/15, 1:28 PM, "Steve Singer" <ssin...@ca.afilias.info> wrote: >On 11/16/2015 08:52 AM, Tignor, Tom wrote: >> >> Hello slony1 community, >> I’m part of a team at Akamai working on a noti

Re: [Slony1-general] remote listener serializability

2015-11-20 Thread Tignor, Tom
" <ssin...@ca.afilias.info> wrote: >On 11/19/2015 01:09 PM, Tignor, Tom wrote: >> A general question for the group: if we would consider a change like >>this >> (as a runtime option or otherwise), what’s the correct way to move it >> forward? Should I file

Re: [Slony1-general] remote listener serializability

2015-11-19 Thread Tignor, Tom
Greg, Andrew, Thanks for the feedback. Greg, can you describe the transaction handling changes you’re referring to? I recently got the latest pg 9.4 distribution. The README-SSI is identical and while there have been some changes in predicate.c, they don’t appear sweeping. The doc

Re: [Slony1-general] Cannot fully drop slony node

2016-02-04 Thread Tignor, Tom
If I'm reading right, did you run the drop node op at some point on node 1 and see it succeed? If it did, the sl_node table on each other node in the cluster (save perhaps node 3) should show it gone. If that's the case, your cluster is fine and you can just run 'DROP SCHEMA mycluster CASCADE'
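A sketch of that check and cleanup, assuming the cluster name "mycluster" used in the thread and node 1 as the dropped node:

    -- run on each surviving node; the dropped node's row should be absent
    SELECT no_id, no_active, no_comment FROM _mycluster.sl_node;

    -- if it is, the leftover schema on the dropped node itself can go
    DROP SCHEMA "_mycluster" CASCADE;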

Re: [Slony1-general] Cannot fully drop slony node

2016-02-04 Thread Tignor, Tom
Feb 4, 2016 at 9:48 AM, Sung Hsin Lei <sungh@gmail.com> wrote: yes... that's it!! On Thu, Feb 4, 2016 at 8:58 AM, Tignor, Tom <ttig...@akamai.com> wrote: If I'm reading right, did you run the drop node op at some poi

Re: [Slony1-general] Is slony.info down?

2016-02-11 Thread Tignor, Tom
I also noticed it down yesterday morning. Tom:-) On 2/10/16, 8:58 PM, "Steve Singer" wrote: >On Tue, 9 Feb 2016, Asad Shah wrote: > >> I cannot get to the slony site right now? >> Is it down? > >Working for me right now, but it could have been down. >

[Slony1-general] Slony-I: log switch to sl_log_2 still in progress - sl_log_1 not truncated

2016-01-28 Thread Tignor, Tom
Hello slony folks, From my reading I’m guessing (hoping) this isn’t a new problem. I have a simple cluster with one provider replicating to three subscribers. The provider’s changelog tables (sl_log_[1|2]) are fine, but the subscribers (with forwarding enabled) are all showing runaway
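A rough triage query for this state (a sketch of my own, not from the thread; cluster name assumed), showing which origins still hold rows in the stuck log table:

    SELECT log_origin, count(*) AS row_count,
           min(log_txid) AS oldest_txid, max(log_txid) AS newest_txid
      FROM _mycluster.sl_log_1
     GROUP BY log_origin;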

Re: [Slony1-general] Slony-I: log switch to sl_log_2 still in progress - sl_log_1 not truncated

2016-02-02 Thread Tignor, Tom
Thanks Jan. Nothing in evidence now, but I can check for that in the future. Tom:-) On 2/2/16, 9:25 AM, "Jan Wieck" <j...@wi3ck.info> wrote: >On 02/02/2016 08:06 AM, Tignor, Tom wrote: >> >> I did drop one of my replicas several weeks

Re: [Slony1-general] Slony-I: log switch to sl_log_2 still in progress - sl_log_1 not truncated

2016-01-28 Thread Tignor, Tom
| 5000637482 | 2016-01-28 16:07:22.926411+00 | 2016-01-28 16:07:22.695826+00 | 1 | 00:00:19.077657 (3 rows) Tom:-) On 1/28/16, 10:38 AM, "Jan Wieck" <j...@wi3ck.info> wrote: >On 01/28/2016 08:30 AM, Tignor, Tom wrote: >> >> Hello slony

Re: [Slony1-general] Slony-I: log switch to sl_log_2 still in progress - sl_log_1 not truncated

2016-02-02 Thread Tignor, Tom
a provider for anybody? Tom:-) On 2/2/16, 12:06 AM, "Jan Wieck" <j...@wi3ck.info> wrote: >On 02/01/2016 01:24 PM, Tignor, Tom wrote: >> >> Quick update: a couple hours after deleting entries from both sl_log >> tables with txids > 630M,

Re: [Slony1-general] drop node error

2016-07-18 Thread Tignor, Tom
op, but given what you’ve said, it looks like adding a wait is in order. Tom☺ On 7/18/16, 9:13 AM, "Steve Singer" <st...@ssinger.info> wrote: >On Mon, 18 Jul 2016, Tignor, Tom wrote: > >One thing that I stress is that it is a good idea (and maybe very impor

Re: [Slony1-general] drop node error

2016-07-18 Thread Tignor, Tom
with ‘kill -CONT’ to node A.) Tom☺ On 7/17/16, 2:16 PM, "Steve Singer" <st...@ssinger.info> wrote: >On 07/12/2016 08:23 AM, Steve Singer wrote: >> On 07/08/2016 03:27 PM, Tignor, Tom wrote: >>> Hello slony group, >>> >>>

[Slony1-general] cannot safely MOVE SET, DROP NODE

2016-06-28 Thread Tignor, Tom
Hello slony community, I’m working now on some slony1 failover automation (slony1-2.2.4) and I’m having a lot of trouble getting slony1 to honor MOVE SET commands. Below are the commands I’m using, to my mind pretty simple instructions to move a set and confirm
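For reference, the sequence under discussion looks roughly like this in slonik (cluster name, conninfo and node ids are illustrative, not taken from the thread):

    cluster name = mycluster;
    node 1 admin conninfo = 'host=db1 dbname=ams user=slony';
    node 2 admin conninfo = 'host=db2 dbname=ams user=slony';

    # move set 1 from node 1 to node 2, waiting for confirmations
    lock set (id = 1, origin = 1);
    wait for event (origin = 1, confirmed = all, wait on = 1);
    move set (id = 1, old origin = 1, new origin = 2);
    wait for event (origin = 1, confirmed = all, wait on = 1);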

[Slony1-general] drop node error

2016-07-08 Thread Tignor, Tom
Hello slony group, I’m testing now with slony1-2.2.4. I have just recently produced an error which effectively stops slon processing on some node A due to some node B being dropped. The event reproduces only infrequently. As some will know, a slon daemon for a

Re: [Slony1-general] cannot safely MOVE SET, DROP NODE

2016-06-29 Thread Tignor, Tom
Yes, so would I. ☺ But the script did finish and apparently didn’t wait long enough. In fact, this seems to happen every time. Any thoughts on how to debug? Tom☺ On 6/28/16, 7:55 PM, "Steve Singer" <st...@ssinger.info> wrote: >On Tue, 28 Jun 2016

Re: [Slony1-general] sync performance

2016-09-12 Thread Tignor, Tom
Seems I have an additional data point: the sync test always takes longer (> 20 secs) if I include conninfo for all cluster nodes instead of just the local node. I had previously thought conninfo data was only used when needed. Is this not the case? Tom☺

[Slony1-general] Controlled Switchover

2016-09-12 Thread Tignor, Tom
Hello slony1 users, I’m looking at “controlled switchover.” While there are some varying accounts online, I see the 2.2.4 and 2.2.5 doc both describe these ops as the method to reliably pass the origin role from one node to another. We have recently seen some

Re: [Slony1-general] sync performance

2016-09-13 Thread Tignor, Tom
onfirmed on node 5 /tmp/commcheck-2.slk:55: 2016-09-13 13:52:38 root@prodrpl-Amst:~# Tom☺ On 9/12/16, 4:38 PM, "Steve Singer" <st...@ssinger.info> wrote: On 09/12/2016 11:39 AM, Tignor, Tom wrote: > Seems I have an additional data point: th

[Slony1-general] sync performance

2016-09-12 Thread Tignor, Tom
Hello slony1 community, We’ve recently been testing communication reliability between our cluster nodes. Our config is a simple setup with one provider producing a modest volume of changes (measured in KB/s) consumed by 5 direct subscribers, though these are

[Slony1-general] switchover

2016-09-23 Thread Tignor, Tom
Hello slony folks, I think this should be familiar territory for some: the doc has this example for performing controlled switchover. A small slonik script executes the following commands: lock set (id = 1, origin = 1); wait for event (origin = 1, confirmed =
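The quoted example continues along these lines (reconstructed from the 2.2.x manual from memory, so verify against your copy; the usual cluster name / admin conninfo preamble is omitted, as in the doc):

    lock set (id = 1, origin = 1);
    wait for event (origin = 1, confirmed = 2, wait on = 1);
    move set (id = 1, old origin = 1, new origin = 2);
    wait for event (origin = 1, confirmed = 2, wait on = 1);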

Re: [Slony1-general] Slony 2.2.6 release plans

2017-07-31 Thread Tignor, Tom
Hi Steve, A question on one item: - Fix some failover issues when doing a multi-node failover with a cascaded node. In cascaded node failover, is it necessary to sync with every receiver node for a failed over set? Or is it sufficient to sync only with

Re: [Slony1-general] Slony 2.2.6 release plans

2017-08-04 Thread Tignor, Tom
@ssinger.info> wrote: On Thu, 3 Aug 2017, Tignor, Tom wrote: > > Thanks Steve. I should mention, the dependence on indirect subscribers for a successful failover may provide a scalability limitation for us. We’re required to complete failover reliably in just

[Slony1-general] more missing paths, and

2017-07-21 Thread Tignor, Tom
Hello again, Slony-I community, After our last missing path issue, we’ve taken a new interest in keeping all our path/conninfo data up to date. We have a cluster running with 7 nodes. Each has conninfo to all the others, so we expect N=7; N*(N-1) = 42 paths.
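A quick audit of that expectation on any node (a sketch; cluster name assumed), where a full 7-node mesh should return 42 rows in total, 6 per client:

    SELECT pa_client, count(*) AS paths
      FROM _mycluster.sl_path
     GROUP BY pa_client
     ORDER BY pa_client;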

Re: [Slony1-general] more missing paths, and

2017-07-24 Thread Tignor, Tom
:22.496843+00 4 |8 | 571346 | 2017-07-19 20:27:27.9303+00 4 |1 | 571346 | 2017-07-19 20:27:26.705526+00 4 |7 | 571346 | 2017-07-20 18:04:01.978874+00 (6 rows) ams=# Tom☺ On 7/22/17, 10:39 AM, "Tignor

Re: [Slony1-general] more missing paths, and

2017-07-22 Thread Tignor, Tom
a guess: is there possibly some sl_event table entry which, if deleted, will allow the node-4-client store path ops to get processed? Tom☺ On 7/21/17, 9:53 PM, "Steve Singer" <st...@ssinger.info> wrote: On Fri, 21 Jul 2017, T

Re: [Slony1-general] failover failure and mysterious missing paths

2017-07-06 Thread Tignor, Tom
st...@ssinger.info> wrote: On Wed, 5 Jul 2017, Tignor, Tom wrote: > > Interesting. Of course the behavior evident on inspection indicated something like this must be happening. > It seems the doc could be improved on the subject of required paths. I recall som

Re: [Slony1-general] failover failure and mysterious missing paths

2017-06-28 Thread Tignor, Tom
. If the path is missing, how does slon continue to get SYNC events? Tom☺ On 6/27/17, 5:04 PM, "Steve Singer" <st...@ssinger.info> wrote: On 06/27/2017 11:59 AM, Tignor, Tom wrote: The disableNode() in the log makes it look like someone

Re: [Slony1-general] failover failure and mysterious missing paths

2017-07-05 Thread Tignor, Tom
" <st...@ssinger.info> wrote: On Wed, 28 Jun 2017, Tignor, Tom wrote: > > Hi Steve, > Thanks for the info. I was able to repro this problem in testing and saw as soon as I added the missing path back the still-in-process failover op continued on and completed su

Re: [Slony1-general] Slony 2.2.6 release plans

2017-08-03 Thread Tignor, Tom
ote: On Mon, 31 Jul 2017, Tignor, Tom wrote: I THINK, and I am not 100% sure of this, but looking at the code it appears that the failover process will wait for each of the non-failed nodes to receive/confirm the FAILOVER event before finishing the

[Slony1-general] failover failure and mysterious missing paths

2017-06-27 Thread Tignor, Tom
Hello Slony-I community, Hoping someone can advise on a strange and serious problem. We performed a slony service failover yesterday. For the first time ever, our slony service FAILOVER op errored out. We recently expanded our cluster to 7 consumers from a single

[Slony1-general] 2.2.4 -> 2.2.6 upgrade

2017-11-13 Thread Tignor, Tom
Hello Slony-I community, We’re working on a postgres upgrade now and considering a slony 2.2.4 to 2.2.6 upgrade to perform at the same time. I’ve found the Slony-I Upgrade section in the doc. This describes an all-at-once upgrade procedure which is difficult for
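For reference, the schema half of such an upgrade is one slonik command per node, run after the new binaries are in place (a sketch; cluster name, conninfo and node id are illustrative):

    cluster name = mycluster;
    node 1 admin conninfo = 'host=db1 dbname=ams user=slony';
    update functions (id = 1);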

[Slony1-general] slony1 drop node failure

2018-02-22 Thread Tignor, Tom
Hello slony1 community, We have a head scratcher here. It appears a DROP NODE command was not fully processed. The command was issued and confirmed on all our nodes at approximately 2018-02-21 19:19:50 UTC. When we went to restore it over two hours later, all
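For context, the op in question is a one-liner (a sketch; conninfo is illustrative, node 8 is the id named later in the thread):

    cluster name = mycluster;
    node 1 admin conninfo = 'host=db1 dbname=ams user=slony';
    # drop node 8, with node 1 originating the event
    drop node (id = 8, event node = 1);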

Re: [Slony1-general] slony1 drop node failure

2018-02-22 Thread Tignor, Tom
, 6:06 PM, "Steve Singer" <st...@ssinger.info> wrote: On Thu, 22 Feb 2018, Tignor, Tom wrote: Looks like? http://lists.slony.info/pipermail/slony1-general/2016-September/013331.html I can't remember if that was what prompted http://lists.slony.info/

Re: [Slony1-general] slony1 drop node failure

2018-02-26 Thread Tignor, Tom
On 2/26/18, 12:01 PM, "Steve Singer" <st...@ssinger.info> wrote: On Mon, 26 Feb 2018, Tignor, Tom wrote: > > Thanks. I see the deletes added for sl_seqlog and sl_log_script. The > constraint violation appearing in the errors was for sl_e

Re: [Slony1-general] slony1 drop node failure

2018-02-26 Thread Tignor, Tom
r.info> wrote: On Thu, 22 Feb 2018, Tignor, Tom wrote: Looks like? http://lists.slony.info/pipermail/slony1-general/2016-September/013331.html I can't remember if that was what prompted http://lists.slony.info/pipermail/slony1-hackers/2016-December/000560.html

Re: [Slony1-general] slony1 drop node failure

2018-03-02 Thread Tignor, Tom
19:19:55 UTC [7420] CONFIG disableNode: no_id=8 Tom☺ On 2/28/18, 10:28 PM, "Steve Singer" <st...@ssinger.info> wrote: On Mon, 26 Feb 2018, Tignor, Tom wrote: > > In the slony1 log of our primary host (the same one which later s

[Slony1-general] SYNC content question

2018-03-02 Thread Tignor, Tom
Hello slony1 community, We’re trying to devise a means to distinguish SYNCs with actual data changes vs. SYNCs which are simple heartbeats. Is there some convenient way to do this? Have looked at the sl_event and sl_log_* tables but nothing jumps out.
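One possible approach (my own sketch, not from the thread, and only approximate since in-flight transactions straddle snapshot bounds): take the snapshots of two consecutive SYNC events from sl_event and count the sl_log rows that fall between them; zero rows means a pure heartbeat.

    -- cluster name, origin and seqnos are illustrative placeholders
    SELECT count(*) AS changed_rows
      FROM _mycluster.sl_log_1 l,
           _mycluster.sl_event prev,
           _mycluster.sl_event cur
     WHERE prev.ev_origin = 1 AND prev.ev_seqno = 5000637481
       AND cur.ev_origin  = 1 AND cur.ev_seqno  = 5000637482
       AND l.log_origin   = 1
       AND l.log_txid >= txid_snapshot_xmin(prev.ev_snapshot)
       AND l.log_txid <  txid_snapshot_xmax(cur.ev_snapshot);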

Re: [Slony1-general] slony1 drop node failure

2018-02-26 Thread Tignor, Tom
/18, 11:03 AM, "Steve Singer" <st...@ssinger.info> wrote: On Mon, 26 Feb 2018, Tignor, Tom wrote: You can get it from the github branch (latest commit) at https://github.com/ssinger/slony1-engine/tree/bug375 > > Steve, >

Re: [Slony1-general] enable_version_check

2018-08-31 Thread Tignor, Tom
Hi Andy, As you've noticed, the config option is provided separately to each slon daemon in your service, so a slon before 2.2.7 isn't going to know about the option and so will still run the schema check. We also use a single provider cluster, and in our upgrade process, the
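Per the thread, the option goes in each slon's own config file; the value syntax below is my assumption, so check the 2.2.7 slon docs:

    # only 2.2.7+ slons understand this; older ones still run the check
    enable_version_check = false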

[Slony1-general] replication performance

2019-09-13 Thread Tignor, Tom
Hello Slony community, Our service has a DML-heavy task (thousands of rows moved between tables) occurring at the top of each hour. We are seeing changelog processing virtually stop on one or more subscribers for several minutes right after this occurs. (Log

[Slony1-general] Slony 2.2.6 and Postgres 10

2019-09-12 Thread Tignor, Tom
Hello Slony community, Wanted to check on this slony 2.2.6 message. The doc and the web site indicate that v2.2.6 supports postgres 10. We are now running QA with postgres v10.7. We are seeing this message in our log, though. Everything seems to be working in spite of it. Is

[Slony1-general] Slony-I paths

2019-07-18 Thread Tignor, Tom
Hello Slony-I, Is there any way with the Slony-I service to just make every node accept a single path (IP) for a given node? We have a bear of a time every time we need to rebuild nodes, with some nodes getting path updates and others permanently, stubbornly refusing to get
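For reference, pinning a path is one store path per (server, client) pair (a sketch; cluster name, ids and conninfo invented for illustration):

    cluster name = mycluster;
    node 3 admin conninfo = 'host=10.0.0.3 dbname=ams user=slony';
    node 4 admin conninfo = 'host=10.0.0.4 dbname=ams user=slony';
    # tell node 4 how to reach node 3; repeat for each pair
    store path (server = 3, client = 4,
                conninfo = 'host=10.0.0.3 dbname=ams user=slony');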

[Slony1-general] FAILOVER deadlock failure

2020-01-14 Thread Tignor, Tom
Hello Slony-I community, We're running Slony-I 2.2.6 with postgres 10.7 in a relatively simple config: four subscribers replicating two table sets from a single producer. In a recent service failover, the Slony-I FAILOVER op, which is normally quite reliable, failed on a
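For context, the op in question (a sketch; cluster name, ids and conninfo are illustrative):

    cluster name = mycluster;
    node 2 admin conninfo = 'host=db2 dbname=ams user=slony';
    # promote node 2 in place of the failed origin, node 1
    failover (id = 1, backup node = 2);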