Re: speed up a logical replica setup

2024-05-23 Thread Euler Taveira
On Thu, May 23, 2024, at 5:54 AM, Amit Kapila wrote:
> On Wed, May 22, 2024 at 8:46 PM Euler Taveira  wrote:
> >
> > Following the same line of simplifying the code, we can: (a) add a loop in
> > check_subscriber() that waits until walreceiver is available on subscriber or
> > (b) use a timeout. The main advantage of (a) is that the primary slot is already
> > available but I'm afraid we need an escape mechanism for the loop (timeout?).
> >
> 
> Sorry, it is not clear to me why we need any additional loop in
> check_subscriber(), aren't we speaking about the problem in
> check_publisher() function?

The idea is to use check_subscriber() to check pg_stat_wal_receiver. Once this
view returns a row and primary_slot_name is set on the standby, the referred
replication slot name should be active on the primary. Hence, the query in
check_publisher() makes sure that the referred replication slot is in use on
the primary.

> Why in the first place do we need to ensure that primary_slot_name is
> active on the primary? You mentioned something related to WAL
> retention but I don't know how that is related to this tool's
> functionality. If at all, we are bothered about WAL retention on the
> primary that should be the WAL corresponding to consistent_lsn
> computed by setup_publisher() but this check doesn't seem to ensure
> that.

Maybe it is a lot of checks. I'm afraid there isn't a simple way to identify and
make sure the replication slot is used by the physical replication. I mean, if
there is primary_slot_name = 'foo' on the standby, there is no guarantee that the
replication slot 'foo' exists on the primary. The idea is to get the exact
replication slot name used by the physical replication so we can drop it. Once I
post a patch, it should be clear. (Another idea is to relax this check and rely
only on primary_slot_name to drop this replication slot on the primary. The
replication slot might not exist and it shouldn't return an error in this case.)


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-05-22 Thread Euler Taveira
On Wed, May 22, 2024, at 8:19 AM, Amit Kapila wrote:
> >
> > v2-0001: not changed
> >
> 
> Shouldn't we modify it as per the suggestion given in the email [1]? I
> am wondering if we can entirely get rid of checking the primary
> business and simply rely on recovery_timeout and keep checking
> server_is_in_recovery(). If so, we can modify the test to use
> non-default recovery_timeout (say 180s or something similar if we have
> used it at any other place). As an additional check we can ensure that
> consistent_lsn is present on standby.

That's exactly what I want to propose. Tomas convinced me offlist that less is
better: we don't have a useful recovery progress reporting mechanism to make
sure the server is still working on the recovery, so we should simply wait.

> > v2-0002: not changed
> >
> 
> We have added more tries to see if the primary_slot_name becomes
> active but I think it is still fragile because it is possible on slow
> machines that the required slot didn't become active even after more
> retries. I have raised the same comment previously [2] and asked an
> additional question but didn't get any response.

Following the same line of simplifying the code, we can: (a) add a loop in
check_subscriber() that waits until walreceiver is available on subscriber or
(b) use a timeout. The main advantage of (a) is that the primary slot is already
available but I'm afraid we need an escape mechanism for the loop (timeout?).

I'll summarize all issues as soon as I finish the review of sync slot support. I
think we should avoid new development if we judge that the item can be
documented as a limitation for this version. Nevertheless, I will share patches
so you can give your opinion on whether it is an open item or new development.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: State of pg_createsubscriber

2024-05-19 Thread Euler Taveira
On Sun, May 19, 2024, at 2:30 PM, Tom Lane wrote:
> I'm fairly disturbed about the readiness of pg_createsubscriber.
> The 040_pg_createsubscriber.pl TAP test is moderately unstable
> in the buildfarm [1], and there are various unaddressed complaints
> at the end of the patch thread (pretty much everything below [2]).
> I guess this is good enough to start beta with, but it's far from
> being good enough to ship, IMO.  If there were active work going
> on to fix these things, I'd feel better, but neither the C code
> nor the test script have been touched since 1 April.

My bad. :( I'll post patches soon to address all of the points.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: EXPLAN redundant options

2024-05-02 Thread Euler Taveira
On Thu, May 2, 2024, at 10:36 AM, David G. Johnston wrote:
> On Thu, May 2, 2024 at 6:17 AM jian he  wrote:
>> explain (verbose, verbose off, analyze on, analyze off, analyze on)
> 
> I would just update this paragraph to note the last one wins behavior.
> 
> "When the option list is surrounded by parentheses, the options can be 
> written in any order.  However, if options are repeated the last one listed 
> is used."
> 
> I have no desire to introduce breakage here.  The implemented concept is 
> actually quite common.  The inconsistency with COPY seems like a minor point. 
>  It would influence my green field choice but not enough for changing 
> long-standing behavior.
> 

There is no policy saying we cannot introduce incompatibility changes in major
releases. If you check for "conflicting or redundant options" or
"errorConflictingDefElem", you will notice that various SQL commands prevent
you from specifying redundant options. IMO avoiding redundant options is a good
goal because (a) it keeps the command short and (b) it doesn't require you to
check all options to figure out the current option value. If an application is
already avoiding redundant options for other commands, the probability of
allowing them just for EXPLAIN is low.

postgres=# create database foo with owner = bar owner = xpto;
ERROR:  conflicting or redundant options
LINE 1: create database foo with owner = bar owner = xpto;
   ^
postgres=# create user foo with createdb login createdb;
ERROR:  conflicting or redundant options
LINE 1: create user foo with createdb login createdb;
^


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-04-29 Thread Euler Taveira
On Mon, Apr 29, 2024, at 6:56 AM, Amit Kapila wrote:
> On Wed, Mar 27, 2024 at 1:47 AM Euler Taveira  wrote:
> >
> > On Tue, Mar 26, 2024, at 4:12 PM, Tomas Vondra wrote:
> >
> > Perhaps I'm missing something, but why is NUM_CONN_ATTEMPTS even needed?
> > Why isn't recovery_timeout enough to decide if wait_for_end_recovery()
> > waited long enough?
> >
> >
> > It was an attempt to decouple a connection failure (that keeps streaming the
> > WAL) from the recovery timeout. The NUM_CONN_ATTEMPTS guarantees that if the
> > primary is gone during the standby recovery process, there is a way to bail out.
> >
> 
> I think we don't need to check primary if the WAL corresponding to
> consistent_lsn is already present on the standby. Shouldn't we first
> check that? Once we ensure that the required WAL is copied, just
> checking server_is_in_recovery() should be sufficient. I feel that
> will be a direct way of ensuring what is required rather than
> indirectly verifying the same (by checking pg_stat_wal_receiver) as we
> are doing currently.

How would you check it? A WAL file? During recovery, you are not allowed to use
pg_current_wal_lsn.

Tomas suggested to me off-list that we should adopt a simple solution in
wait_for_end_recovery: wait for recovery_timeout without additional checks
(which means removing the pg_stat_wal_receiver logic). When we have additional
information that we can reliably use in this function, we can add it. This also
makes it easy to adjust PG_TEST_TIMEOUT_DEFAULT to get stable tests.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Idea Feedback: psql \h misses -> Offers Links?

2024-04-19 Thread Euler Taveira
On Wed, Apr 17, 2024, at 2:47 PM, Kirk Wolak wrote:
>   I often use the ctrl-click on the link after getting help in psql.  A great 
> feature.
> 
> Challenge, when there is no help, you don't get any link.
> 
>   My thought process is to add a default response that would take them to
> https://www.postgresql.org/search/?u=%2Fdocs%2F16%2F={TOKEN} 
> 
> *Example:*
> \h current_setting
> No help available for "current_setting".
> Try \h with no arguments to see available help.

That's because current_setting is a function. Help says:

postgres=# \?
.
.
.
Help
  \? [commands]  show help on backslash commands
  \? options show help on psql command-line options
  \? variables   show help on special variables
  \h [NAME]  help on syntax of SQL commands, * for all commands

It is just for SQL commands.

> https://www.postgresql.org/search/?u=%2Fdocs%2F16%2F=current_setting
> 
>   To me, this is a huge step in helping me get to the docs.
> 
> This is Question 1: Do others see the potential value here?

Yes. However, I expect an exact and direct answer. There will be cases where the
first result is not the one you are looking for. (You are expecting the
function or parameter description but another page is at the top because it is
considered more relevant.) The referred URL does not take you to a direct link.
Instead, you have to click again to check the content.

> Question 2: What if we allowed the users to set some extra link Templates 
> using \pset??
> 
> \pset help_assist_link_1 = 'https://www.google.com/search?q={token}'
> \pset help_assist_link_2 = 'https://wiki.postgresql.org/index.php?title=Special%3ASearch={token}=Go'

That's a different idea. Are you proposing to provide URLs if this psql
variable is set and it doesn't find an entry (say \h foo)? I'm not sure if it
is a good idea to allow third-party URLs (even if it is configurable).

IMO we should expand \h to list documentation references for functions and GUCs
using the SGML files. We already did it for SQL commands. Another broader idea
is to build an inverted index similar to what the Index [1] provides. The main
problem with this approach is that it creates a dependency between the
documentation build and psql. Maybe there is a reasonable way to obtain the
links for each term.


[1] https://www.postgresql.org/docs/current/bookindex.html


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-29 Thread Euler Taveira
On Fri, Mar 29, 2024, at 9:44 AM, Alexander Korotkov wrote:
> This generally makes sense, but I'm not sure about this.  The
> milliseconds timeout was used initially but received critics in [1].

Alexander, I see why you changed the patch.

Peter suggested using an interval but you proposed another data type:
float. The advantage of the interval data type is that you don't need to
carefully think about the unit; however, if you use the integer data
type you have to propose one. (If that's the case, milliseconds is a
good granularity for this feature.) I don't have a strong preference
between the integer and interval data types but I don't like float for
this case. The 2 main reasons are (a) that we treat time units (hours,
minutes, seconds, ...) as integers so it seems natural for a human being
to use a time unit as an integer and (b) depending on the number of digits
after the decimal separator you still don't have an integer in the
internal unit, hence, you have to round it to an integer.

We already have functions that use integer (such as pg_terminate_backend)
and interval (such as pg_sleep_for) and, if I searched correctly, this
would be the first timeout argument that is a float.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-28 Thread Euler Taveira
On Thu, Mar 28, 2024, at 9:39 AM, Alexander Korotkov wrote:
> Fixed along with other issues spotted by Alexander Lakhin.

[I didn't read the whole thread. I'm sorry if I missed something ...]

You renamed the function in a previous version but let me suggest another one:
pg_wal_replay_wait. It uses the same pattern as the other recovery control
functions [1]. I think "for" doesn't add much to the function name, and "lsn" is
used in functions that return an LSN (which is not the case here).

postgres=# \df pg_wal_replay*
List of functions
-[ RECORD 1 ]---+-
Schema  | pg_catalog
Name| pg_wal_replay_pause
Result data type| void
Argument data types | 
Type| func
-[ RECORD 2 ]---+-
Schema  | pg_catalog
Name| pg_wal_replay_resume
Result data type| void
Argument data types | 
Type| func

Regarding the arguments, I think the timeout should be bigint. There is at least
one other function implementing a timeout that uses bigint.

postgres=# \df pg_terminate_backend
List of functions
-[ RECORD 1 ]---+--
Schema  | pg_catalog
Name| pg_terminate_backend
Result data type| boolean
Argument data types | pid integer, timeout bigint DEFAULT 0
Type| func

I also suggest that the timeout unit should be milliseconds; hence, using
bigint is perfectly fine for the timeout argument.

+   
+Throws an ERROR if the target lsn was not replayed
+on standby within given timeout.  Parameter timeout
+is the time in seconds to wait for the target_lsn
+replay. When timeout value equals to zero no
+timeout is applied.
+   


[1] 
https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-RECOVERY-CONTROL


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Fix some resources leaks (src/bin/pg_basebackup/pg_createsubscriber.c)

2024-03-27 Thread Euler Taveira
On Wed, Mar 27, 2024, at 8:50 PM, Ranier Vilela wrote:
> Coverity has some reports in the new code
> pg_createsubscriber.c
> I think that Coverity is right.

It depends on your "right" definition. If your program execution is ephemeral
and the leak is just before exiting, do you think it's worth it?

> 1.
> CID 1542682: (#1 of 1): Resource leak (RESOURCE_LEAK)
> leaked_storage: Variable buf going out of scope leaks the storage it points to.

It will exit in the next instruction.

> 2.
> CID 1542704: (#1 of 1): Resource leak (RESOURCE_LEAK)
> leaked_storage: Variable conn going out of scope leaks the storage it points to.

The connect_database function whose exit_on_error is false is used in 2 routines:

* cleanup_objects_atexit: that's about to exit;
* drop_primary_replication_slot: that will execute a few routines before exiting.

> 3.
> CID 1542691: (#1 of 1): Resource leak (RESOURCE_LEAK)
> leaked_storage: Variable str going out of scope leaks the storage it points to.

It will exit in the next instruction.

Having said that, applying this patch is just a matter of style.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-03-26 Thread Euler Taveira
On Tue, Mar 26, 2024, at 4:12 PM, Tomas Vondra wrote:
> Perhaps I'm missing something, but why is NUM_CONN_ATTEMPTS even needed?
> Why isn't recovery_timeout enough to decide if wait_for_end_recovery()
> waited long enough?

It was an attempt to decouple a connection failure (that keeps streaming the
WAL) from the recovery timeout. The NUM_CONN_ATTEMPTS guarantees that if the
primary is gone during the standby recovery process, there is a way to bail out.
The recovery-timeout is 0 (infinite) by default so you would have an infinite
wait without this check. The idea behind this implementation is to avoid exiting
in this critical code path. If it times out here you might have to rebuild the
standby and start again. Amit suggested [1] that we use a default value for
recovery-timeout, but how high is a good value? I've already seen some long
recovery processes using the pglogical equivalent that timed out after hundreds
of minutes. Maybe I'm too worried about a small percentage of cases and we
should use 1h as default, for example. It would reduce the complexity since the
recovery process lacks some progress indicators (LSN is not sufficient in this
case and there isn't a function to provide the current state -- stop applying
WAL, reach target, new timeline, etc).

If we remove the pg_stat_wal_receiver check, we should avoid infinite recovery
by default otherwise we will have some reports saying the tool is hanging when
in reality the primary has gone and WAL should be streamed.

> IMHO the test should simply pass PG_TEST_DEFAULT_TIMEOUT when calling
> pg_createsubscriber, and that should do the trick.

That's a good idea. Tests are not exercising the recovery-timeout option.

> Increasing PG_TEST_DEFAULT_TIMEOUT is what buildfarm animals doing
> things like ubsan/valgrind already use to deal with exactly this kind of
> timeout problem.
> 
> Or is there a deeper problem with deciding if the system is in recovery?

As I said, with some recovery progress indicators it would be easier to make
decisions like waiting a few seconds because the WAL has already been applied
and it is creating a new timeline. The recovery timeout decision is a shot in
the dark because we might be aborting pg_createsubscriber when the target server
is about to set RECOVERY_STATE_DONE.


[1] 
https://www.postgresql.org/message-id/CAA4eK1JRgnhv_ySzuFjN7UaX9qxz5Hqcwew7Fk%3D%2BAbJbu0Kd9w%40mail.gmail.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-03-25 Thread Euler Taveira
On Mon, Mar 25, 2024, at 11:33 PM, Amit Kapila wrote:
> On Mon, Mar 25, 2024 at 5:25 PM Peter Eisentraut  wrote:
> >
> > I have committed your version v33.  I did another pass over the
> > identifier and literal quoting.  I added quoting for replication slot
> > names, for example, even though they can only contain a restricted set
> > of characters, but it felt better to be defensive there.
> >
> > I'm happy to entertain follow-up patches on some of the details like
> > option naming that were still being discussed.  I just wanted to get the
> > main functionality in in good time.  We can fine-tune the rest over the
> > next few weeks.
> >
> 
> I was looking at prior discussions on this topic to see if there are
> any other open design points apart from this and noticed that the
> points raised/discussed in the email [1] are also not addressed. IIRC,
> the key point we discussed was that after promotion, the existing
> replication objects should be removed (either optionally or always),
> otherwise, it can lead to a new subscriber not being able to restart
> or getting some unwarranted data.

See setup_subscriber.

/*
 * Since the publication was created before the consistent LSN, it is
 * available on the subscriber when the physical replica is promoted.
 * Remove publications from the subscriber because it has no use.
 */
drop_publication(conn, [i]);


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-03-25 Thread Euler Taveira
 started. So there is a possibility
> that the checking is run before walsender is launched.
> 
> One possible approach is to wait until the replication starts. Alternative 
> one is
> to ease the condition.

That's my suggestion too. I reused NUM_CONN_ATTEMPTS (that was renamed to
NUM_ATTEMPTS in the first patch). See second patch.


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From b2cfa13627bd844b28fe050ffb35e8b9696fdf2e Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Mon, 25 Mar 2024 22:01:52 -0300
Subject: [PATCH v1 1/2] Improve the code that checks if the recovery is
 finishing

The recovery process has a window between the walreceiver shutdown and
the moment the pg_is_in_recovery function returns false. It means that
the pg_stat_wal_receiver check can fail while the server is finishing
the recovery (even if it has already reached the recovery target). Since
it checks pg_stat_wal_receiver to verify the primary is available, if it
does not return a row, ping the primary server with PQping. If the
primary is up and running, it can indicate that the target server is
finishing the recovery process, hence, we shouldn't count it as an
attempt. It avoids premature failures on slow hosts.

While at it, increase the number of attempts (10 to 60). The wait time
is the same as the pg_promote function uses by default.
---
 src/bin/pg_basebackup/pg_createsubscriber.c | 30 +
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c
index b8f8269340..cca93d8c25 100644
--- a/src/bin/pg_basebackup/pg_createsubscriber.c
+++ b/src/bin/pg_basebackup/pg_createsubscriber.c
@@ -30,6 +30,8 @@
 
 #define	DEFAULT_SUB_PORT	"50432"
 
+#define NUM_ATTEMPTS		60
+
 /* Command-line options */
 struct CreateSubscriberOptions
 {
@@ -93,7 +95,7 @@ static void pg_ctl_status(const char *pg_ctl_cmd, int rc);
 static void start_standby_server(const struct CreateSubscriberOptions *opt,
  bool restricted_access);
 static void stop_standby_server(const char *datadir);
-static void wait_for_end_recovery(const char *conninfo,
+static void wait_for_end_recovery(const struct LogicalRepInfo *dbinfo,
   const struct CreateSubscriberOptions *opt);
 static void create_publication(PGconn *conn, struct LogicalRepInfo *dbinfo);
 static void drop_publication(PGconn *conn, struct LogicalRepInfo *dbinfo);
@@ -1354,18 +1356,16 @@ stop_standby_server(const char *datadir)
  * the recovery process. By default, it waits forever.
  */
 static void
-wait_for_end_recovery(const char *conninfo, const struct CreateSubscriberOptions *opt)
+wait_for_end_recovery(const struct LogicalRepInfo *dbinfo, const struct CreateSubscriberOptions *opt)
 {
 	PGconn	   *conn;
 	int			status = POSTMASTER_STILL_STARTING;
 	int			timer = 0;
 	int			count = 0;		/* number of consecutive connection attempts */
 
-#define NUM_CONN_ATTEMPTS	10
-
 	pg_log_info("waiting for the target server to reach the consistent state");
 
-	conn = connect_database(conninfo, true);
+	conn = connect_database(dbinfo->subconninfo, true);
 
 	for (;;)
 	{
@@ -1384,16 +1384,24 @@ wait_for_end_recovery(const char *conninfo, const struct CreateSubscriberOptions
 		}
 
 		/*
-		 * If it is still in recovery, make sure the target server is
-		 * connected to the primary so it can receive the required WAL to
-		 * finish the recovery process. If it is disconnected try
-		 * NUM_CONN_ATTEMPTS in a row and bail out if not succeed.
+		 * If it is still in recovery, make sure the target server is connected
+		 * to the primary so it can receive the required WAL to finish the
+		 * recovery process. If the walreceiver process is not running, it
+		 * should indicate that (i) the recovery is almost finished or (ii) the
+		 * primary is not running or is not accepting connections. It should
+		 * only count as an attempt if (ii) is true. In this case, try
+		 * NUM_ATTEMPTS in a row and bail out if it does not succeed.
 		 */
 		res = PQexec(conn,
 					 "SELECT 1 FROM pg_catalog.pg_stat_wal_receiver");
 		if (PQntuples(res) == 0)
 		{
-			if (++count > NUM_CONN_ATTEMPTS)
+			if (PQping(dbinfo->pubconninfo) != PQPING_OK)
+				count++;
+			else
+				count = 0;		/* reset counter if it connects again */
+
+			if (count > NUM_ATTEMPTS)
 			{
 				stop_standby_server(subscriber_dir);
 				pg_log_error("standby server disconnected from the primary");
@@ -2113,7 +2121,7 @@ main(int argc, char **argv)
 	start_standby_server(&opt, true);
 
 	/* Waiting the subscriber to be promoted */
-	wait_for_end_recovery(dbinfo[0].subconninfo, &opt);
+	wait_for_end_recovery(&dbinfo[0], &opt);
 
 	/*
 	 * Create the subscription for each database on subscriber. It does not
-- 
2.30.2

From 872510ed67b91cb7482dea3bce831549d3519e80 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Mon, 25 Mar 2024 23:25:30 -0300
Subject: [PATCH v1 2/2] Improve the code that checks if the primary slot is
 available

The target ser

Re: speed up a logical replica setup

2024-03-25 Thread Euler Taveira
On Mon, Mar 25, 2024, at 8:55 AM, Peter Eisentraut wrote:
> On 22.03.24 04:31, Euler Taveira wrote:
> > On Thu, Mar 21, 2024, at 6:49 AM, Shlok Kyal wrote:
> >> There is a compilation error while building postgres with the patch
> >> due to a recent commit. I have attached a top-up patch v32-0003 to
> >> resolve this compilation error.
> >> I have not updated the version of the patch as I have not made any
> >> change in v32-0001 and v32-0002 patch.
> > 
> > I'm attaching a new version (v33) to incorporate this fix (v32-0003) 
> > into the
> > main patch (v32-0001). This version also includes 2 new tests:
> > 
> > - refuse to run if the standby server is running
> > - refuse to run if the standby was promoted e.g. it is not in recovery
> > 
> > The first one exercises a recent change (standby should be stopped) and the
> > second one covers an important requirement.
> 
> I have committed your version v33.  I did another pass over the 
> identifier and literal quoting.  I added quoting for replication slot 
> names, for example, even though they can only contain a restricted set 
> of characters, but it felt better to be defensive there.
Thanks.

> I'm happy to entertain follow-up patches on some of the details like 
> option naming that were still being discussed.  I just wanted to get the 
> main functionality in in good time.  We can fine-tune the rest over the 
> next few weeks.
Agree. Let's continue the discussion about the details.

> > Based on the discussion [1] about the check functions, Vignesh suggested 
> > that it
> > should check both servers before exiting. v33-0003 implements it. I don't 
> > have a
> > strong preference; feel free to apply it.
> 
> I haven't done anything about this.
... including this one.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-03-21 Thread Euler Taveira
On Thu, Mar 21, 2024, at 6:49 AM, Shlok Kyal wrote:
> There is a compilation error while building postgres with the patch
> due to a recent commit. I have attached a top-up patch v32-0003 to
> resolve this compilation error.
> I have not updated the version of the patch as I have not made any
> change in v32-0001 and v32-0002 patch.

I'm attaching a new version (v33) to incorporate this fix (v32-0003) into the
main patch (v32-0001). This version also includes 2 new tests:

- refuse to run if the standby server is running
- refuse to run if the standby was promoted e.g. it is not in recovery

The first one exercises a recent change (standby should be stopped) and the
second one covers an important requirement.

Based on the discussion [1] about the check functions, Vignesh suggested that it
should check both servers before exiting. v33-0003 implements it. I don't have a
strong preference; feel free to apply it.


[1] 
https://www.postgresql.org/message-id/CALDaNm1Dg5tDRmaabk%2BZND4WF17NrNq52WZxCE%2B90-PGz5trQQ%40mail.gmail.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/


v33-0001-pg_createsubscriber-creates-a-new-logical-replic.patch.gz
Description: application/gzip


v33-0002-Remove-How-it-Works-section.patch.gz
Description: application/gzip


v33-0003-Check-both-servers-before-exiting.patch.gz
Description: application/gzip


Re: speed up a logical replica setup

2024-03-21 Thread Euler Taveira
On Thu, Mar 21, 2024, at 10:33 AM, vignesh C wrote:
> If we commit this we might not be able to change the way the option
> behaves once the customers starts using it. How about removing these
> options in the first version and adding it in the next version after
> more discussion.

We don't need to redesign this one if we want to add a format string in a future
version. A long time ago, pg_dump started to accept a pattern for tables without
breaking or deprecating the -t option. If you have 100 databases and you don't
want to specify the options or use a script to generate them for you, you also
have the option to let pg_createsubscriber generate the object names for you.
In my experience, it will be a rare case.

> Currently dry-run will do the check and might fail on identifying a
> few failures like after checking subscriber configurations. Then the
> user will have to correct the configuration and re-run then fix the
> next set of failures. Whereas the suggest-config will display all the
> optimal configuration for both the primary and the standby in a single
> shot. This is not a must in the first version, it can be done as a
> subsequent enhancement.

You meant publisher, right? Per the order, check_subscriber is done before
check_publisher and it checks all settings on the subscriber before exiting. In
v30, I changed the way it reports the required settings. In a previous version,
it failed when it found a wrong setting; the current version checks all settings
from that server before providing a suitable error.

pg_createsubscriber: checking settings on publisher
pg_createsubscriber: primary has replication slot "physical_slot"
pg_createsubscriber: error: publisher requires wal_level >= logical
pg_createsubscriber: error: publisher requires 2 replication slots, but only 0 remain
pg_createsubscriber: hint: Consider increasing max_replication_slots to at least 3.
pg_createsubscriber: error: publisher requires 2 wal sender processes, but only 0 remain
pg_createsubscriber: hint: Consider increasing max_wal_senders to at least 3.

If you have such errors, you will fix them all and rerun using dry run mode
again to verify everything is ok. I don't have a strong preference about it. It
can be changed easily (unifying the check functions or providing a return value
for each of the check functions).


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-03-20 Thread Euler Taveira
On Wed, Mar 20, 2024, at 7:16 AM, Shubham Khanna wrote:
> On Tue, Mar 19, 2024 at 8:54 PM vignesh C  wrote:
> >
> > If you are not planning to have the checks for name length, this could
> > alternatively be fixed by including database id also while querying
> > pg_subscription like below in set_replication_progress function:
> > appendPQExpBuffer(str,
> > "SELECT oid FROM pg_catalog.pg_subscription \n"
> > "WHERE subname = '%s' AND subdbid = (SELECT oid FROM
> > pg_catalog.pg_database WHERE datname = '%s')",
> > dbinfo->subname,
> > dbinfo->dbname);
> 
> The attached patch has the changes to handle the same.

I included a different query that does the same. See v32.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-03-20 Thread Euler Taveira
On Tue, Mar 19, 2024, at 8:57 AM, Peter Eisentraut wrote:
> On 19.03.24 12:26, Shlok Kyal wrote:
> > 
> > I have added a top-up patch v30-0003. The issue in [1] still exists in
> > the v30 patch. I was not able to come up with an approach to handle it
> > in the code, so I have added it to the documentation in the warning
> > section. Thoughts?
> 
> Seems acceptable to me.  pg_createsubscriber will probably always have 
> some restrictions and unsupported edge cases like that.  We can't 
> support everything, so documenting is ok.

Shlok, I'm not sure we should add a sentence about a pilot error. I added a
comment in check_subscriber that describes this situation. I think the comment
is sufficient to understand the limitation and, if it is possible in the future,
a check might be added for it. I didn't include v31-0004.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-03-20 Thread Euler Taveira
ified. This follows the rule for CREATE SUBSCRIPTION.

Agreed.

> 31.
> 
> ```
> /* Create replication slot on publisher */
> if (lsn)
> pg_free(lsn);
> ```
> 
> I think allocating/freeing memory is not so efficient.
> Can we add a flag to create_logical_replication_slot() for controlling the
> returning value (NULL or duplicated string)? We can use the condition
> (i == num_dbs - 1) as a flag.

It is not. This code path is not performance-critical, and you are suggesting
adding complexity here. Efficiency is a good goal, but in this case it would
only add complexity for a small return.

> 32.
> 
> ```
> /*
> * Close the connection. If exit_on_error is true, it has an undesired
> * condition and it should exit immediately.
> */
> static void
> disconnect_database(PGconn *conn, bool exit_on_error)
> ```
> 
> In case of disconnect_database(), the second argument should have different 
> name.
> If it is true, the process exits unconditionally.
> Also, comments atop the function must be fixed.

I choose a short name. The comment seems ok to me.

> 
> 33.
> 
> ```
> wal_level = strdup(PQgetvalue(res, 0, 0));
> ```
> 
> pg_strdup should be used here.

Fixed.

> 34.
> 
> ```
> {"config-file", required_argument, NULL, 1},
> {"publication", required_argument, NULL, 2},
> {"replication-slot", required_argument, NULL, 3},
> {"subscription", required_argument, NULL, 4},
> ```
> 
> The ordering looks strange to me. According to pg_upgrade and pg_basebackup,
> options that do not have a short form are listed at the end.

Fixed.

> 35.
> 
> ```
> opt.sub_port = palloc(16);
> ```
> 
> Per other lines, pg_alloc() should be used.

I think you meant pg_malloc. Fixed.

> 36.
> 
> ```
> pg_free(opt.sub_port);
> ```
> 
> You said that the leak won't be concerned here. If so, why only 'p' has 
> pg_free()?

Fixed.

> 37.
> 
> ```
> /* Register a function to clean up objects in case of failure */
> atexit(cleanup_objects_atexit);
> ```
> 
> Sorry if we have already discussed this. I think the registration can be moved
> to just before the standby is started. Before that, the callback is a no-op.

The main reason is to catch future code added *before* the point you want to
move this call that requires cleanup. As you said, it is a no-op until then. My
preference is to register atexit() callbacks as early as possible to avoid
leaving cases in which they should have been triggered.

> 38.
> 
> ```
> /* Subscriber PID file */
> snprintf(pidfile, MAXPGPATH, "%s/postmaster.pid", subscriber_dir);
> 
> /*
> * If the standby server is running, stop it. Some parameters (that can
> * only be set at server start) are informed by command-line options.
> */
> if (stat(pidfile, &statbuf) == 0)
> ```
> 
> Hmm. pidfile is used only here, but it is declared in main(). Can it be
> separated into another function like is_standby_started()?

It is so small that I didn't bother adding a new function for it.

> 39.
> 
> Or, we may be able to introduce "restart_standby_if_needed" or something.
> 
> 40.
> 
> ```
> * XXX this code was extracted from BootStrapXLOG().
> ```
> 
> So, can we extract the common part to somewhere? Since the system identifier
> is related to the controldata file, I think it can be located in
> controldata_util.c.

I added this comment here as a reference to where I extracted the code from.
The referred function is in the backend. Feel free to propose a separate patch
for it.

> 41.
> 
> You said like below in [1], but I could not find the related fix. Can you 
> clarify?
> 
> > That's a good point. We should state in the documentation that GUCs 
> > specified in
> > the command-line options are ignored during the execution.

I added a sentence for it. See "How It Works".


[1] 
https://www.postgresql.org/docs/current/logical-replication-architecture.html#LOGICAL-REPLICATION-SNAPSHOT


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-03-20 Thread Euler Taveira
On Mon, Mar 18, 2024, at 10:52 AM, Peter Eisentraut wrote:
> On 16.03.24 16:42, Euler Taveira wrote:
> > I'm attaching a new version (v30) that adds:
> 
> I have some review comments and attached a patch with some smaller 
> fixups (mainly message wording and avoid fixed-size string buffers).

Thanks for your review. I'm attaching a new version (v32) that includes your
fixups, merges the v30-0002 into the main patch [1], addresses Hayato's 
review[2],
your reviews [3][4], and fixes the query for set_replication_progress() [5].

> * doc/src/sgml/ref/pg_createsubscriber.sgml
> 
> I would remove the "How It Works" section.  This is not relevant to
> users, and it is very detailed and will require updating whenever the
> implementation changes.  It could be a source code comment instead.

It uses the same structure as pg_rewind, which also describes how it works
internally. I included a separate patch that completely removes the section.

> * src/bin/pg_basebackup/pg_createsubscriber.c
> 
> I think the connection string handling is not robust against funny
> characters, like spaces, in database names etc.

get_base_conninfo() uses PQconninfoParse to parse the connection string. I
expect PQconnectdb to provide a suitable error message in this case. Even if it
built keyword and value arrays, it would be susceptible to the same issue, no?

> Most SQL commands need to be amended for proper identifier or string
> literal quoting and/or escaping.

I completely forgot about this detail when I added the new options in v30. It is
fixed now. I also changed the tests to exercise it.

> In check_subscriber(): All these permissions checks seem problematic
> to me.  We shouldn't reimplement our own copy of the server's
> permission checks.  The server can check the permissions.  And if the
> permission checking in the server ever changes, then we have
> inconsistencies to take care of.  Also, the error messages "permission
> denied" are inappropriate, because we are not doing the actual thing.
> Maybe we want to do a dry-run for the benefit of the user, but then we
> should do the actual thing, like try to create a replication slot, or
> whatever.  But I would rather just remove all this, it seems too
> problematic.

The main goal of the check_* functions is to minimize errors during execution.
I removed the permission checks. The GUC checks were kept.

> In main(): The first check if the standby is running is problematic.
> I think it would be better to require that the standby is initially
> shut down.  Consider, the standby might be running under systemd.
> This tool will try to stop it, systemd will try to restart it.  Let's
> avoid these kinds of battles.  It's also safer if we don't try to
> touch running servers.

That's a good point. I hadn't found an excuse to simplify this but you provided
one. :) There was a worry about ignoring some command-line options that change
GUCs if the server was already started. There was also an ugly case for dry run
mode that had to restart the server (if it was running) at the end. Neither
case is an issue any longer. The current code provides a suitable error if the
target server is running.

> The -p option (--subscriber-port) doesn't seem to do anything.  In my
> testing, it always uses the compiled-in default port.

It works for me. See this snippet from the regression tests. The port (50945) is
used by pg_ctl.

# Running: pg_createsubscriber --verbose --verbose --pgdata 
/c/pg_createsubscriber/src/bin/pg_basebackup/tmp_check/t_040_pg_createsubscriber_node_s_data/pgdata
 --publisher-server port=50943 host=/tmp/qpngb0bPKo dbname='pg1' 
--socket-directory /tmp/qpngb0bPKo --subscriber-port 50945 --database pg1 
--database pg2
pg_createsubscriber: validating connection string on publisher
.
.
pg_createsubscriber: pg_ctl command is: 
"/c/pg_createsubscriber/tmp_install/c/pg_createsubscriber/bin/pg_ctl" start -D 
"/c/pg_createsubscriber/src/bin/pg_basebackup/tmp_check/t_040_pg_createsubscriber_node_s_data/pgdata"
 -s -o "-p 50945" -o "-c listen_addresses='' -c unix_socket_permissions=0700 -c 
unix_socket_directories='/tmp/qpngb0bPKo'"
2024-03-20 18:15:24.517 -03 [105195] LOG:  starting PostgreSQL 17devel on 
x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2024-03-20 18:15:24.517 -03 [105195] LOG:  listening on Unix socket 
"/tmp/qpngb0bPKo/.s.PGSQL.50945"

> Printing all the server log lines to the terminal doesn't seem very
> user-friendly.  Not sure what to do about that, short of keeping a 
> pg_upgrade-style directory of log files.  But it's ugly.

I removed the previous implementation that creates a new directory and stores
the log file there. I don't like the pg_upgrade-style directory because (a) it
stores part of the server log files in another place and (b) it is another
direct

Re: speed up a logical replica setup

2024-03-16 Thread Euler Taveira
On Sat, Mar 16, 2024, at 10:31 AM, vignesh C wrote:
> I was able to reproduce this random failure and found the following reason:
> The minimum recovery ending location 0/500 was greater than the
> recovery_target_lsn specified, "0/4001198". In a few random cases the
> standby applies a few more WAL records after the replication slot is
> created; this leads to minimum recovery ending location being greater
> than the recovery_target_lsn because of which the server will fail
> with:
> FATAL:  requested recovery stop point is before consistent recovery point

Thanks for checking. I proposed an alternative patch for it [1]. Can you check
it?

[1] 
https://www.postgresql.org/message-id/34637e7f-0330-420d-8f45-1d022962d2fe%40app.fastmail.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-03-16 Thread Euler Taveira
On Fri, Mar 15, 2024, at 3:34 AM, Amit Kapila wrote:
> Did you consider adding options for publication/subscription/slot
> names as mentioned in my previous email? As discussed in a few emails
> above, it would be quite confusing for users to identify the logical
> replication objects once the standby is converted to subscriber.

Yes. I was planning to implement it after v1 was pushed. I started to write
code for it but I wasn't sure about the UI. The best approach I came up with
was multiple options matched by position. (I don't provide short options to
avoid possible portability issues with the ordering.) It means that if I have 3
databases and the following command line:

pg_createsubscriber ... --database pg1 --database pg2 --database pg3
--publication pubx --publication puby --publication pubz

pubx, puby and pubz are created in the database pg1, pg2, and pg3 respectively.

> It seems we care only for publications created on the primary. Isn't
> it possible that some of the publications have been replicated to
> standby by that time, for example, in case failure happens after
> creating a few publications? IIUC, we don't care for standby cleanup
> after failure because it can't be used for streaming replication
> anymore. So, the only choice the user has is to recreate the standby
> and start the pg_createsubscriber again. This sounds questionable to
> me as to whether users would like this behavior. Does anyone else have
> an opinion on this point?

If it happens after creating a publication and before promotion, the cleanup
routine will drop the publications on the primary, and the drop will eventually
be applied to the standby via replication.

> I see the below note in the patch:
> +If pg_createsubscriber fails while processing,
> +then the data directory is likely not in a state that can be recovered. 
> It
> +is true if the target server was promoted. In such a case, creating a new
> +standby server is recommended.
> 
> By reading this it is not completely clear whether the standby is not
> recoverable in case of any error or only an error after the target
> server is promoted. If others agree with this behavior then we should
> write the detailed reason for this somewhere in the comments as well
> unless it is already explained.

I rewrote the sentence to make it clear that only if the server is promoted
will the target server be in a state that cannot be reused. The tool provides a
message saying so, too.

pg_createsubscriber: target server reached the consistent state
pg_createsubscriber: hint: If pg_createsubscriber fails after this point, you
must recreate the physical replica before continuing.

I'm attaching a new version (v30) that adds:

* 3 new options (--publication, --subscription, --replication-slot) to assign
  names to the objects. The --database option used to ignore duplicate names,
  however, since these new options rely on the number of database options to
  match the number of object name options, it is forbidden from now on. The
  duplication is also forbidden for the object names to avoid errors earlier.
* rewrite the paragraph related to the unusable target server after
  pg_createsubscriber fails.
* Vignesh reported an issue [1] related to reaching the recovery stop point
  before the consistent state is reached. I proposed a simple patch that fixes
  the issue.

[1] 
https://www.postgresql.org/message-id/CALDaNm3VMOi0GugGvhk3motghaFRKSWMCSE2t3YX1e%2BMttToxg%40mail.gmail.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/


v30-0001-pg_createsubscriber-creates-a-new-logical-replic.patch.gz
Description: application/gzip


v30-0002-Stop-the-target-server-earlier.patch.gz
Description: application/gzip


Re: Add publisher and subscriber to glossary documentation.

2024-03-14 Thread Euler Taveira
On Fri, Mar 15, 2024, at 1:14 AM, Amit Kapila wrote:
> I think node should mean instance for both physical and logical
> replication, otherwise, it would be confusing. We need both the usages
> as a particular publication/subscription is defined at the database
> level but the server on which we define those is referred to as a
> node/instance.

If you are creating a subscription that connects to the same instance
(replication between 2 databases in the same cluster), your definition is not
correct and Alvaro's definition is accurate. The node definition is closely
linked to the connection string. While the physical replication does not
specify a database (meaning "any database" referring to an instance), the
logical replication requires a database.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-03-14 Thread Euler Taveira
On Wed, Mar 13, 2024, at 10:09 AM, Shlok Kyal wrote:
> Added a top-up patch v28-0005 to fix this issue.
> I am not changing the version as v28-0001 to v28-0004 is the same as above.

Thanks for your review!

I'm posting a new patch (v29) that merges the previous patches (v28-0002 and
v28-0003). I applied the fix provided by Hayato [1]. It was an oversight during
a rebase. I also included the patch proposed by Shlok [2] that stops the target
server on error if it is running.

Tomas suggested in [3] that maybe the PID should be replaced with something
that has more entropy. Instead of the PID, the patch now uses a random number
for the replication slot and subscription names. There was also a concern about
converting multiple standbys that would end up with the same publication name,
so the same random number is appended to the publication name; that way the
conversion doesn't fail because the publication already exists. Documentation
was changed based on Tomas' feedback.

The user name was always included in the subscriber connection string. Let's
have libpq choose it instead. While on it, a new routine (get_sub_conninfo)
contains the code to build the subscriber connection string.

As I said in [4], there wasn't a way to inform a different configuration file.
If your cluster has a postgresql.conf outside PGDATA, when pg_createsubscriber
starts the server it will fail. The new --config-file option let you inform the
postgresql.conf location and the server is started just fine.

I also did some changes in the start_standby_server routine. I replaced the
strcat and snprintf with appendPQExpBuffer that has been used to store the
pg_ctl command.


[1] 
https://www.postgresql.org/message-id/TYCPR01MB12077FD21BB186C5A685C0BF3F52A2%40TYCPR01MB12077.jpnprd01.prod.outlook.com
[2] 
https://www.postgresql.org/message-id/CANhcyEW6-dH28gLbFc5XpDTJ6JPizU%2Bt5g-aKUWJBf5W_Zriqw%40mail.gmail.com
[3] 
https://www.postgresql.org/message-id/6423dfeb-a729-45d3-b71e-7bf1b3adb0c9%40enterprisedb.com
[4] 
https://www.postgresql.org/message-id/d898faad-f6d7-4b0d-b816-b9dcdf490685%40app.fastmail.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/


v29-0001-pg_createsubscriber-creates-a-new-logical-replic.patch.gz
Description: application/gzip


Re: speed up a logical replica setup

2024-03-13 Thread Euler Taveira
On Fri, Mar 8, 2024, at 6:44 AM, Hayato Kuroda (Fujitsu) wrote:
> Hmm, right. I considered below improvements. Tomas and Euler, how do you 
> think?

I'm posting a new patchset v28.

I changed the way the check function works. From the usability perspective, it
is better to test all conditions and report all errors (if any) at once. It
avoids multiple executions in dry run mode just to figure out all of the issues
in the initial phase. I also included tests for it using Shlok's idea [1],
although I didn't use v27-0004.

Shlok [1] reported that it was failing on Windows since the socket-directory
option was added. I added a fix for it.

Tomas pointed out that the documentation [2] does not provide a good high-level
explanation of pg_createsubscriber. I expanded the Description section and
moved the prerequisites to the Notes section. The prerequisites were grouped
into target and source conditions in their own paragraphs instead of using a
list. It seems more in line with the style of some other applications.

As I said in a previous email [3], I removed the retain option.


[1] 
https://www.postgresql.org/message-id/canhcyeu4q3dwh9ax9bpojcm4ebbhyfenegoaz8xfgyjmcpz...@mail.gmail.com
[2] 
https://www.postgresql.org/message-id/6423dfeb-a729-45d3-b71e-7bf1b3adb0c9%40enterprisedb.com
[3] 
https://www.postgresql.org/message-id/e390e35e-508e-4eb8-92e4-e6b066407a41%40app.fastmail.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/


v28-0001-pg_createsubscriber-creates-a-new-logical-replic.patch.gz
Description: application/gzip


v28-0002-Use-last-replication-slot-position-as-replicatio.patch.gz
Description: application/gzip


v28-0003-port-replace-int-with-string.patch.gz
Description: application/gzip


Re: Identify transactions causing highest wal generation

2024-03-08 Thread Euler Taveira
On Fri, Mar 8, 2024, at 12:40 PM, Tomas Vondra wrote:
> On 3/8/24 15:50, Gayatri Singh wrote:
> > Hello Team,
> > 
> > Can you help me with steps to identify transactions which caused wal
> > generation to surge ?
> > 
> 
> You should probably take a look at pg_waldump, which prints information
> about WAL contents, including which XID generated each record.

You can also use pg_stat_statements to obtain this information.

postgres=# select * from pg_stat_statements order by wal_bytes desc;
-[ RECORD 1 ]--+------------------------------------------------------------------
userid | 10
dbid   | 16385
toplevel   | t
queryid| -8403979585082616547
query  | UPDATE pgbench_accounts SET abalance = abalance + $1 
WHERE aid = $2
plans  | 0
total_plan_time| 0
min_plan_time  | 0
max_plan_time  | 0
mean_plan_time | 0
stddev_plan_time   | 0
calls  | 238260
total_exec_time| 4642.59929618
min_exec_time  | 0.011094
max_exec_time  | 0.872748
mean_exec_time | 0.01948543312347807
stddev_exec_time   | 0.006370786385582063
rows   | 238260
.
.
.
wal_records| 496659
wal_fpi| 19417
wal_bytes      | 208501147
.
.
.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-03-06 Thread Euler Taveira
On Wed, Mar 6, 2024, at 8:24 AM, vignesh C wrote:
> Few comments:

Thanks for your review. Some changes are included in v26.

> 1) Can we use strdup here instead of atoi, as pg_dump does? Otherwise we do a
> double conversion: to an integer with atoi and back to a string while forming
> the connection string:
> +   case 'p':
> +   if ((opt.sub_port = atoi(optarg)) <= 0)
> +   pg_fatal("invalid subscriber
> port number");
> +   break;

I don't have a strong preference but decided to provide a patch for it. See
v26-0003.

> 2) We should enforce a valid range for this; otherwise we will end up with
> unexpected values when a higher number is specified:
> +   case 't':
> +   opt.recovery_timeout = atoi(optarg);
> +   break;

I wouldn't like to add an arbitrary value. Suggestions?

> 3) Now that we have addressed most of the items, can we handle this TODO:
> +   /*
> +* TODO use primary_conninfo (if available) from subscriber 
> and
> +* extract publisher connection string. Assume that there are
> +* identical entries for physical and logical
> replication. If there is
> +* not, we would fail anyway.
> +*/
> +   pg_log_error("no publisher connection string specified");
> +   pg_log_error_hint("Try \"%s --help\" for more
> information.", progname);
> +   exit(1);

It is not in my top priority at the moment.

> 4) The log level is info by default here; I was not sure how to set it to
> debug level to get these messages:
> +   pg_log_debug("publisher(%d): connection string: %s",
> i, dbinfo[i].pubconninfo);
> +   pg_log_debug("subscriber(%d): connection string: %s",
> i, dbinfo[i].subconninfo);

  -v
  --verbose

    Enables verbose mode. This will cause pg_createsubscriber to output
    progress messages and detailed information about each step to standard
    error. Repeating the option causes additional debug-level messages to
    appear on standard error.

> 5) Currently in non verbose mode there are no messages printed on
> console, we could have a few of them printed irrespective of verbose
> or not like the following:
> a) creating publication
> b) creating replication slot
> c) waiting for the target server to reach the consistent state
> d) If pg_createsubscriber fails after this point, you must recreate
> the physical replica before continuing.
> e) creating subscription

That's the idea. Quiet mode by default.

> 6) The message should be "waiting for the target server to reach the
> consistent state":
> +#define NUM_CONN_ATTEMPTS  5
> +
> +   pg_log_info("waiting the target server to reach the consistent 
> state");
> +
> +   conn = connect_database(conninfo, true);

Fixed.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-03-06 Thread Euler Taveira
still exists.
> Is there any reason? Can you clarify your opinion?

I decided to refactor the code and do what Amit Kapila suggested: use the last
replication slot LSN as the replication start point. I keep it in a separate
patch (v26-0002) to make it easier to review. I'm planning to incorporate it if
nobody objects.

> 16. others
> 
> While reading [2] and [3], I was confused by the decision. You and Amit
> discussed the combination of pg_createsubscriber and slot sync and how slots
> on the physical standby should be handled. You seemed to agree to remove such
> a slot, and Amit also suggested raising an ERROR. However, you said in [8]
> that such handling is not mandatory, so a WARNING should be raised in dry_run.
> I was quite confused.
> Am I missing something?

I didn't address this item in this patch. As you pointed out, pg_resetwal does
nothing if replication slots exist. That's not an excuse for doing nothing here.
I agree that we should check this case and provide a suitable error message. If
you have a complex replication scenario, users won't be happy with this
restriction. We can always improve the UI (dropping replication slots during the
execution if an option is provided, for example).

> 17. others
> 
> Per the discussion around [4], we might have to consider whether options like
> data_directory and config_file were initially specified for the standby
> server. Another easy approach is to allow users to specify options like -o in
> pg_upgrade [5], which is similar to your idea. Thoughts?

I didn't address this item in this patch. I have a half baked patch for it. The
proposal is exactly to allow appending config_file option into -o.

pg_ctl start -o "-c config_file=/etc/postgresql/17/postgresql.conf" ...

> 
> 18. others
> 
> How do you handle the reported failure [6]?

It is a PEBCAK. I don't see an easy way to detect scenario 1; in the current
mode, we are susceptible to this human error. The base backup support won't
allow it. Regarding scenario 2, the referred error is the way to catch this
wrong command line. Do you expect a different message?

> 19. main
> ```
> char*pub_base_conninfo = NULL;
> char*sub_base_conninfo = NULL;
> char*dbname_conninfo = NULL;
> ```
> 
> No need to initialize pub_base_conninfo and sub_base_conninfo.
> These variables would not be free'd.

Changed.

> 20. others
> 
> IIUC, slot creation would not finish if there are prepared transactions.
> Should we detect this in the verification phase and raise an ERROR?

Maybe. If we decide to do it, we should also check all cases not just prepared
transactions. The other option is to add a sentence into the documentation.

> 21. others
> 
> As I said in [7], the catch-up would not finish if a long
> recovery_min_apply_delay is used. Should we override it during the catch-up?

No. If the time-delayed logical replica [2] were available, I would say that we
could use the apply delay for the logical replica. The user can expect that the
replica will continue to have the configured apply delay, but that's not the
case if the tool silently ignores it. I'm not sure an error is appropriate here
because it requires an extra step. Another option is to print a message saying
there is an apply delay. In dry run mode, the user can detect this case and
make a decision. Does that seem reasonable?

> 22. pg_createsubscriber.sgml
> ```
> 
>  Check
>  Write recovery parameters into the target data...
> ```
> 
> Not sure, but "Check" seems not needed.

It was a typo. Fixed.


[1] 
https://www.postgresql.org/message-id/CALDaNm215LodC48p7LmfAsuCq9m33CWtcag2%2B9DiyNfWGuL_KQ%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/f026292b-c9ee-472e-beaa-d32c5c3a2ced%40www.fastmail.com

--
Euler Taveira
EDB   https://www.enterprisedb.com/


v26-0001-pg_createsubscriber-creates-a-new-logical-replic.patch.gz
Description: application/gzip


v26-0002-Use-latest-replication-slot-position-as-replicat.patch.gz
Description: application/gzip


v26-0003-port-replace-int-with-string.patch.gz
Description: application/gzip


Re: speed up a logical replica setup

2024-03-05 Thread Euler Taveira
On Tue, Mar 5, 2024, at 12:48 AM, Shubham Khanna wrote:
> I applied the v25 patch and did RUN the 'pg_createsubscriber' command.
> I was unable to execute it and experienced the following error:
> 
> $ ./pg_createsubscriber -D node2/ -P "host=localhost port=5432
> dbname=postgres"  -d postgres -d db1 -p 9000 -r
> ./pg_createsubscriber: invalid option -- 'p'
> pg_createsubscriber: hint: Try "pg_createsubscriber --help" for more
> information.

Oops. Good catch! I will post an updated patch soon.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Avoiding inadvertent debugging mode for pgbench

2024-03-01 Thread Euler Taveira
On Fri, Mar 1, 2024, at 8:07 PM, Tomas Vondra wrote:
> On 3/1/24 23:41, Nathan Bossart wrote:
> > 
> > I think this is a generally reasonable proposal, except I don't know
> > whether this breakage is acceptable.  AFAICT there are two fundamental
> > behavior changes folks would observe:
> > 
> > * "-d " would cease to emit the debugging output, and while
> >   enabling debug mode might've been unintentional in most cases, it might
> >   actually have been intentional in others.
> > 
> 
> I think this is the more severe of the two issues, because it's a silent
> change. Everything will seem to work, but the user won't get the debug
> info (if they actually wanted it).

Indeed. Hopefully the user will notice soon when inspecting the standard error
output.

> > * "-d" with no argument or with a following option would begin failing, and
> >   users would need to replace "-d" with "--debug".
> > 
> 
> I think this is fine.

Yeah. It will force the user to fix it immediately.

> > Neither of these seems particularly severe to me, especially for a
> > benchmarking program, but I'd be curious to hear what others think.
> > 
> 
> I agree the -d option may be confusing, but is it worth it? I don't
> know, it depends on how often people actually get confused by it, and I
> don't recall hitting this (nor hearing about others). To be honest I
> didn't even realize pgbench even has a debug switch ...

I'm one of those who habitually use -d to specify the database name. I often
pass -d to pgbench and only then realize that it enables debug output rather
than specifying the database.

> But I'd like to mention this is far from our only tool using "-d" to
> enable debug mode. A quick git-grep shows postgres, initdb,
> pg_archivecleanup and pg_combinebackup do the same thing. So maybe it's
> not that inconsistent overall.

As Greg said none of these programs connects to the database.

I don't like to break backward compatibility, but in this case I suspect it is
ok. I don't recall the last time I saw a script that makes use of the -d
option. How often do you need pgbench debug information?


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-03-01 Thread Euler Taveira
On Thu, Feb 22, 2024, at 12:45 PM, Hayato Kuroda (Fujitsu) wrote:
> Based on idea from Euler, I roughly implemented. Thought?
> 
> 0001-0013 were not changed from the previous version.
> 
> V24-0014: addressed your comment in the replied e-mail.
> V24-0015: Add disconnect_database() again, per [3]
> V24-0016: addressed your comment in [4].
> V24-0017: addressed your comment in [5].
> V24-0018: addressed your comment in [6].

Thanks for your review. I'm attaching v25 that hopefully addresses all pending
points.

Regarding your comments [1] on v21, I included changes for almost all items
except 2, 20, 23, 24, and 25. I still think item 2 is not required because
pg_ctl will provide a suitable message. I decided not to rearrange the block of
SQL commands (item 20) mainly because it would avoid creating these objects on
node_f. Do we really need command_checks_all? Depending on the output, it uses
more cycles than command_ok.

In summary:

v24-0002: documentation is updated. I didn't apply this patch as-is. Instead, I
checked what you wrote and fix some gaps in what I've been written.
v24-0003: as I said I don't think we need to add it, however, I won't fight
against it if people want to add this check.
v24-0004: I spent some time on it. This patch was not complete. I cleaned it up
and included the start_standby_server code. It starts the server using the
specified socket directory, port and username, hence preventing external client
connections during the execution.
v24-0005: partially applied
v24-0006: applied with cosmetic change
v24-0007: applied with cosmetic change
v24-0008: applied
v24-0009: applied with cosmetic change
v24-0010: not applied. Based on #15, I refactored this code a bit. pg_fatal is
not used when there is an open database connection. Instead, pg_log_error() is
followed by disconnect_database(). In cases where it should exit immediately,
disconnect_database() has a new parameter (exit_on_error) that controls whether
it needs to call exit(1). I went ahead and did the same for connect_database().
v24-0011: partially applied. I included some of the suggestions (18, 19, and 
21).
v24-0012: not applied. On reflection, after working on v24-0004, the target
server will start with new parameters (that only accept local connections);
hence, during the execution it is no longer possible to detect whether the
target server is a primary for another server. I added a sentence about it in
the documentation (Warning section).
v24-0013: good catch. Applied.
v24-0014: partially applied. After some experiments I decided to use a small
number of attempts. The previous code didn't reset the counter when the
connection was reestablished. I included the documentation suggestion. I didn't
include the IF EXISTS in the DROP PUBLICATION because it doesn't solve the
issue. Instead, I refactored drop_publication() to not try again if the DROP
PUBLICATION failed.
v24-0015: not applied. I refactored the exit code to do the right thing: print
error message, disconnect database (if applicable) and exit.
v24-0016: not applied. But I checked that the information is present in the
documentation; it is.
v24-0017: good catch. Applied.
v24-0018: not applied.

I removed almost all boolean returns and included the error logic inside the
functions. It reads better. I also changed the connect_database() and
disconnect_database() functions to include the error logic inside them; there
is a new parameter, on_error_exit, for that. I removed the action parameter
from pg_ctl_status() -- I think someone suggested it -- and moved the error
message outside the function. I improved the cleanup routine. It provides
useful information if it cannot remove an object (publication or replication
slot) from the primary.

Since I applied v24-0004, I realized that extra start/stop cycles of the service
are required. It means pg_createsubscriber doesn't start the transformation with
the current standby settings. Instead, it stops the standby if it is running and
starts it with the provided command-line options (socket, port,
listen_addresses). This has a few drawbacks:
* See v24-0012. It cannot detect if the target server is a primary for another
  server. It is documented.
* I also removed the check that the standby is running. If the standby was
  stopped a long time ago, it will take some time to reach the start point.
* Dry run mode has to start/stop the service to work correctly. Is that an
  issue?

Although I decided to include the --retain option, I'm thinking about removing
it. If logging is enabled, the information produced during the
pg_createsubscriber run will be available. The client log can be redirected to a
file for future inspection.

Comments?


[1] 
https://www.postgresql.org/message-id/TYCPR01MB12077756323B79042F29DDAEDF54C2%40TYCPR01MB12077.jpnprd01.prod.outlook.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From 41f222b68ab4e365be0123075d54d5da8468bcd2 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Mon, 5 Jun 2023 14:39:40 -0400
Subject: [PATCH v25] p

Re: speed up a logical replica setup

2024-02-22 Thread Euler Taveira
On Wed, Feb 21, 2024, at 5:00 AM, Shlok Kyal wrote:
> I found some issues and fixed those issues with top up patches
> v23-0012 and v23-0013
> 1.
> Suppose there is a cascade physical replication node1->node2->node3.
> Now if we run pg_createsubscriber with node1 as primary and node2 as
> standby, pg_createsubscriber will be successful but the connection
> between node2 and node3 will not be retained and the log of node3 will
> give the error:
> 2024-02-20 12:32:12.340 IST [277664] FATAL:  database system
> identifier differs between the primary and standby
> 2024-02-20 12:32:12.340 IST [277664] DETAIL:  The primary's identifier
> is 7337575856950914038, the standby's identifier is
> 7337575783125171076.
> 2024-02-20 12:32:12.341 IST [277491] LOG:  waiting for WAL to become
> available at 0/3000F10
> 
> To fix this I am avoiding pg_createsubscriber to run if the standby
> node is primary to any other server.
> Made the change in v23-0012 patch

IIRC we already discussed the cascading replication scenario. Of course,
breaking a node is not good; that's why you proposed v23-0012. However,
preventing pg_createsubscriber from running if there are standbys attached is
also annoying. If you don't have access to these hosts, you need to (a) kill the
walsender (very fragile/unstable), (b) start with max_wal_senders = 0, or (c)
add a firewall rule so that these hosts cannot establish a connection to the
target server. I wouldn't like to include the patch as-is. IMO we need at least
one message explaining the situation to the user, I mean, a hint message. I'm
resistant to a new option, but a --force option is probably an answer. There is
no test coverage for it. I adjusted this patch (didn't include the --force
option) and added a test case.

> 2.
> While checking 'max_replication_slots' in 'check_publisher' function,
> we are not considering the temporary slot in the check:
> +   if (max_repslots - cur_repslots < num_dbs)
> +   {
> +   pg_log_error("publisher requires %d replication slots, but
> only %d remain",
> +num_dbs, max_repslots - cur_repslots);
> +   pg_log_error_hint("Consider increasing max_replication_slots
> to at least %d.",
> + cur_repslots + num_dbs);
> +   return false;
> +   }
> Fixed this in v23-0013

Good catch!

Both are included in the next patch.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-02-22 Thread Euler Taveira
On Thu, Feb 22, 2024, at 9:43 AM, Hayato Kuroda (Fujitsu) wrote:
> > The possible solution would be
> > 1) allow to run pg_createsubscriber if standby is initially stopped .
> > I observed that pg_logical_createsubscriber also uses this approach.
> > 2) read GUCs via SHOW command and restore them when server restarts
> >

3. add a config-file option. That's similar to what pg_rewind does. I expect
that Debian-based installations will have this issue.

> I also prefer the first solution. 
> Another reason why the standby should be stopped is for backup purpose.
> Basically, the standby instance should be saved before running 
> pg_createsubscriber.
> An easiest way is hard-copy, and the postmaster should be stopped at that 
> time.
> I felt it is better that users can run the command immediately later the 
> copying.
> Thought?

It is not a good idea if you want to keep postgresql.conf outside PGDATA. I
mean, you need extra steps that can be error-prone (different settings between
the files).

Shlok, I didn't read your previous email carefully. :-/


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-02-19 Thread Euler Taveira
On Mon, Feb 19, 2024, at 7:22 AM, Shlok Kyal wrote:
> I have reviewed the v21 patch. And found an issue.
> 
> Initially I started the standby server with a new postgresql.conf file
> (not the default postgresql.conf that is present in the instance).
> pg_ctl -D ../standby start -o "-c config_file=/new_path/postgresql.conf"
> 
> And I have made 'max_replication_slots = 1' in new postgresql.conf and
> made  'max_replication_slots = 0' in the default postgresql.conf file.
> Now when we run pg_createsubscriber on standby we get error:
> pg_createsubscriber: error: could not set replication progress for the
> subscription "pg_createsubscriber_5_242843": ERROR:  cannot query or
> manipulate replication origin when max_replication_slots = 0

That's by design. See [1]. The max_replication_slots parameter is used as the
maximum number of subscriptions on the server.

> NOTICE:  dropped replication slot "pg_createsubscriber_5_242843" on publisher
> pg_createsubscriber: error: could not drop publication
> "pg_createsubscriber_5" on database "postgres": ERROR:  publication
> "pg_createsubscriber_5" does not exist
> pg_createsubscriber: error: could not drop replication slot
> "pg_createsubscriber_5_242843" on database "postgres": ERROR:
> replication slot "pg_createsubscriber_5_242843" does not exist

That's a bug and should be fixed.

[1] 
https://www.postgresql.org/docs/current/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-SUBSCRIBER


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-02-19 Thread Euler Taveira
On Mon, Feb 19, 2024, at 6:47 AM, Peter Eisentraut wrote:
> Some review of the v21 patch:

Thanks for checking.

> - commit message
> 
> Mention pg_createsubscriber in the commit message title.  That's the 
> most important thing that someone doing git log searches in the future 
> will be looking for.

Right. Done.

> - doc/src/sgml/ref/allfiles.sgml
> 
> Move the new entry to alphabetical order.

Done.

> 
> - doc/src/sgml/ref/pg_createsubscriber.sgml
> 
> +  
> +   The pg_createsubscriber should be run at 
> the target
> +   server. The source server (known as publisher server) should accept 
> logical
> +   replication connections from the target server (known as subscriber 
> server).
> +   The target server should accept local logical replication connection.
> +  
> 
> "should" -> "must" ?

Done.

> + 
> +  Options
> 
> Sort options alphabetically.

Done.

> It would be good to indicate somewhere which options are mandatory.

I'll add this information in the option description. AFAICT the synopsis kind
of indicates it.

> + 
> +  Examples
> 
> I suggest including a pg_basebackup call into this example, so it's
> easier for readers to get the context of how this is supposed to be
> used.  You can add that pg_basebackup in this example is just an example 
> and that other base backups can also be used.

We can certainly add it, but isn't creating a standby out of scope here? I will
make sure to include references to pg_basebackup and the "Setting up a Standby
Server" section.

> - doc/src/sgml/reference.sgml
> 
> Move the new entry to alphabetical order.

Done.

> - src/bin/pg_basebackup/Makefile
> 
> Move the new sections to alphabetical order.

Done.

> - src/bin/pg_basebackup/meson.build
> 
> Move the new sections to alphabetical order.

Done.

> 
> - src/bin/pg_basebackup/pg_createsubscriber.c
> 
> +typedef struct CreateSubscriberOptions
> +typedef struct LogicalRepInfo
> 
> I think these kinds of local-use struct don't need to be typedef'ed.
> (Then you also don't need to update typdefs.list.)

Done.

> +static void
> +usage(void)
> 
> Sort the options alphabetically.

Are you referring to s/options/functions/?

> +static char *
> +get_exec_path(const char *argv0, const char *progname)
> 
> Can this not use find_my_exec() and find_other_exec()?

It is indeed using it. I created this function because it needs to run the same
code path twice (pg_ctl and pg_resetwal).

> +int
> +main(int argc, char **argv)
> 
> Sort the options alphabetically (long_options struct, getopt_long()
> argument, switch cases).

Done.

> - .../t/040_pg_createsubscriber.pl
> - .../t/041_pg_createsubscriber_standby.pl
> 
> These two files could be combined into one.

Done.

> +# Force it to initialize a new cluster instead of copying a
> +# previously initdb'd cluster.
> 
> Explain why?

Ok. It needs a new cluster because it will have a different system identifier
so we can make sure the target cluster is a copy of the source server.

> +$node_s->append_conf(
> + 'postgresql.conf', qq[
> +log_min_messages = debug2
> 
> Is this setting necessary for the test?

No. It is here as a debugging aid. Better to include it in a separate patch.
There are a few messages that I don't intend to include in the final patch.

All of these modifications will be included in the next patch. I'm finishing
integrating the patches proposed by Hayato [1] plus some additional fixes and
refactoring.


[1] 
https://www.postgresql.org/message-id/TYCPR01MB12077A8421685E5515DE408EEF5512%40TYCPR01MB12077.jpnprd01.prod.outlook.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-02-15 Thread Euler Taveira
better plan?

v20-0012: I applied a different patch to accomplish the same thing. I included
a refactor around the pg_is_in_recovery() function so it can be used in two
other places.

Besides that, I changed some SQL commands to avoid superfluous whitespace in
them. I also added a test for the cascaded replication scenario and cleaned up
the 041 test a bit.

I didn't provide updated documentation because I want to check v20-0002 first.
Checking v20-0009 is on my list.


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From 0073e1f7ea5b1c225e773707cbbe67cb1593823e Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Mon, 5 Jun 2023 14:39:40 -0400
Subject: [PATCH v21] Creates a new logical replica from a standby server

A new tool called pg_createsubscriber can convert a physical replica
into a logical replica. It runs on the target server and should be able
to connect to the source server (publisher) and the target server
(subscriber).

The conversion requires a few steps. Check if the target data directory
has the same system identifier as the source data directory. Stop the
target server if it is running as a standby server. Create one
replication slot per specified database on the source server. One
additional replication slot is created at the end to get the consistent
LSN (this consistent LSN will be used as (a) a stopping point for the
recovery process and (b) a starting point for the subscriptions). Write
recovery parameters into the target data directory and start the target
server (wait until the target server is promoted). Create one
publication (FOR ALL TABLES) per specified database on the source
server. Create one subscription per specified database on the target
server (use the replication slot and publication created in a previous
step; don't enable the subscriptions yet). Set the replication progress
to the consistent LSN obtained in a previous step. Enable the
subscription for each specified database on the target server. Stop the
target server. Change the system identifier of the target server.

Depending on your workload and database size, creating a logical replica
might not be an option due to resource constraints (the WAL backlog must
be available until all table data is synchronized). The initial data
copy and the replication progress tend to be faster on a physical
replica. The purpose of this tool is to speed up a logical replica
setup.
---
 doc/src/sgml/ref/allfiles.sgml|1 +
 doc/src/sgml/ref/pg_createsubscriber.sgml |  320 +++
 doc/src/sgml/reference.sgml   |1 +
 src/bin/pg_basebackup/.gitignore  |1 +
 src/bin/pg_basebackup/Makefile|8 +-
 src/bin/pg_basebackup/meson.build |   19 +
 src/bin/pg_basebackup/pg_createsubscriber.c   | 1972 +
 .../t/040_pg_createsubscriber.pl  |   39 +
 .../t/041_pg_createsubscriber_standby.pl  |  217 ++
 src/tools/pgindent/typedefs.list  |2 +
 10 files changed, 2579 insertions(+), 1 deletion(-)
 create mode 100644 doc/src/sgml/ref/pg_createsubscriber.sgml
 create mode 100644 src/bin/pg_basebackup/pg_createsubscriber.c
 create mode 100644 src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
 create mode 100644 src/bin/pg_basebackup/t/041_pg_createsubscriber_standby.pl

diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 4a42999b18..a2b5eea0e0 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -214,6 +214,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/ref/pg_createsubscriber.sgml b/doc/src/sgml/ref/pg_createsubscriber.sgml
new file mode 100644
index 00..f5238771b7
--- /dev/null
+++ b/doc/src/sgml/ref/pg_createsubscriber.sgml
@@ -0,0 +1,320 @@
+
+
+
+ 
+  pg_createsubscriber
+ 
+
+ 
+  pg_createsubscriber
+  1
+  Application
+ 
+
+ 
+  pg_createsubscriber
+  convert a physical replica into a new logical replica
+ 
+
+ 
+  
+   pg_createsubscriber
+   option
+   
+
+ -D 
+ --pgdata
+
+datadir
+
+ -P
+ --publisher-server
+
+connstr
+
+ -S
+ --subscriber-server
+
+connstr
+
+ -d
+ --database
+
+dbname
+   
+  
+ 
+
+ 
+  Description
+  
+pg_createsubscriber creates a new logical
+replica from a physical standby server.
+  
+
+  
+   The pg_createsubscriber should be run at the target
+   server. The source server (known as publisher server) should accept logical
+   replication connections from the target server (known as subscriber server).
+   The target server should accept local logical replication connection.
+  
+ 
+
+ 
+  Options
+
+   
+pg_createsubscriber accepts the following
+command-line arguments:
+
+
+ 
+  -D directory
+  --pgdata=directory
+  
+   
+The target directory that contains a cluster directory from a physical
+replica

Re: About a recently-added message

2024-02-14 Thread Euler Taveira
On Wed, Feb 14, 2024, at 8:45 AM, Amit Kapila wrote:
> Now, I am less clear about whether to quote "logical" or not in the
> above message. Do you have any suggestions?

The possible confusion comes from the fact that the sentence contains "must be"
in the middle of a comparison expression. For pg_createsubscriber, we are using

publisher requires wal_level >= logical

I suggest to use something like

slot synchronization requires wal_level >= logical


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-02-13 Thread Euler Taveira
On Fri, Feb 9, 2024, at 3:27 AM, vignesh C wrote:
> On Wed, 7 Feb 2024 at 10:24, Euler Taveira  wrote:
> >
> > On Fri, Feb 2, 2024, at 6:41 AM, Hayato Kuroda (Fujitsu) wrote:
> >
> > Thanks for updating the patch!
> 
> Thanks for the updated patch, few comments:
> Few comments:
> 1) Cleanup function handler flag should be reset, i.e.
> dbinfo->made_replslot = false; should be there else there will be an
> error during  drop replication slot cleanup in error flow:

Why? drop_replication_slot() is basically called by atexit().

> 2) Cleanup function handler flag should be reset, i.e.
> dbinfo->made_publication = false; should be there else there will be
> an error during drop publication cleanup in error flow:

Ditto. drop_publication() is basically called by atexit().

> 
> 3) Cleanup function handler flag should be reset, i.e.
> dbinfo->made_subscription = false; should be there else there will be
> an error during drop publication cleanup in error flow:

Ditto. drop_subscription() is only called by atexit().

> 4) I was not sure if drop_publication is required here, as we will not
> create any publication in subscriber node:
> +   if (dbinfo[i].made_subscription)
> +   {
> +   conn = connect_database(dbinfo[i].subconninfo);
> +   if (conn != NULL)
> +   {
> +   drop_subscription(conn, &dbinfo[i]);
> +   if (dbinfo[i].made_publication)
> +   drop_publication(conn, &dbinfo[i]);
> +   disconnect_database(conn);
> +   }
> +   }

setup_subscriber() explains the reason.

/*
 * Since the publication was created before the consistent LSN, it is
 * available on the subscriber when the physical replica is promoted.
 * Remove publications from the subscriber because it has no use.
 */
drop_publication(conn, &dbinfo[i]);

I changed the referred code a bit because it was not reliable. Since
made_subscription is not set until we create the subscription, the publications
that were created on the primary and replicated to the standby wouldn't be
removed on the subscriber. Instead, it should rely on the recovery state to
decide whether to drop them.

> 5) The connection should be disconnected in case of error case:
> +   /* secure search_path */
> +   res = PQexec(conn, ALWAYS_SECURE_SEARCH_PATH_SQL);
> +   if (PQresultStatus(res) != PGRES_TUPLES_OK)
> +   {
> +   pg_log_error("could not clear search_path: %s",
> PQresultErrorMessage(res));
> +   return NULL;
> +   }
> +   PQclear(res);

connect_database() is usually followed by a NULL test and exit(1) if it cannot
connect. It should be added for correctness but it is not a requirement.

> 7) These includes are not required:
> 7.a) #include 
> 7.b) #include 
> 7.c) #include 
> 7.d) #include "access/xlogdefs.h"
> 7.e) #include "catalog/pg_control.h"
> 7.f) #include "common/file_utils.h"
> 7.g) #include "utils/pidfile.h"

Good catch. I was about to review the include files.

> 8) POSTMASTER_STANDBY and POSTMASTER_FAILED are not being used, is it
> required or kept for future purpose:
> +enum WaitPMResult
> +{
> +   POSTMASTER_READY,
> +   POSTMASTER_STANDBY,
> +   POSTMASTER_STILL_STARTING,
> +   POSTMASTER_FAILED
> +};

I just copied verbatim from pg_ctl. We should remove the superfluous states.

> 9) pg_createsubscriber should be kept after pg_basebackup to maintain
> the consistent order:

Ok.

> 
> 10) dry-run help message is not very clear, how about something
> similar to pg_upgrade's message like "check clusters only, don't
> change any data":

$ /tmp/pgdevel/bin/pg_archivecleanup --help | grep dry-run
  -n, --dry-run   dry run, show the names of the files that would be
$ /tmp/pgdevel/bin/pg_combinebackup --help | grep dry-run
  -n, --dry-run don't actually do anything
$ /tmp/pgdevel/bin/pg_resetwal --help | grep dry-run
  -n, --dry-run  no update, just show what would be done
$ /tmp/pgdevel/bin/pg_rewind --help | grep dry-run
  -n, --dry-run      stop before modifying anything
$ /tmp/pgdevel/bin/pg_upgrade --help | grep check  
  -c, --check   check clusters only, don't change any data

I used the same sentence as pg_rewind but I'm fine with pg_upgrade or
pg_combinebackup sentences.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-02-13 Thread Euler Taveira
On Thu, Feb 8, 2024, at 12:04 AM, Hayato Kuroda (Fujitsu) wrote:
> >
> Remember the target server was a standby (read only access). I don't expect an
> application trying to modify it; unless it is a buggy application.
> >
> 
> What if the client modifies the data just after the promotion?
> Naively considered, all the changes can be accepted, but are there any issues?

If someone modifies data after promotion, fine; she has to deal with conflicts,
if any. IMO it is solved by adding one or two sentences in the documentation.

> >
> Regarding
> GUCs, almost all of them is PGC_POSTMASTER (so it cannot be modified unless 
> the
> server is restarted). The ones that are not PGC_POSTMASTER, does not affect 
> the
> pg_createsubscriber execution [1].
> >
> 
> IIUC,  primary_conninfo and primary_slot_name is PGC_SIGHUP.

Ditto.

> >
> I'm just pointing out that this case is a different from pg_upgrade (from 
> which
> this idea was taken). I'm not saying that's a bad idea. I'm just arguing that
> you might be preventing some access read only access (monitoring) when it is
> perfectly fine to connect to the database and execute queries. As I said
> before, the current UI allows anyone to setup the standby to accept only local
> connections. Of course, it is an extra step but it is possible. However, once
> you apply v16-0007, there is no option but use only local connection during 
> the
> transformation. Is it an acceptable limitation?
> >
> 
> My remained concern is written above. If they do not problematic we may not 
> have
> to restrict them for now. At that time, changes 
> 
> 1) overwriting a port number,
> 2) setting listen_addresses = ''

It can be implemented later if people are excited by it.

> are not needed, right? IIUC inconsistency of -P may be still problematic.

I still think we shouldn't have only the transformed primary_conninfo as an
option.

> >
> pglogical_create_subscriber does nothing [2][3].
> >
> 
> Oh, thanks.
> Just to confirm - pglogical set shared_preload_libraries to '', should we 
> follow or not?

The in-core logical replication does not require any library to be loaded.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-02-07 Thread Euler Taveira
On Wed, Feb 7, 2024, at 2:31 AM, Hayato Kuroda (Fujitsu) wrote:
> Ah, actually I did not have such a point of view. Assuming that changed port 
> number
> can avoid connection establishments, there are four options:
> a) Does not overwrite port and listen_addresses. This allows us to monitor by
>external agents, but someone can modify GUCs and data during operations.
> b) Overwrite port but do not do listen_addresses. Not sure it is useful... 
> c) Overwrite listen_addresses but do not do port. This allows us to monitor by
>local agents, and we can partially protect the database. But there is 
> still a 
>room.
> d) Overwrite both port and listen_addresses. This can protect databases 
> perfectly
> but no one can monitor.

Remember the target server was a standby (read-only access). I don't expect an
application to try to modify it, unless it is a buggy application. Regarding
GUCs, almost all of them are PGC_POSTMASTER (so they cannot be modified unless
the server is restarted). The ones that are not PGC_POSTMASTER do not affect the
pg_createsubscriber execution [1].

postgres=# select name, setting, context from pg_settings where name in 
('max_replication_slots', 'max_logical_replication_workers', 
'max_worker_processes', 'max_sync_workers_per_subscription', 
'max_parallel_apply_workers_per_subscription');
                    name                     | setting |  context
---------------------------------------------+---------+------------
 max_logical_replication_workers             | 4       | postmaster
 max_parallel_apply_workers_per_subscription | 2       | sighup
 max_replication_slots                       | 10      | postmaster
 max_sync_workers_per_subscription           | 2       | sighup
 max_worker_processes                        | 8       | postmaster
(5 rows)

I'm just pointing out that this case is different from pg_upgrade (from which
this idea was taken). I'm not saying it's a bad idea. I'm just arguing that
you might be preventing some read-only access (monitoring) when it is
perfectly fine to connect to the database and execute queries. As I said
before, the current UI allows anyone to set up the standby to accept only local
connections. Of course, it is an extra step but it is possible. However, once
you apply v16-0007, there is no option but to use only local connections during
the transformation. Is it an acceptable limitation?

On reflection, I don't expect a big window:

    /*
     * Start subscriber and wait until accepting connections.
     */
    pg_log_info("starting the subscriber");
    if (!dry_run)
        start_standby_server(pg_bin_dir, opt.subscriber_dir, server_start_log);

    /*
     * Waiting the subscriber to be promoted.
     */
    wait_for_end_recovery(dbinfo[0].subconninfo, pg_bin_dir, &opt);
    ...
    /*
     * Stop the subscriber.
     */
    pg_log_info("stopping the subscriber");
    if (!dry_run)
        stop_standby_server(pg_bin_dir, opt.subscriber_dir);

... mainly because the majority of the time will be wasted in
wait_for_end_recovery() if the server takes some time to reach consistent state
(and during this phase it cannot accept connections anyway). Aren't we worrying
too much about it?

> Hmm, which one should be chosen? I prefer c) or d).
> Do you know how pglogical_create_subscriber does?

pglogical_create_subscriber does nothing [2][3].


[1] https://www.postgresql.org/docs/current/logical-replication-config.html
[2] 
https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_create_subscriber.c#L488
[3] 
https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_create_subscriber.c#L529


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Possibility to disable `ALTER SYSTEM`

2024-02-07 Thread Euler Taveira
On Wed, Feb 7, 2024, at 10:49 AM, Jelte Fennema-Nio wrote:
> On Wed, 7 Feb 2024 at 11:35, Gabriele Bartolini
>  wrote:
> > This is mostly the approach I have taken in the patch, except allowing to 
> > change the value in the configuration file.
> 
> (I had missed the patch in the long thread). I think it would be nice
> to have this be PGC_SIGHUP, and set GUC_DISALLOW_IN_AUTO_FILE. That
> way this behaviour can be changed without shutting down postgres (but
> not with ALTER SYSTEM, because that seems confusing).

Based on Gabriele's use case (Kubernetes) and possibly others like a cloud
vendor, I think it should be more restrictive, not permissive. I mean,
PGC_POSTMASTER and *only* allow this GUC to be set from the command line. (I
didn't inspect the code, but maybe setting GUC_DISALLOW_IN_FILE is not
sufficient to accomplish this goal.) The main advantages of the GUC system are
that (a) the setting is dynamically assigned during startup and (b) you can get
the current setting via SQL.

Another idea is to set it per cluster during initdb, like data checksums. You
don't rely on the GUC system but store this information in pg_control. I think
for the referred use cases you will never have to change it, but you can have a
mechanism to change it.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-02-06 Thread Euler Taveira
> 
> Not sure noderole should be passed here. It is used only for the logging.
> Can we output string before calling the function?
> (The parameter is not needed anymore if -P is removed)

Done.

> 08.
> The terminology is still not consistent. Some functions call the target as 
> standby,
> but others call it as subscriber.

The terminology should reflect the actual server role. I'm calling it "standby"
if it is a physical replica and "subscriber" if it is a logical replica. Maybe
"standby" isn't clear enough.

> 09.
> v14 does not work if the standby server has already been set recovery_target*
> options. PSA the reproducer. I considered two approaches:
> 
> a) raise an ERROR when these parameter were set. check_subscriber() can do it
> b) overwrite these GUCs as empty strings. 

I prefer (b); that's exactly what you provided in v16-0006.

> 10.
> The execution always fails if users execute --dry-run just before. Because
> pg_createsubscriber stops the standby anyway. Doing dry run first is quite 
> normal
> use-case, so current implementation seems not user-friendly. How should we 
> fix?
> Below bullets are my idea:
> 
> a) avoid stopping the standby in case of dry_run: seems possible.
> b) accept even if the standby is stopped: seems possible.
> c) start the standby at the end of run: how arguments like pg_ctl -l should 
> be specified?

I prefer (a). I applied a slightly modified version of v16-0005.

This new patch contains the following changes:

* check whether the target is really a standby server (0004)
* refactor: pg_create_logical_replication_slot function
* use a single variable for pg_ctl and pg_resetwal directory
* avoid recovery errors applying default settings for some GUCs (0006)
* don't stop/start the standby in dry run mode (0005)
* miscellaneous fixes

I don't understand why v16-0002 is required. In a previous version, this patch
was using connections in logical replication mode. After some discussion we
decided to change it to regular connections and use SQL functions (instead of
replication commands). Is it a requirement for v16-0003?

I started reviewing v16-0007 but haven't finished yet. The general idea is ok.
However, I'm still worried about preventing some use cases if it provides only
the local connection option. What if you want to keep monitoring this instance
while the transformation is happening? Let's say it has a backlog that will
take some time to apply. Unless you have a local agent, you have no data about
this server until pg_createsubscriber terminates. Even a local agent might not
be able to connect to the server unless it uses the current port.


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From ae5a0efe6b0056fb108c3764fe99caebc0554f76 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Mon, 5 Jun 2023 14:39:40 -0400
Subject: [PATCH v17] Creates a new logical replica from a standby server

A new tool called pg_createsubscriber can convert a physical replica
into a logical replica. It runs on the target server and should be able
to connect to the source server (publisher) and the target server
(subscriber).

The conversion requires a few steps. Check if the target data directory
has the same system identifier as the source data directory. Stop the
target server if it is running as a standby server. Create one
replication slot per specified database on the source server. One
additional replication slot is created at the end to get the consistent
LSN (this consistent LSN will be used as (a) a stopping point for the
recovery process and (b) a starting point for the subscriptions). Write
recovery parameters into the target data directory and start the target
server (wait until the target server is promoted). Create one
publication (FOR ALL TABLES) per specified database on the source
server. Create one subscription per specified database on the target
server (use the replication slot and publication created in a previous
step; don't enable the subscriptions yet). Set the replication progress
to the consistent LSN obtained in a previous step. Enable the
subscription for each specified database on the target server. Stop the
target server. Change the system identifier of the target server.

Depending on your workload and database size, creating a logical replica
might not be an option due to resource constraints (the WAL backlog must
be available until all table data is synchronized). The initial data
copy and the replication progress tend to be faster on a physical
replica. The purpose of this tool is to speed up a logical replica
setup.
---
 doc/src/sgml/ref/allfiles.sgml|1 +
 doc/src/sgml/ref/pg_createsubscriber.sgml |  320 +++
 doc/src/sgml/reference.sgml   |1 +
 src/bin/pg_basebackup/.gitignore  |1 +
 src/bin/pg_basebackup/Makefile|8 +-
 src/bin/pg_basebackup/meson.build   

Re: speed up a logical replica setup

2024-02-01 Thread Euler Taveira
On Thu, Feb 1, 2024, at 9:47 AM, Hayato Kuroda (Fujitsu) wrote:
> I made fix patches to solve reported issues by Fabrízio.
> 
> * v13-0001: Same as v11-0001 made by Euler.
> * v13-0002: Fixes ERRORs while dropping replication slots [1].
>If you want to see codes which we get agreement, please apply 
> until 0002.
> === experimental patches ===
> * v13-0003: Avoids to use replication connections. The issue [2] was solved 
> on my env.
> * v13-0004: Removes -P option and use primary_conninfo instead.
> * v13-0005: Refactors data structures

Thanks for rebasing the proposed patches. I'm attaching a new patch.

As I said in the previous email [1] I fixed some issues from your previous 
review:

* use pg_fatal() where possible. There are some cases where it couldn't replace
  pg_log_error() + exit(1) because a hint is required.
* pg_resetwal output. Send standard output to /dev/null to avoid an extra message.
* check privileges. Make sure the current role can execute CREATE SUBSCRIPTION
  and pg_replication_origin_advance().
* log directory. Refactor the code that sets up the log file used as the server log.
* run with restricted token (Windows).
* v13-0002. Merged.
* v13-0003. I applied a modified version. I returned only the required
  information for each query.
* group command-line options into a new struct CreateSubscriberOptions. The
  exception is the dry_run option.
* rename functions that obtain system identifier.

WIP

I'm still working on the data structures to group options. I don't like the way
they were grouped in v13-0005. There are too many levels to reach the database
name. The setup_subscriber() function requires all 3 data structures.

The documentation update is almost there. I will include the modifications in
the next patch.

Regarding v13-0004, it seems like a good UI; that's why I wrote a comment about
it. However, it comes with a restriction: it requires a similar HBA rule for
both regular and replication connections. Is that an acceptable restriction? We
might paint ourselves into a corner. A reasonable proposal is not to remove
this option but to make it optional. If it is not provided, primary_conninfo
is used.

[1] 
https://www.postgresql.org/message-id/80ce3f65-7ca3-471e-b1a4-24ac529ff4ea%40app.fastmail.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From 2666b5f660c64d699a0be440c3701c7be00ec309 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Mon, 5 Jun 2023 14:39:40 -0400
Subject: [PATCH v14] Creates a new logical replica from a standby server

A new tool called pg_createsubscriber can convert a physical replica into a
logical replica. It runs on the target server and should be able to
connect to the source server (publisher) and the target server
(subscriber).

The conversion requires a few steps. Check if the target data directory
has the same system identifier as the source data directory. Stop the
target server if it is running as a standby server. Create one
replication slot per specified database on the source server. One
additional replication slot is created at the end to get the consistent
LSN (this consistent LSN will be used as (a) a stopping point for the
recovery process and (b) a starting point for the subscriptions). Write
recovery parameters into the target data directory and start the target
server (wait until the target server is promoted). Create one
publication (FOR ALL TABLES) per specified database on the source
server. Create one subscription per specified database on the target
server (use the replication slot and publication created in a previous
step; don't enable the subscriptions yet). Set the replication progress
to the consistent LSN obtained in a previous step. Enable the
subscription for each specified database on the target server. Stop the
target server. Change the system identifier of the target server.

Depending on your workload and database size, creating a logical replica
might not be an option due to resource constraints (WAL must be retained
until all table data is synchronized). The initial data copy and the
replication catch-up tend to be faster on a physical replica. The
purpose of this tool is to speed up a logical replica setup.
---
 doc/src/sgml/ref/allfiles.sgml|1 +
 doc/src/sgml/ref/pg_createsubscriber.sgml |  320 +++
 doc/src/sgml/reference.sgml   |1 +
 src/bin/pg_basebackup/.gitignore  |1 +
 src/bin/pg_basebackup/Makefile|8 +-
 src/bin/pg_basebackup/meson.build |   19 +
 src/bin/pg_basebackup/pg_createsubscriber.c   | 1876 +
 .../t/040_pg_createsubscriber.pl  |   44 +
 .../t/041_pg_createsubscriber_standby.pl  |  139 ++
 src/tools/pgindent/typedefs.list  |2 +
 10 files changed, 2410 insertions(+), 1 deletion(-)
 create mode 100644 doc/src/sgml/ref/pg_createsubscriber.sgml
 create mode 100644 src/bin/pg_basebackup/pg_createsubscri

Re: speed up a logical replica setup

2024-01-31 Thread Euler Taveira
On Wed, Jan 31, 2024, at 11:09 PM, Hayato Kuroda (Fujitsu) wrote:
> >
> Why? Are you suggesting that the dry run mode covers just the verification
> part? If so, it is not a dry run mode. I would expect it to run until the end
> (or until it accomplish its goal) but *does not* modify data. For pg_resetwal,
> the modification is one of the last steps and the other ones (KillFoo
> functions) that are skipped modify data. It ends the dry run mode when it
> accomplish its goal (obtain the new control data values). If we stop earlier,
> some of the additional steps won't be covered by the dry run mode and a 
> failure
> can happen but could be detected if you run a few more steps.
> >
> 
> Yes, it was my expectation. I'm still not sure which operations can detect by 
> the
> dry_run, but we can keep it for now.

The main goal is to have information for troubleshooting.

> 
> Good point. I included a check for pg_create_subscription role and CREATE
> privilege on the specified database.
> >
> 
> Not sure, but can we do the replication origin functions by these privilege?
> According to the doc[1], these ones seem not to be related.

Hmm. No. :( Better add this check too.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Synchronizing slots from primary to standby

2024-01-31 Thread Euler Taveira
On Mon, Jan 29, 2024, at 10:17 AM, Zhijie Hou (Fujitsu) wrote:
> Attach the V72-0001 which addressed above comments, other patches will be
> rebased and posted after pushing first patch. Thanks Shveta for helping 
> address
> the comments.

While working on another patch I noticed a new NOTICE message:

NOTICE:  changed the failover state of replication slot "foo" on publisher to 
false

I wasn't paying much attention to this thread until I started reading the 2
patches that were recently committed. The message above surprised me because
pg_createsubscriber started to emit it. The reason is that it doesn't create
the replication slot during CREATE SUBSCRIPTION. Instead, it creates the
replication slot with failover = false, and no such option is specified during
CREATE SUBSCRIPTION, which means the default value is used (failover = false).
I expected not to see any message because it is *not* changing the behavior. I
was wrong. It doesn't check the failover state on the publisher; it just
executes walrcv_alter_slot() and emits a message.

IMO if we are changing a property on node A from node B, node B already knows
(or might know) about that behavior change (because it is sending the command);
however, node A doesn't (unless log_replication_commands = on, which is not the
default).

Do we really need this message as a NOTICE? I would set it to DEBUG1 if it is
worth keeping, or even remove it (if we consider there are other ways to obtain
the same information).


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-01-31 Thread Euler Taveira
 in the next patch.

> 12. main
> ```
> /* Register a function to clean up objects in case of failure. */
> atexit(cleanup_objects_atexit);
> ```
> 
> According to the manpage, callback functions would not be called when it exits
> due to signals:
> 
> > Functions  registered  using atexit() (and on_exit(3)) are not called if a
> > process terminates abnormally because of the delivery of a signal.
> 
> Do you have a good way to handle the case? One solution is to output created
> objects in any log level, but the consideration may be too much. Thought?

Nothing? If you interrupt the execution, there will be objects left behind,
and you, as the one who decided to interrupt it, have to clean things up. What
do you expect this tool to do? The documentation will provide some guidance,
describing the object name patterns this tool uses so you can check for
leftover objects. Someone could argue that this is a valid feature request,
but IMO it is not at the top of the list.

> 13, main
> ```
> /*
> * Create a temporary logical replication slot to get a consistent LSN.
> ```
> 
> Just to clarify - I still think the process exits before here in case of dry 
> run.
> In case of pg_resetwal, the process exits before doing actual works like
> RewriteControlFile().

Why? Are you suggesting that the dry run mode covers just the verification
part? If so, it is not a dry run mode. I would expect it to run until the end
(or until it accomplishes its goal) but *not* modify data. For pg_resetwal,
the modification is one of the last steps, and the other ones (the KillFoo
functions) that are skipped modify data. It ends the dry run mode when it
accomplishes its goal (obtaining the new control data values). If we stop
earlier, some of the additional steps won't be covered by the dry run mode,
and a failure could happen that would have been detected had we run a few
more steps.

> 14. main
> ```
> * XXX we might not fail here. Instead, we provide a warning so the user
> * eventually drops this replication slot later.
> ```
> 
> But there are possibilities to exit(1) in drop_replication_slot(). Is it 
> acceptable?

No, there isn't.

> 15. wait_for_end_recovery
> ```
> /*
> * Bail out after recovery_timeout seconds if this option is set.
> */
> if (recovery_timeout > 0 && timer >= recovery_timeout)
> ```
> 
> Hmm, IIUC, it should be enabled by default [3]. Do you have anything in your 
> mind?

Why? See [1]. I prefer the kind mode (always wait until recovery ends), but
you and Amit are proposing a more aggressive mode. The proposal (-t 60) seems
OK right now; however, if the goal is to provide base backup support in the
future, you will certainly have to add --recovery-timeout on big clusters or
those with high workloads, because the base backup runs between replication
slot creation and the consistent LSN. Of course, we can change the default
when base backup support is added.

> 16. main
> ```
> /*
> * Create the subscription for each database on subscriber. It does not
> * enable it immediately because it needs to adjust the logical
> * replication start point to the LSN reported by consistent_lsn (see
> * set_replication_progress). It also cleans up publications created by
> * this tool and replication to the standby.
> */
> if (!setup_subscriber(dbinfo, consistent_lsn))
> ```
> 
> Subscriptions would be created and replication origin would be moved forward 
> here,
> but latter one can be done only by the superuser. I felt that this should be
> checked in check_subscriber().

Good point. I included a check for pg_create_subscription role and CREATE
privilege on the specified database.

> 17. main
> ```
> /*
> * Change system identifier.
> */
> modify_sysid(pg_resetwal_path, subscriber_dir);
> ```
> 
> Even if I executed without -v option, an output from pg_resetwal command 
> appears.
> It seems bit strange.

pg_resetwal uses a printf, and there is no prefix identifying that the message
comes from pg_resetwal. That message has been bothering me for a while, so
let's send it to /dev/null. I'll include it in the next patch.

	RewriteControlFile();
	KillExistingXLOG();
	KillExistingArchiveStatus();
	KillExistingWALSummaries();
	WriteEmptyXLOG();

	printf(_("Write-ahead log reset\n"));
	return 0;


[1] 
https://www.postgresql.org/message-id/b315c7da-7ab1-4014-a2a9-8ab6ae26017c%40app.fastmail.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-01-31 Thread Euler Taveira
On Wed, Jan 31, 2024, at 11:55 AM, Fabrízio de Royes Mello wrote:
> 
> On Wed, Jan 31, 2024 at 11:35 AM Euler Taveira  wrote:
> >
> > On Wed, Jan 31, 2024, at 11:25 AM, Fabrízio de Royes Mello wrote:
> >
> > Jumping into this a bit late here... I'm trying a simple 
> > pg_createsubscriber but getting an error:
> >
> >
> > Try v11. It seems v12-0002 is not correct.
> 
> Using v11 I'm getting this error:
> 
> ~/pgsql took 22s 
> ✦ ➜ pg_createsubscriber -d fabrizio -r -D /tmp/replica5434 -S 'host=/tmp 
> port=5434' -P 'host=/tmp port=5432'
> NOTICE:  changed the failover state of replication slot 
> "pg_createsubscriber_16384_706609" on publisher to false
> pg_createsubscriber: error: could not drop replication slot 
> "pg_createsubscriber_706609_startpoint" on database "fabrizio": ERROR:  
> replication slot "pg_createsubscriber_706609_startpoint" does not exist
> Write-ahead log reset

Hmm. I didn't try it with the failover patch that was recently applied. Do you
have any special configuration on the primary?


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-01-31 Thread Euler Taveira
On Wed, Jan 31, 2024, at 11:25 AM, Fabrízio de Royes Mello wrote:
> Jumping into this a bit late here... I'm trying a simple pg_createsubscriber 
> but getting an error:

Try v11. It seems v12-0002 is not correct.

> Seems we need to escape connection params similar we do in dblink [1]

I think it is a consequence of v12-0003. I didn't review v12 yet, but although
I added a comment saying it might be possible to use primary_conninfo, I'm not
100% convinced that's the right direction.

/*
 * TODO use primary_conninfo (if available) from subscriber and
 * extract publisher connection string. Assume that there are
 * identical entries for physical and logical replication. If there is
 * not, we would fail anyway.
 */


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-01-29 Thread Euler Taveira
On Sun, Jan 28, 2024, at 10:10 PM, Euler Taveira wrote:
> On Fri, Jan 26, 2024, at 4:55 AM, Hayato Kuroda (Fujitsu) wrote:
>> Again, thanks for updating the patch! There are my random comments for v9.
> 
> Thanks for checking v9. I already incorporated some of the points below into
> the next patch. Give me a couple of hours to include some important points.

Here it is another patch that includes the following changes:

* rename the tool to pg_createsubscriber: it was the name with the most votes
  [1].
* fix recovery-timeout option [2]
* refactor: split setup_publisher into check_publisher (that contains only GUC
  checks) and setup_publisher (that does what the name suggests) [2]
* doc: verbose option can be specified multiple times [2]
* typedefs.list: add LogicalRepInfo [2]
* fix: register cleanup routine after the data structure (dbinfo) is assigned
  [2]
* doc: add mandatory options to synopsis [3]
* refactor: rename some options (such as publisher-conninfo to
  publisher-server) to use the same pattern as pg_rewind.
* feat: remove publications from subscriber [2]
* feat: use temporary replication slot to get consistent LSN [3]
* refactor: move subscriber setup to its own function
* refactor: stop standby server to its own function [2]
* refactor: start standby server to its own function [2]
* fix: getopt options [4]

There are a few open items on my list. I included v10-0002. I had already
included a refactor with start/stop functions, so I didn't include v10-0003.
I'll check v10-0004 tomorrow.

One open item that is worrying me is how to handle the pg_ctl timeout. This
patch does nothing about it, and the user has to use the PGCTLTIMEOUT
environment variable to avoid the execution being canceled after 60 seconds
(the default for pg_ctl). Even if you set a high value, it might not be enough
for cases like a time-delayed replica. Maybe pg_ctl should accept "no timeout"
as a --timeout option. I'll include this caveat in the documentation, but I'm
afraid it is not sufficient, and we should provide a better way to handle this
situation.

[1] 
https://www.postgresql.org/message-id/b315c7da-7ab1-4014-a2a9-8ab6ae26017c%40app.fastmail.com
[2] 
https://www.postgresql.org/message-id/TY3PR01MB98895A551923953B3DA3C7C8F5792%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[3] 
https://www.postgresql.org/message-id/ty3pr01mb9889c362ff76102c88fa1c29f5...@ty3pr01mb9889.jpnprd01.prod.outlook.com
[4] 
https://www.postgresql.org/message-id/TY3PR01MB98891E7735141FE8760CEC4AF57E2%40TY3PR01MB9889.jpnprd01.prod.outlook.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From 67c6e63cf34885bdd687c06e1e071d15f42cdf2a Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Mon, 5 Jun 2023 14:39:40 -0400
Subject: [PATCH v11] Creates a new logical replica from a standby server

A new tool called pg_createsubscriber can convert a physical replica
into a logical replica. It runs on the target server and should be able
to connect to the source server (publisher) and the target server
(subscriber).

The conversion requires a few steps. Check if the target data directory
has the same system identifier as the source data directory. Stop the
target server if it is running as a standby server. Create one
replication slot per specified database on the source server. One
additional replication slot is created at the end to get the consistent
LSN (this consistent LSN will be used as (a) a stopping point for the
recovery process and (b) a starting point for the subscriptions). Write
recovery parameters into the target data directory and start the target
server (wait until the target server is promoted). Create one
publication (FOR ALL TABLES) per specified database on the source
server. Create one subscription per specified database on the target
server (use the replication slot and publication created in a previous
step; don't enable the subscriptions yet). Set the replication progress
to the consistent LSN obtained in a previous step. Enable the
subscription for each specified database on the target server. Stop the
target server. Change the system identifier of the target server.

Depending on your workload and database size, creating a logical replica
might not be an option due to resource constraints (WAL must be retained
until all table data is synchronized). The initial data copy and the
replication catch-up tend to be faster on a physical replica. The
purpose of this tool is to speed up a logical replica setup.
---
 doc/src/sgml/ref/allfiles.sgml|1 +
 doc/src/sgml/ref/pg_createsubscriber.sgml |  322 +++
 doc/src/sgml/reference.sgml   |1 +
 src/bin/pg_basebackup/.gitignore  |1 +
 src/bin/pg_basebackup/Makefile|8 +-
 src/bin/pg_basebackup/meson.build |   19 +
 src/bin/pg_basebackup/pg_createsubscriber.c   | 1852 +
 .../t/040_pg_createsubscriber.pl  |   44 +
 .../t/041_pg_createsubscriber_standby.pl  

Re: Should we remove -Wdeclaration-after-statement?

2024-01-29 Thread Euler Taveira
On Mon, Jan 29, 2024, at 12:03 PM, Jelte Fennema-Nio wrote:
> I feel like this is the type of change where there's not much
> discussion to be had. And the only way to resolve it is to use some
> voting to gauge community opinion.
> 
> So my suggestion is for people to respond with -1, -0.5, +-0, +0.5, or
> +1 to indicate support against/for the change.
> 
> I'll start: +1

-1 here. Unless you have a huge number of variables, (1) is an issue. The most
annoying argument against the current style (declarations at the top of the
function) is (2). However, people usually say it is a code smell to have a big
chunk of code in a single function and that you need to split the code into
multiple functions. Having said that, it is easier to check whether the
variable V is used because the number of lines you need to read is small. And
it imposes a similar effort to inspect the code as your arguments (3), (4),
and (5).

One argument against the "declare on first use" style is that you will spend
more time inspecting the code if you are refactoring it. You identify the code
block you want to move, and then the saga starts: check every single variable
used by this code block and search for its declaration. It is such a
time-consuming task. However, if you are using the "declare at the top" style,
it is just a matter of checking the top.

Keeping both styles can be rather confusing (in particular for newbies). And
as Nathan said, I don't see huge benefits in moving from one style to the
other.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-01-28 Thread Euler Taveira
On Fri, Jan 26, 2024, at 4:55 AM, Hayato Kuroda (Fujitsu) wrote:
> Again, thanks for updating the patch! There are my random comments for v9.

Thanks for checking v9. I already incorporated some of the points below into
the next patch. Give me a couple of hours to include some important points.

> 01.
> I cannot find your replies for my comments#7 [1] but you reverted related 
> changes.
> I'm not sure you are still considering it or you decided not to include 
> changes.
> Can you clarify your opinion?
> (It is needed because changes are huge so it quite affects other 
> developments...)

It is still on my list. As I said in a previous email, I'm having a hard time
reviewing pieces of your 0002 patch because you included a bunch of things in
one patch.

> 02.
> ```
> +   -t  class="parameter">seconds
> +   --timeout= class="parameter">seconds
> ```
> 
> But source codes required `--recovery-timeout`. Please update either of them,

Oops. Fixed. My preference is --recovery-timeout because someone may later
decide to add a plain --timeout option to this tool.

> 03.
> ```
> + *Create a new logical replica from a standby server
> ```
> 
> Junwang pointed out to change here but the change was reverted [2]
> Can you clarify your opinion as well?

I'll review the documentation once I fix the code. Since the tool includes the
verb *create* in its name, it seems strange to use another verb (convert) in
the description. Maybe we should just remove the word *new* and have a long
description explaining that the action is to turn the standby (physical
replica) into a logical replica.

> 04.
> ```
> +/*
> + * Is the source server ready for logical replication? If so, create the
> + * publications and replication slots in preparation for logical replication.
> + */
> +static bool
> +setup_publisher(LogicalRepInfo *dbinfo)
> ```
> 
> But this function verifies the source server. I felt they should be in the
> different function.

I split setup_publisher() into check_publisher() and setup_publisher().

> 05.
> ```
> +/*
> + * Is the target server ready for logical replication?
> + */
> +static bool
> +setup_subscriber(LogicalRepInfo *dbinfo)
> 
> 
> Actually, this function does not set up subscriber. It just verifies whether 
> the
> target can become a subscriber, right? If should be renamed.

I renamed setup_subscriber() -> check_subscriber().

> 06.
> ```
> +atexit(cleanup_objects_atexit);
> ```
> 
> The registration of the cleanup function is too early. This sometimes triggers
> a core-dump. E.g., 

I forgot to apply the atexit() patch.

> 07.
> Missing a removal of publications on the standby.

It was on my list to do. It will be in the next patch.

> 08.
> Missing registration of LogicalRepInfo in the typedefs.list.

Done.

> 09
> ```
> + 
> +  
> +   pg_subscriber
> +   option
> +  
> + 
> ```
> 
> Can you reply my comment#2 [4]? I think mandatory options should be written.

I included the mandatory options into the synopsis.

> 10.
> Just to confirm - will you implement start_standby/stop_standby functions in 
> next version?

It is still on my list.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-01-25 Thread Euler Taveira
 a consistent_lsn. So we do not have to keep.
> Also, just before, logical replication slots for each databases are created, 
> so
> WAL records are surely reserved.

This comment needs to be updated. It was written at the time I was also
pursuing base backup support. It doesn't matter if you remove this transient
replication slot earlier, because all of the replication slots created for the
subscriptions were created *before* the one for the consistent LSN. Hence,
there is no additional WAL retention due to this transient replication slot.

> 14.
> 
> ```
> pg_log_info("starting the subscriber");
> start_standby_server(, subport, server_start_log);
> ```
> 
> This info should be in the function.

Ok.

> 15.
> ```
> /*
> * Create a subscription for each database.
> */
> for (i = 0; i < dbarr.ndbs; i++)
> ```
> 
> This can be divided into a function, like create_all_subscriptions().

Ok.

> 16.
> My fault: usage() must be updated.
> 
> 17. use_primary_slot_name
> ```
> if (PQntuples(res) != 1)
> {
> pg_log_error("could not obtain replication slot information: got %d rows, 
> expected %d row",
> PQntuples(res), 1);
> return NULL;
> }
> ```
> 
> Error message should be changed. I think this error means the standby has 
> wrong primary_slot_name, right?

I refactored this code a bit, but the message is the same. It detects 2 cases:
(a) you set primary_slot_name but don't have a replication slot with the same
name, and (b) a cannot-happen bug that returns more than 1 row. It is a broken
setup, so maybe a hint saying so is enough.

> 18. misc
> Sometimes the pid of pg_subscriber is referred. It can be stored as global 
> variable.

I prefer to keep the getpid() call.

> 19.
> C99-style has been allowed, so loop variables like "i" can be declared in the 
> for-statement, like
> 
> ```
> for (int i = 0; i < MAX; i++)
> ```

v9 does it.

> 20.
> Some comments, docs, and outputs must be fixed when the name is changed.

Next patch.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-01-25 Thread Euler Taveira
On Tue, Jan 23, 2024, at 10:29 PM, Euler Taveira wrote:
> I'll post a new one soon.

I'm attaching another patch that fixes some of the issues pointed out by
Hayato, Shlok, and Junwang.

* publication doesn't exist. The analysis [1] was done by Hayato but I didn't
  use the proposed patch. Instead I refactored the code a bit [2] and call it
  from a new function (setup_publisher) that is called before the promotion.
* fix wrong path name in the initial comment [3]
* change terminology: logical replica -> physical replica [3]
* is the primary / standby ready for logical replication? setup_publisher() and
  setup_subscriber() check whether the required GUCs are set accordingly. For
  the primary, it checks wal_level = logical, that max_replication_slots has
  enough remaining replication slots for the proposed setup, and that
  max_wal_senders are available. For the standby, it checks
  max_replication_slots for the replication origins and also the remaining
  number of background workers to start the subscriber.
* retain option: I extracted this one from Hayato's patch [4]
* target server must be a standby. It seems we agree that this restriction
  simplifies the code a bit but can be relaxed in the future (if/when base
  backup support is added.)
* recovery timeout option: I decided to include it, but I think the use case
  is too narrow. It helps with broken setups. However, it can be an issue in
  some scenarios: a time-delayed replica, large replication lag, slow
  hardware, a slow network. I didn't use the proposed patch [5]. Instead, I
  came up with a simple one that defaults to forever. The proposed one
  defaults to 60 seconds, but I'm afraid that, due to one of the scenarios
  above, we would cancel a legitimate run. Maybe we should add a message
  during dry run saying that, due to replication lag, it will take longer to
  run.
* refactor primary_slot_name code. With the new setup_publisher and
  setup_subscriber functions, I split the function that detects the
  primary_slot_name usage into 2 pieces just to avoid extra connections to get
  the job done.
* remove fallback_application_name as suggested by Hayato [5] because logical
  replication already includes one.

I'm still thinking about replacing --subscriber-conninfo with separate items
(username, port, password?, host = socket dir). Maybe it is overengineering.
The user can always prepare the environment to avoid unwanted and/or external
connections.

I didn't change the name from pg_subscriber to pg_createsubscriber yet, but if
I don't hear objections, I'll do it in the next patch.


[1] 
https://www.postgresql.org/message-id/TY3PR01MB9889C5D55206DDD978627D07F5752%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[2] 
https://www.postgresql.org/message-id/73ab86ca-3fd5-49b3-9c80-73d1525202f1%40app.fastmail.com
[3] 
https://www.postgresql.org/message-id/TY3PR01MB9889678E47B918F4D83A6FD8F57B2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[4] 
https://www.postgresql.org/message-id/TY3PR01MB9889678E47B918F4D83A6FD8F57B2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[5] 
https://www.postgresql.org/message-id/TY3PR01MB9889593399165B9A04106741F5662%40TY3PR01MB9889.jpnprd01.prod.outlook.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From d9ef01a806c3d8697faa444283f19c2deaa58850 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Mon, 5 Jun 2023 14:39:40 -0400
Subject: [PATCH v9] Creates a new logical replica from a standby server

A new tool called pg_subscriber can convert a physical replica into a
logical replica. It runs on the target server and should be able to
connect to the source server (publisher) and the target server
(subscriber).

The conversion requires a few steps. Check if the target data directory
has the same system identifier as the source data directory. Stop the
target server if it is running as a standby server. Create one
replication slot per specified database on the source server. One
additional replication slot is created at the end to get the consistent
LSN (this consistent LSN will be used as (a) a stopping point for the
recovery process and (b) a starting point for the subscriptions). Write
recovery parameters into the target data directory and start the target
server (wait until the target server is promoted). Create one
publication (FOR ALL TABLES) per specified database on the source
server. Create one subscription per specified database on the target
server (use the replication slot and publication created in a previous
step; don't enable the subscriptions yet). Set the replication progress
to the consistent LSN obtained in a previous step. Enable the
subscription for each specified database on the target server. Remove
the additional replication slot that was used to get the consistent LSN.
Stop the target server. Change the system identifier of the target
server.

Depending on your workload and database size, creating a logical replica
might not be an option due to resource constraints (WAL must be retained
until all table data is synchronized).

Re: speed up a logical replica setup

2024-01-23 Thread Euler Taveira
On Mon, Jan 22, 2024, at 6:30 AM, Shlok Kyal wrote:
> We fixed some of the comments posted in the thread. We have created it
> as top-up patch 0002 and 0003.

Cool.

> 0002 patch contains the following changes:
> * Add a timeout option for the recovery option, per [1]. The code was
> basically ported from pg_ctl.c.
> * Reject if the target server is not a standby, per [2]
> * Raise FATAL error if --subscriber-conninfo specifies non-local server, per 
> [3]
>   (not sure it is really needed, so feel free reject the part.)
> * Add check for max_replication_slots and wal_level; as per [4]
> * Add -u and -p options; as per [5]
> * Addressed comment except 5 and 8 in [6] and comment in [7]

My suggestion is that you create separate patches for each change. It helps
with review and alternative proposals. Some of these items conflict with what I
have in my local branch, and removing one of them is time-consuming. For this
one, I did the job, but let's avoid rework.

> 0003 patch contains fix for bug reported in [8].

LGTM. As I said in the other email, I included it.

I'll post a new one soon.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-01-23 Thread Euler Taveira
On Mon, Jan 22, 2024, at 4:06 AM, Hayato Kuroda (Fujitsu) wrote:
> I analyzed and found a reason. This is because publications are invisible for 
> some transactions.
> 
> As the first place, below operations were executed in this case.
> Tuples were inserted after getting consistent_lsn, but before starting the 
> standby.
> After doing the workload, I confirmed again that the publication was created.
> 
> 1. on primary, logical replication slots were created.
> 2. on primary, another replication slot was created.
> 3. ===on primary, some tuples were inserted. ===
> 4. on standby, a server process was started
> 5. on standby, the process waited until all changes have come.
> 6. on primary, publications were created.
> 7. on standby, subscriptions were created.
> 8. on standby, a replication progress for each subscriptions was set to given 
> LSN (got at step2).
> =pg_subscriber finished here=
> 9. on standby, a server process was started again
> 10. on standby, subscriptions were enabled. They referred slots created at 
> step1.
> 11. on primary, decoding was started but ERROR was raised.

Good catch! It is a design flaw.

> In this case, tuples were inserted *before creating publication*.
> So I thought that the decoded transaction could not see the publication 
> because
> it was committed after insertions.
> 
> One solution is to create a publication before creating a consistent slot.
> Changes which came before creating the slot were surely replicated to the 
> standby,
> so upcoming transactions can see the object. We are planning to patch set to 
> fix
> the issue in this approach.

I'll include similar code in the next patch and also explain why we should
create the publication earlier. (I'm renaming
create_all_logical_replication_slots to setup_publisher, calling
create_publication from there, and also adding the proposed GUC checks to it.)


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-01-23 Thread Euler Taveira
On Mon, Jan 22, 2024, at 6:22 AM, Amit Kapila wrote:
> On Mon, Jan 22, 2024 at 2:38 PM Hayato Kuroda (Fujitsu)
>  wrote:
> > >
> > > > Yet other options could be
> > > > pg_buildsubscriber, pg_makesubscriber as 'build' or 'make' in the name
> > > > sounds like we are doing some work to create the subscriber which I
> > > > think is the case here.
> > >
> > > I see your point here.  pg_createsubscriber is not like createuser in
> > > that it just runs an SQL command.  It does something different than
> > > CREATE SUBSCRIBER.
> 
> Right.

Subscriber has a different meaning from subscription. Subscription is an SQL
object. Subscriber is the server (node, in replication terminology) where the
subscription resides. Having said that, pg_createsubscriber doesn't seem a bad
name because you are creating a new subscriber. (Indeed, you are transforming /
converting, but "create" seems closer and users can infer that it is a tool to
build a new logical replica.)

>   So a different verb would make that clearer.  Maybe
> > > something from here: https://www.thesaurus.com/browse/convert
> >
> > I read the link and found a good verb "switch". So, how about using 
> > "pg_switchsubscriber"?
> >
> 
> I also initially thought on these lines and came up with a name like
> pg_convertsubscriber but didn't feel strongly about it as that would
> have sounded meaningful if we use a name like
> pg_convertstandbytosubscriber. Now, that has become too long. Having
> said that, I am not opposed to it having a name on those lines. BTW,
> another option that occurred to me today is pg_preparesubscriber. We
> internally create slots and then wait for wal, etc. which makes me
> sound like adding 'prepare' in the name can also explain the purpose.

I think "convert" and "transform" fit this case. However, "create",
"convert" and "transform" have 6, 7 and 9 characters, respectively. I suggest
that we avoid long names ("subscriber" alone already has 10 characters). My
preference is pg_createsubscriber.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-01-11 Thread Euler Taveira
On Thu, Jan 11, 2024, at 9:18 AM, Hayato Kuroda (Fujitsu) wrote:
> 
> I have been concerned that the patch has not been tested by cfbot due to the
> application error. Also, some comments were raised. Therefore, I created a 
> patch
> to move forward.
> I also tried to address some comments which is not so claimed by others.
> They were included in 0003 patch.

[I removed the following part in the previous email and couldn't reply to it...]

> * 0001 patch
> It is almost the same as v3-0001, which was posted by Euler.
> An unnecessary change for Mkvcbuild.pm (this file was removed) was ignored.

v5 removes the MSVC support.

> * 0002 patch
> This contains small fixes to keep complier quiet.

I applied it, although I used a different approach for the format specifier.

> * 0003 patch
> This addresses comments posted to -hackers. For now, this does not contain a 
> doc.
> Will add if everyone agrees these idea.

I didn't review all items but ...

> 1.
> An option --port was added to control the port number for physical standby.
> Users can specify a port number via the option, or an environment variable 
> PGSUBPORT.
> If not specified, a fixed value (50111) would be used.

My first reaction as a new user would be: why do I need to specify a port if my
--subscriber-conninfo already contains one? Ugh. I'm wondering if we can do it
behind the scenes: try a range of ports.
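Probing for a free port automatically is cheap to sketch. The helper below is hypothetical (the 50111 default comes from the proposal above; everything else is an assumption, not pg_subscriber's actual code):

```python
import socket

def find_free_port(start=50111, attempts=100):
    # Hypothetical helper: scan a range of ports until bind() succeeds,
    # instead of requiring the user to pass a port for the transient
    # standby instance.
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
                return port  # free right now; still racy until the server binds it
            except OSError:
                continue
    raise RuntimeError("no free port in range %d-%d" % (start, start + attempts - 1))
```

A real implementation would still have to cope with the race between probing and the server actually binding the port, which may be why an explicit option was proposed in the first place.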

> 2.
> A FATAL error would be raised if --subscriber-conninfo specifies non-local 
> server.

Extra protection is always good. However, let's make sure this code path is
really useful. I'll think a bit about it.

> 3. 
> Options -o/-O were added to specify options for publications/subscriptions.

Flexibility is cool. However, I think its cost-benefit is not good. You have
to parse the options to catch preliminary errors. Things like publishing only
deletes, or subscription options that conflict with the embedded ones, are
additional sources of failure.

> 4. 
> Made standby to save their output to log file.

It was already done in v5. I did it in a different way.

> 5. 
> Unnecessary Assert in drop_replication_slot() was removed.

Instead, I fixed the code and kept the assert.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-01-11 Thread Euler Taveira
On Thu, Jan 11, 2024, at 9:18 AM, Hayato Kuroda (Fujitsu) wrote:
> I have been concerned that the patch has not been tested by cfbot due to the
> application error. Also, some comments were raised. Therefore, I created a 
> patch
> to move forward.

Let me send an updated patch to hopefully keep the CF bot happy. The following
items are included in this patch:

* drop physical replication slot if standby is using one [1].
* cleanup small changes (copyright, .gitignore) [2][3]
* fix getopt_long() options [2]
* fix format specifier for some messages
* move doc to Server Application section [4]
* fix assert failure
* ignore duplicate database names [2]
* store subscriber server log into a separate file
* remove MSVC support

I'm still addressing other reviews and I'll post another version that includes
them soon.

[1] 
https://www.postgresql.org/message-id/e02a2c17-22e5-4ba6-b788-de696ab74f1e%40app.fastmail.com
[2] 
https://www.postgresql.org/message-id/CALDaNm1joke42n68LdegN5wCpaeoOMex2EHcdZrVZnGD3UhfNQ%40mail.gmail.com
[3] 
https://www.postgresql.org/message-id/TY3PR01MB98895BA6C1D72CB8582CACC4F5682%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[4] 
https://www.postgresql.org/message-id/TY3PR01MB988978C7362A101927070D29F56A2%40TY3PR01MB9889.jpnprd01.prod.outlook.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From 2341ef3dc26d48b51b13ad01ac38c79d24b1e461 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Mon, 5 Jun 2023 14:39:40 -0400
Subject: [PATCH v5] Creates a new logical replica from a standby server

A new tool called pg_subscriber can convert a physical replica into a
logical replica. It runs on the target server and should be able to
connect to the source server (publisher) and the target server
(subscriber).

The conversion requires a few steps. Check if the target data directory
has the same system identifier than the source data directory. Stop the
target server if it is running as a standby server. Create one
replication slot per specified database on the source server. One
additional replication slot is created at the end to get the consistent
LSN (This consistent LSN will be used as (a) a stopping point for the
recovery process and (b) a starting point for the subscriptions). Write
recovery parameters into the target data directory and start the target
server (Wait until the target server is promoted). Create one
publication (FOR ALL TABLES) per specified database on the source
server. Create one subscription per specified database on the target
server (Use replication slot and publication created in a previous step.
Don't enable the subscriptions yet). Sets the replication progress to
the consistent LSN that was got in a previous step. Enable the
subscription for each specified database on the target server. Remove
the additional replication slot that was used to get the consistent LSN.
Stop the target server. Change the system identifier from the target
server.

Depending on your workload and database size, creating a logical replica
couldn't be an option due to resource constraints (WAL backlog should be
available until all table data is synchronized). The initial data copy
and the replication progress tends to be faster on a physical replica.
The purpose of this tool is to speed up a logical replica setup.
---
 doc/src/sgml/ref/allfiles.sgml|1 +
 doc/src/sgml/ref/pg_subscriber.sgml   |  284 +++
 doc/src/sgml/reference.sgml   |1 +
 src/bin/pg_basebackup/.gitignore  |1 +
 src/bin/pg_basebackup/Makefile|8 +-
 src/bin/pg_basebackup/meson.build |   19 +
 src/bin/pg_basebackup/pg_subscriber.c | 1657 +
 src/bin/pg_basebackup/t/040_pg_subscriber.pl  |   44 +
 .../t/041_pg_subscriber_standby.pl|  139 ++
 9 files changed, 2153 insertions(+), 1 deletion(-)
 create mode 100644 doc/src/sgml/ref/pg_subscriber.sgml
 create mode 100644 src/bin/pg_basebackup/pg_subscriber.c
 create mode 100644 src/bin/pg_basebackup/t/040_pg_subscriber.pl
 create mode 100644 src/bin/pg_basebackup/t/041_pg_subscriber_standby.pl

diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 4a42999b18..3862c976d7 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -214,6 +214,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/ref/pg_subscriber.sgml b/doc/src/sgml/ref/pg_subscriber.sgml
new file mode 100644
index 00..553185c35f
--- /dev/null
+++ b/doc/src/sgml/ref/pg_subscriber.sgml
@@ -0,0 +1,284 @@
+
+
+
+ 
+  pg_subscriber
+ 
+
+ 
+  pg_subscriber
+  1
+  Application
+ 
+
+ 
+  pg_subscriber
+  create a new logical replica from a standby server
+ 
+
+ 
+  
+   pg_subscriber
+   option
+  
+ 
+
+ 
+  Description
+  
+   pg_subscriber takes the publisher and subscriber
+   connection strings, a cluster directory from a standby server and a list of
+   da

Re: speed up a logical replica setup

2024-01-10 Thread Euler Taveira
On Wed, Jan 10, 2024, at 1:33 AM, Shlok Kyal wrote:
> Here the standby node would be waiting for the 'consistent_lsn' wal
> during recovery but this wal will not be present on standby if no
> physical replication is setup. Hence the command will be waiting
> infinitely for the wal.

Hmm. Some validations are missing.

> To solve this added a timeout of 60s for the recovery process and also
> added a check so that pg_subscriber would give a error when it called
> for node which is not in physical replication.
> Have attached the patch for the same. It is a top-up patch of the
> patch shared by Euler at [1].

If the user has a node that is not a standby and does not set the GUCs to
start the recovery process from a backup, the initial setup is broken. (That's
the case you described.) A good UI would detect this scenario earlier.
Unfortunately, there isn't a reliable and cheap way to do it: you need to start
the recovery and check whether it is making progress. (I don't have a strong
opinion about requiring a standby to use this tool. It would reduce the
complexity of checking whether the setup meets all the requirements to run this
tool.)


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-01-04 Thread Euler Taveira
On Thu, Jan 4, 2024, at 3:05 AM, Amit Kapila wrote:
> Won't it be a better user experience that after setting up the target
> server as a logical replica (subscriber), it started to work
> seamlessly without user intervention?

If we have an option to control the replication slot removal (default is on),
it seems a good UI. Even if the user decides to disable the replication slot
removal, it should print a message saying that these replication slots can
cause WAL retention.

> > The initial version had an option to stop the subscriber. I decided to
> > remove the option and stop the subscriber by default mainly because (1) it 
> > is
> > an extra step to start the server (another point is that the WAL retention
> > doesn't happen due to additional (synchronized?) replication slots on
> > subscriber -- point 2). It was a conservative choice. If point 2 isn't an
> > issue, imo point 1 is no big deal.
> >
> 
> By point 2, do you mean to have a check for "max replication slots"?
> It so, the one possibility is to even increase that config, if the
> required max_replication_slots is low.

By point 2, I mean WAL retention (sentence inside parenthesis).


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-01-04 Thread Euler Taveira
On Thu, Jan 4, 2024, at 2:41 AM, Amit Kapila wrote:
> So, you also seem to be saying that it is not required once
> pg_subscriber has promoted it. So, why it should be optional to remove
> physical_replication_slot? I think we must remove it from the primary
> unless there is some other reason.

My point is to *always* remove the primary_slot_name on primary.

> My point is to not have an additional slot and keep a comment
> indicating that we need an extra slot once we add pg_basebackup
> support.

Got it.

> > 3. How about sync slots on the physical standby if present? Do we want
> > to retain those as it is or do we need to remove those? We are
> > actively working on the patch [1] for the same.
> >
> >
> > I didn't read the current version of the referred patch but if the proposal 
> > is
> > to synchronize logical replication slots iff you are using a physical
> > replication, as soon as pg_subscriber finishes the execution, there won't be
> > synchronization on these logical replication slots because there isn't a
> > physical replication anymore. If the goal is a promotion, the current 
> > behavior
> > is correct because the logical replica will retain WAL since it was 
> > converted.
> >
> 
> I don't understand what you mean by promotion in this context. If
> users want to simply promote the standby then there is no need to do
> additional things that this tool is doing.

ENOCOFFEE. s/promotion/switchover/

> > However, if you are creating a logical replica, this WAL retention is not 
> > good
> > and the customer should eventually remove these logical replication slots on
> > the logical replica.
> >
> 
> I think asking users to manually remove such slots won't be a good
> idea. We might want to either remove them by default or provide an
> option to the user.

Am I correct that in the majority of use cases these replication slots will be
useless? If so, let's remove them by default and add an option to control this
behavior (replication slot removal).
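Implementation-wise, the remove-by-default behavior can avoid the error case when the slot is already gone. A hedged SQL sketch (the slot name 'standby_1' is a placeholder):

```sql
-- Drop the slot referenced by primary_slot_name, but only if it
-- actually exists, so a missing slot is not an error.
SELECT pg_drop_replication_slot(slot_name)
  FROM pg_replication_slots
 WHERE slot_name = 'standby_1' AND slot_type = 'physical';
```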


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-01-03 Thread Euler Taveira
On Mon, Jan 1, 2024, at 7:14 AM, vignesh C wrote:
> 1) This Assert can fail if source is shutdown:
> +static void
> +drop_replication_slot(PGconn *conn, LogicalRepInfo *dbinfo, const
> char *slot_name)
> +{
> +   PQExpBuffer str = createPQExpBuffer();
> +   PGresult   *res;
> +
> +   Assert(conn != NULL);

Oops. I'll remove it.

> 2) Should we have some checks to see if the max replication slot
> configuration is ok based on the number of slots that will be created,
> we have similar checks in upgrade replication slots in
> check_new_cluster_logical_replication_slots

That's a good idea.

> 3) Should we check if wal_level is set to logical, we have similar
> checks in upgrade replication slots in
> check_new_cluster_logical_replication_slots

That's a good idea.
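Both suggested checks only need two GUCs from the source server. A minimal sketch of the queries such a check could run (the required number of free slots being the number of specified databases plus the temporary slot used for the consistent LSN):

```sql
-- wal_level must be 'logical' on the source server.
SELECT setting FROM pg_settings WHERE name = 'wal_level';

-- max_replication_slots minus the slots already in use must leave
-- room for one slot per database plus the temporary slot.
SELECT setting::int - (SELECT count(*) FROM pg_replication_slots)
    AS free_slots
  FROM pg_settings WHERE name = 'max_replication_slots';
```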

> 4) The physical replication slot that was created will still be
> present in the primary node, I felt this should be removed.

My proposal is to remove it [1]. It'll be included in the next version.

> 5) I felt the target server should be started before completion of
> pg_subscriber:

Why? The initial version had an option to stop the subscriber. I decided to
remove the option and stop the subscriber by default mainly because (1) it is
an extra step to start the server (another point is that WAL retention doesn't
happen due to the additional (synchronized?) replication slots on the
subscriber -- point 2). It was a conservative choice. If point 2 isn't an
issue, IMO point 1 is no big deal.


[1] 
https://www.postgresql.org/message-id/e02a2c17-22e5-4ba6-b788-de696ab74f1e%40app.fastmail.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-01-03 Thread Euler Taveira
On Thu, Dec 21, 2023, at 3:16 AM, Amit Kapila wrote:
> I think this is an important part. Shall we try to write to some file
> the pending objects to be cleaned up? We do something like that during
> the upgrade.

That's a good idea.

> > 2. remove the physical replication slot if the standby is using one
> >(primary_slot_name).
> > 3. provide instructions to promote the logical replica into primary, I mean,
> >stop the replication between the nodes and remove the replication setup
> >(publications, subscriptions, replication slots). Or even include another
> >action to do it. We could add both too.
> >
> > Point 1 should be done. Points 2 and 3 aren't essential but will provide a 
> > nice
> > UI for users that would like to use it.
> >
> 
> Isn't point 2 also essential because how would otherwise such a slot
> be advanced or removed?

I'm worried about a scenario in which you will still use the primary. (Let's
say the logical replica will be promoted to a staging or dev server.) There is
no connection between the primary and this new server, so the primary slot is
useless after the promotion.

> A few other points:
> ==
> 1. Previously, I asked whether we need an additional replication slot
> patch created to get consistent LSN and I see the following comment in
> the patch:
> 
> + *
> + * XXX we should probably use the last created replication slot to get a
> + * consistent LSN but it should be changed after adding pg_basebackup
> + * support.
> 
> Yeah, sure, we may want to do that after backup support and we can
> keep a comment for the same but I feel as the patch stands today,
> there is no good reason to keep it.

I'll remove the comment to avoid confusion.

> Also, is there a reason that we
> can't create the slots after backup is complete and before we write
> recovery parameters

No.

> 2.
> + appendPQExpBuffer(str,
> +   "CREATE SUBSCRIPTION %s CONNECTION '%s' PUBLICATION %s "
> +   "WITH (create_slot = false, copy_data = false, enabled = false)",
> +   dbinfo->subname, dbinfo->pubconninfo, dbinfo->pubname);
> 
> Shouldn't we enable two_phase by default for newly created
> subscriptions? Is there a reason for not doing so?

Why? I decided to keep the defaults for some settings (streaming,
synchronous_commit, two_phase, disable_on_error). Unless there is a compelling
reason to enable it, I think we should use the default. Either way, data will
arrive on the subscriber as soon as the prepared transaction is committed.
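For comparison, enabling two_phase would just be one more item in the WITH list. A hypothetical variant of the statement quoted above (object and connection names are placeholders):

```sql
CREATE SUBSCRIPTION sub_db1
    CONNECTION 'host=primary dbname=db1'
    PUBLICATION pub_db1
    WITH (create_slot = false, copy_data = false,
          enabled = false, two_phase = true);
```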

> 3. How about sync slots on the physical standby if present? Do we want
> to retain those as it is or do we need to remove those? We are
> actively working on the patch [1] for the same.

I didn't read the current version of the referred patch, but if the proposal is
to synchronize logical replication slots iff you are using physical
replication, then as soon as pg_subscriber finishes its execution there won't
be synchronization of these logical replication slots, because there isn't
physical replication anymore. If the goal is a promotion, the current behavior
is correct because the logical replica will retain WAL since it was converted.
However, if you are creating a logical replica, this WAL retention is not good
and the customer should eventually remove these logical replication slots on
the logical replica.

> 4. Can we see some numbers with various sizes of databases (cluster)
> to see how it impacts the time for small to large-size databases as
> compared to the traditional method? This might help us with giving
> users advice on when to use this tool. We can do this bit later as
> well when the patch is closer to being ready for commit.

I'll share it.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2023-12-20 Thread Euler Taveira
On Wed, Dec 20, 2023, at 9:22 AM, Shlok Kyal wrote:
> When I built the code using Visual Studio, on installing postgres,
> pg_subscriber binary was not created.
> But when I built the code using Meson, on installing postgres,
> pg_subscriber binary was created.
> Is this behaviour intentional?

No. I will update the patch accordingly. I suspect that a fair number of
patches broke due to the MSVC change.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Add a perl function in Cluster.pm to generate WAL

2023-12-19 Thread Euler Taveira
On Tue, Dec 19, 2023, at 8:00 PM, Michael Paquier wrote:
> On Tue, Dec 19, 2023 at 11:25:50AM +0530, Bharath Rupireddy wrote:
> > I used pg_logical_emit_message() in non-transactional mode without
> > needing an explicit WAL flush as the pg_switch_wal() does a WAL flush
> > at the end [1].
> 
> Indeed, that should be enough to answer my comment.
> 
> > Attached v4 patch.
> 
> LGTM, thanks.  Euler, what do you think?
> 

LGTM.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Add --check option to pgindent

2023-12-18 Thread Euler Taveira
On Mon, Dec 18, 2023, at 9:41 AM, Daniel Gustafsson wrote:
> > On 15 Dec 2023, at 16:43, Tristan Partin  wrote:
> 
> > Here is a v3.
> 
> I think this is pretty much ready to go, the attached v4 squashes the changes
> and fixes the man-page which also needed an update.  The referenced Wiki page
> will need an edit or two after this goes in, but that's easy enough.

... and pgbuildfarm client [1] needs to be changed too.


[1] 
https://github.com/PGBuildFarm/client-code/blob/main/PGBuild/Modules/CheckIndent.pm#L55-L90


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: [DOC] Introducing Quick Start Guide to PL/pgSQL and PL/Python Documentation

2023-12-18 Thread Euler Taveira
On Sat, Dec 16, 2023, at 7:49 AM, Ishaan Adarsh wrote:
> I have made some documentation enhancements for PL/pgSQL and PL/Python 
> sections. The changes include the addition of a Quick Start Guide to 
> facilitate a smoother onboarding experience for users.

Great! Add your patch to the next CF [1] so we don't miss it.

[1] https://commitfest.postgresql.org/46/


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Add a perl function in Cluster.pm to generate WAL

2023-12-18 Thread Euler Taveira
10178D8, desc: DELETE xmax: 295182, off: 24, infobits: 
[KEYS_UPDATED], flags: 0x00, blkref #0: rel 1663/33287/2608 blk 3
rmgr: Transaction len (rec/tot):321/   321, tx: 295182, lsn: 
0/41017950, prev 0/41017918, desc: INVALIDATION ; inval msgs: catcache 7 
catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 55 
catcache 54 relcache 88102 snapshot 2608
rmgr: Transaction len (rec/tot):469/   469, tx: 295182, lsn: 
0/41017A98, prev 0/41017950, desc: COMMIT 2023-12-18 08:35:25.053905 -03; rels: 
base/33287/88102; dropped stats: 2/33287/88102; inval msgs: catcache 80 
catcache 79 catcache 80 catcache 79 catcache 7 catcache 6 catcache 7 catcache 6 
catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 
catcache 6 catcache 7 catcache 6 catcache 55 catcache 54 snapshot 2608 snapshot 
2608 relcache 88102 snapshot 2608
rmgr: XLOG        len (rec/tot): 24/24, tx: 0, lsn: 
0/41017C70, prev 0/41017A98, desc: SWITCH

The difference is

euler=# select '0/40A8'::pg_lsn - '0/4028'::pg_lsn;
?column? 
--
  128
(1 row)

euler=# select '0/41017A98'::pg_lsn - '0/4128'::pg_lsn;
?column? 
--
96880
(1 row)


It is cheaper.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Improve eviction algorithm in ReorderBuffer

2023-12-15 Thread Euler Taveira
On Fri, Dec 15, 2023, at 9:11 AM, Masahiko Sawada wrote:
> 
> I assume you mean to add ReorderBufferTXN entries to the binaryheap
> and then build it by comparing their sizes (i.e. txn->size). But
> ReorderBufferTXN entries are removed and deallocated once the
> transaction finished. How can we efficiently remove these entries from
> binaryheap? I guess it would be O(n) to find the entry among the
> unordered entries, where n is the number of transactions in the
> reorderbuffer.

O(log n) for both functions: binaryheap_remove_first() and
binaryheap_remove_node(). I didn't read your patch but do you really need to
free entries one by one? If not, binaryheap_free().
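To illustrate the point, here is a toy indexed binary heap in the spirit of src/common/binaryheap.c: keeping a xid-to-position map makes removal of an arbitrary (e.g. just-finished) transaction O(log n) instead of an O(n) scan of unordered entries. The size-keyed layout is illustrative only, not the ReorderBuffer patch itself:

```python
class TxnSizeHeap:
    """Max-heap of (size, xid) with O(log n) removal by xid."""

    def __init__(self):
        self.heap = []   # list of (size, xid)
        self.pos = {}    # xid -> index in heap, enables O(log n) removal

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.pos[self.heap[i][1]] = i
        self.pos[self.heap[j][1]] = j

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self.heap[i][0] <= self.heap[parent][0]:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            largest, l, r = i, 2 * i + 1, 2 * i + 2
            if l < n and self.heap[l][0] > self.heap[largest][0]:
                largest = l
            if r < n and self.heap[r][0] > self.heap[largest][0]:
                largest = r
            if largest == i:
                break
            self._swap(i, largest)
            i = largest

    def add(self, xid, size):
        self.heap.append((size, xid))
        self.pos[xid] = len(self.heap) - 1
        self._sift_up(len(self.heap) - 1)

    def remove(self, xid):
        # O(log n): move the last element into the vacated slot, re-sift.
        i = self.pos.pop(xid)
        last = self.heap.pop()
        if i < len(self.heap):
            self.heap[i] = last
            self.pos[last[1]] = i
            self._sift_down(i)
            self._sift_up(i)

    def largest(self):
        # xid of the biggest transaction, i.e. the eviction candidate
        return self.heap[0][1] if self.heap else None
```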


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: logical decoding and replication of sequences, take 2

2023-12-14 Thread Euler Taveira
On Thu, Dec 14, 2023, at 12:44 PM, Ashutosh Bapat wrote:
> I haven't found the code path from where the sequence cleanup gets
> called. But it's being called. Am I missing something?

ReorderBufferCleanupTXN.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Move walreceiver state assignment (to WALRCV_STREAMING) in WalReceiverMain()

2023-12-12 Thread Euler Taveira
On Tue, Dec 12, 2023, at 12:58 PM, Drouvot, Bertrand wrote:
> Currently walrcv->walRcvState is set to WALRCV_STREAMING at the
> beginning of WalReceiverMain().
> 
> But it seems that after this assignment things could be wrong before the
> walreicever actually starts streaming (like not being able to connect
> to the primary).
> 
> It looks to me that WALRCV_STREAMING should be set once 
> walrcv_startstreaming()
> returns true: this is the proposal of this patch.

Per the state name (streaming), it seems it should be set later, as you
proposed. However, I'm concerned about the previous state (WALRCV_STARTING). If
I'm reading the code correctly, WALRCV_STARTING is assigned at
RequestXLogStreaming():

SetInstallXLogFileSegmentActive();
RequestXLogStreaming(tli, ptr, PrimaryConnInfo,
 PrimarySlotName,
 wal_receiver_create_temp_slot);
flushedUpto = 0; 
}

/*
 * Check if WAL receiver is active or wait to start up.
 */
if (!WalRcvStreaming())
{
lastSourceFailed = true;
break;
}

After a few lines the function WalRcvStreaming() has:

SpinLockRelease(&walrcv->mutex);

/*  
 * If it has taken too long for walreceiver to start up, give up. Setting
 * the state to STOPPED ensures that if walreceiver later does start up
 * after all, it will see that it's not supposed to be running and die
 * without doing anything.
 */
if (state == WALRCV_STARTING)
{   
pg_time_t   now = (pg_time_t) time(NULL);

if ((now - startTime) > WALRCV_STARTUP_TIMEOUT)
{   
boolstopped = false;

SpinLockAcquire(&walrcv->mutex);
if (walrcv->walRcvState == WALRCV_STARTING)
{   
state = walrcv->walRcvState = WALRCV_STOPPED;
stopped = true;
}
SpinLockRelease(&walrcv->mutex);

if (stopped)
ConditionVariableBroadcast(&walrcv->walRcvStoppedCV);
}
}   

Couldn't it give up before starting if you apply your patch? My main concern is
that on a slow system, walrcv_connect() could take too long in WalReceiverMain()
and the code above would kill the walreceiver while it is in the process of
starting. Since you cannot control the hardcoded WALRCV_STARTUP_TIMEOUT value,
you might have issues during overload periods.
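The race can be made concrete with a toy model of WalRcvStreaming()'s give-up path (WALRCV_STARTUP_TIMEOUT is 10 seconds in PostgreSQL; the rest is a simplification for illustration, not the real state machine):

```python
WALRCV_STARTUP_TIMEOUT = 10  # seconds, hardcoded in walreceiverfuncs.c

def wal_rcv_streaming(state, start_time, now):
    # Simplified: the startup checker stops a walreceiver that is still
    # STARTING after the timeout.  If STREAMING were only set after a
    # slow walrcv_connect(), the checker could kill the walreceiver
    # before it ever starts streaming.
    if state == "STARTING" and (now - start_time) > WALRCV_STARTUP_TIMEOUT:
        return "STOPPED"
    return state

# connect() takes 12s on an overloaded system: still STARTING -> killed
assert wal_rcv_streaming("STARTING", start_time=0, now=12) == "STOPPED"
# with the state set early (current code), the checker leaves it alone
assert wal_rcv_streaming("STREAMING", start_time=0, now=12) == "STREAMING"
```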


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Add --check option to pgindent

2023-12-12 Thread Euler Taveira
On Tue, Dec 12, 2023, at 7:28 AM, Michael Banck wrote:
> On Tue, Dec 12, 2023 at 11:18:59AM +0100, Daniel Gustafsson wrote:
> > > On 12 Dec 2023, at 01:09, Tristan Partin  wrote:
> > > 
> > > Not sold on the name, but --check is a combination of --silent-diff
> > > and --show-diff. I envision --check mostly being used in CI
> > > environments. I recently came across a situation where this behavior
> > > would have been useful. Without --check, you're left to capture the
> > > output of --show-diff and exit 2 if the output isn't empty by
> > > yourself.
> > 
> > I wonder if we should model this around the semantics of git diff to
> > keep it similar to other CI jobs which often use git diff?  git diff
> > --check means "are there conflicts or issues" which isn't really
> > comparable to here, git diff --exit-code however is pretty much
> > exactly what this is trying to accomplish.
> > 
> > That would make pgindent --show-diff --exit-code exit with 1 if there
> > were diffs and 0 if there are no diffs.
> 
> To be honest, I find that rather convoluted; contrary to "git diff", I
> believe the primary action of pgident is not to show diffs, so I find
> the proposed --check option to be entirely reasonable from a UX
> perspective.
> 
> On the other hand, tying a "does this need re-indenting?" question to a
> "--show-diff --exit-code" option combination is not very obvious (to me,
> at least).

Multiple options to accomplish a use case might not be obvious. I'm wondering
if we can combine them into a single option.

--show-diff show the changes that would be made
--silent-diff   exit with status 2 if any changes would be made
+ --check combination of --show-diff and --silent-diff

I mean

--diff=show,silent,check

When you add exceptions, it starts to complicate the UI.

usage("Cannot have both --silent-diff and --show-diff")
  if $silent_diff && $show_diff;
 
+usage("Cannot have both --check and --show-diff")
+  if $check && $show_diff;
+
+usage("Cannot have both --check and --silent-diff")
+  if $check && $silent_diff;
+


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: I’ve come across what I think is a bug

2023-12-07 Thread Euler Taveira
On Thu, Dec 7, 2023, at 11:39 AM, John Scalia wrote:
> In the documentation, Under CREATE PUBLICATION under parameters 
> 
>  publish (string)
>  This parameter determines which DML operations will be 
> published by the new publication to the subscribers. The value is 
> comma-separated list of operations. The default is to publish all actions, 
> and so the default value for this option is ‘insert, update, delete, 
> truncate’.
> 
> From what I’ve seen, truncate is not set to published by default. I’m looking 
> at a server now with 4 publications on it, and none has truncate set to true. 
> One of these I created, and I know I didn’t set any values. All the other 
> values are set, but not truncate.

What's your Postgres version? The truncate option was introduced in v11. You
didn't provide any evidence that it's a bug. Since v11 we have the same behavior:

postgres=# create publication pub1;
CREATE PUBLICATION
postgres=# \x
Expanded display is on.
postgres=# select * from pg_publication;
-[ RECORD 1 ]+-
pubname  | pub1
pubowner | 10
puballtables | f
pubinsert| t
pubupdate| t
pubdelete| t
pubtruncate  | t

postgres=# select version();
-[ RECORD 1 
]---
version | PostgreSQL 11.21 on x86_64-pc-linux-gnu, compiled by gcc (Debian 
10.2.1-6) 10.2.1 20210110, 64-bit

Maybe you are using a client that is *not* providing truncate as an operation.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Euler Taveira
On Wed, Dec 6, 2023, at 3:59 PM, Daniel Verite wrote:
> The first Copy data message with contents "5b0a" does not qualify
> as a row of data with 3 columns as advertised in the CopyOut
> message. Isn't that a problem?
> 
> At least the json non-ARRAY case ("json lines") doesn't have
> this issue, since every CopyData message corresponds effectively
> to a row in the table.

Moreover, if your interface wants to process the COPY data stream while
receiving it, you cannot use the "json array" format because each row (together
with all previously received ones) is not a valid JSON document. Hence, a JSON
parser cannot run until you receive the whole data set. (wal2json format 1 has
this disadvantage. Format 2 was born to provide a better alternative -- each
row is a valid JSON document.) I'm not saying that "json array" is not useful,
but that for large data sets it is less useful.
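
A toy illustration of the difference (plain Python, not the COPY protocol
itself; the payloads are made up): each "json lines" message is a complete
JSON document and can be parsed as it arrives, while an array stream only
becomes valid JSON once the closing bracket is received.

```python
import json

# "json lines": one complete JSON document per message -> incremental parse.
line_messages = ['{"a": 1}', '{"a": 2}', '{"a": 3}']
rows = [json.loads(m) for m in line_messages]

# "json array": individual chunks are not valid JSON on their own;
# only the whole concatenated payload parses.
array_chunks = ['[', '{"a": 1},', '{"a": 2}', ']']
whole = json.loads("".join(array_chunks))
```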


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Is WAL_DEBUG related code still relevant today?

2023-12-06 Thread Euler Taveira
On Wed, Dec 6, 2023, at 9:51 PM, Michael Paquier wrote:
> PerformWalRecovery() with its log for RM_XACT_ID is something that
> stresses me a bit though because this is in the main redo loop which
> is never free.  The same can be said about GenericXLogFinish() because
> the extra computation happens while holding a buffer and marking it
> dirty.  The ones in xlog.c are free of charge as they are called
> outside any critical portions.
> 
> This makes me wonder how much we need to care about
> trace_recovery_messages, actually, and I've never used it.

IIUC trace_recovery_messages was a debugging aid in the 9.0 era, when hot
standby was introduced. I'm also wondering whether anyone has used it in recent years.

elog.c:

/*
 * Intention is to keep this for at least the whole of the 9.0 production
 * release, so we can more easily diagnose production problems in the field.
 * It should go away eventually, though, because it's an ugly and
 * hard-to-explain kluge.
 */
int
trace_recovery(int trace_level)
{
    if (trace_level < LOG &&
        trace_level >= trace_recovery_messages)
        return LOG;

    return trace_level;
}


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Is WAL_DEBUG related code still relevant today?

2023-12-06 Thread Euler Taveira
On Wed, Dec 6, 2023, at 8:27 AM, Peter Eisentraut wrote:
> On 02.12.23 15:06, Bharath Rupireddy wrote:
> > I enabled this code by compiling with the WAL_DEBUG macro and setting
> > wal_debug GUC to on. Firstly, the compilation on Windows failed
> > because XL_ROUTINE was passed inappropriately for XLogReaderAllocate()
> > used.
> 
> This kind of thing could be mostly avoided if we didn't hide all the 
> WAL_DEBUG behind #ifdefs.

AFAICS LOCK_DEBUG also hides its GUCs behind #ifdefs. The fact that XLOG_DEBUG
is a variable but seems like a constant surprises me. I would rename it to
XLogDebug or xlog_debug.

> in the normal case.  That way, we don't need to wrap that in #ifdef 
> WAL_DEBUG, and the compiler can see the disabled code and make sure it 
> continues to build.

I didn't check whether the LOCK_DEBUG code path fits in the same category as
WAL_DEBUG. If it does, it may be worth applying the same logic there.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2023-12-05 Thread Euler Taveira
On Thu, Nov 9, 2023, at 8:12 PM, Michael Paquier wrote:
> On Thu, Nov 09, 2023 at 03:41:53PM +0100, Peter Eisentraut wrote:
> > On 08.11.23 00:12, Michael Paquier wrote:
> >> - Should the subdirectory pg_basebackup be renamed into something more
> >> generic at this point?  All these things are frontend tools that deal
> >> in some way with the replication protocol to do their work.  Say
> >> a replication_tools?
> > 
> > Seems like unnecessary churn.  Nobody has complained about any of the other
> > tools in there.
> 
> Not sure.  We rename things across releases in the tree from time to
> time, and here that's straight-forward.

Based on this discussion it seems we have a consensus that this tool should be
in the pg_basebackup directory. (If/when we agree on the directory renaming,
it could be done in a separate patch.) Besides this move, v3 provides a dry-run
mode. It basically executes every routine but skips the steps that would make
modifications. It is a useful option to check whether you will be able to run
the tool without hitting issues with connectivity, permissions, and existing
objects (replication slots, publications, subscriptions). Tests were slightly
improved. Messages were changed so that INFO messages are *not* provided by
default; --verbose provides INFO messages and --verbose --verbose also provides
DEBUG messages. I also refactored the connect_database() function so that the
connection always uses the logical replication mode. A bug was fixed in the
transient replication slot name. Ashutosh's review [1] was incorporated. The
code was also indented.

There are a few suggestions from Ashutosh [2] that I will reply to in another
email.

I'm still planning to work on the following points:

1. improve the cleanup routine to point out leftover objects if there is any
   connection issue.
2. remove the physical replication slot if the standby is using one
   (primary_slot_name).
3. provide instructions to promote the logical replica into primary, I mean,
   stop the replication between the nodes and remove the replication setup
   (publications, subscriptions, replication slots). Or even include another
   action to do it. We could add both too.

Point 1 should be done. Points 2 and 3 aren't essential but will provide a nice
UI for users that would like to use it.


[1] 
https://postgr.es/m/CAExHW5sCAU3NvPKd7msScQKvrBN-x_AdDQD-ZYAwOxuWG%3Doz1w%40mail.gmail.com
[2] 
https://postgr.es/m/caexhw5vhfemfvtuhe+7xwphvzjxrexz5h3dd4uqi7cwmdmj...@mail.gmail.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From 003255b64910ce73f15931a43def25c37be96b81 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Mon, 5 Jun 2023 14:39:40 -0400
Subject: [PATCH v3] Creates a new logical replica from a standby server

A new tool called pg_subscriber can convert a physical replica into a
logical replica. It runs on the target server and should be able to
connect to the source server (publisher) and the target server
(subscriber).

The conversion requires a few steps. Check if the target data directory
has the same system identifier as the source data directory. Stop the
target server if it is running as a standby server. Create one
replication slot per specified database on the source server. One
additional replication slot is created at the end to get the consistent
LSN (this consistent LSN will be used as (a) a stopping point for the
recovery process and (b) a starting point for the subscriptions). Write
recovery parameters into the target data directory and start the target
server (wait until the target server is promoted). Create one
publication (FOR ALL TABLES) per specified database on the source
server. Create one subscription per specified database on the target
server (use the replication slot and publication created in a previous
step; don't enable the subscriptions yet). Set the replication progress
to the consistent LSN obtained in a previous step. Enable the
subscription for each specified database on the target server. Remove
the additional replication slot that was used to get the consistent LSN.
Stop the target server. Change the system identifier on the target
server.

Depending on your workload and database size, creating a logical replica
might not be an option due to resource constraints (the WAL backlog must be
retained until all table data is synchronized). The initial data copy and
catching up on replication progress tend to be faster on a physical replica.
The purpose of this tool is to speed up a logical replica setup.
---
 doc/src/sgml/ref/allfiles.sgml|1 +
 doc/src/sgml/ref/pg_subscriber.sgml   |  284 +++
 doc/src/sgml/reference.sgml   |1 +
 src/bin/pg_basebackup/Makefile|8 +-
 src/bin/pg_basebackup/meson.build |   19 +
 src/bin/pg_basebackup/pg_subscriber.c | 1517 +
 src/bin/pg_basebackup/t/040_pg_subscriber.pl  |   44 +
 .../t/041_pg

Re: Lifetime of commit timestamps

2023-11-17 Thread Euler Taveira
On Mon, Nov 13, 2023, at 9:47 PM, Bruce Momjian wrote:
> Is this documentation change still relevant?

I think so. AFAICS nothing changed. Unless you read the source code, it is not
clear that VACUUM removes the information for frozen tuples. They are decoupled
(but executed in the same routine for convenience); hence, someone can ask why
pg_xact_commit_timestamp() returns NULL for a transaction that was executed
*after* track_commit_timestamp was enabled. The answer is that the design used
an existing cleanup mechanism in order to avoid creating a new one.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2023-11-08 Thread Euler Taveira
On Tue, Nov 7, 2023, at 8:12 PM, Michael Paquier wrote:
> On Tue, Nov 07, 2023 at 10:00:39PM +0100, Peter Eisentraut wrote:
> > Speaking of which, would it make sense to put this tool (whatever the name)
> > into the pg_basebackup directory?  It's sort of related, and it also shares
> > some code.

I used CreateReplicationSlot() from streamutil.h but decided to use the
CREATE_REPLICATION_SLOT command directly because it needs the LSN as output. As
you noticed, at that time I didn't want a dependency on the pg_basebackup
header files; if we move this binary to the base backup directory, it seems
natural to refactor the referred function and use it.

> I've read the patch, and the additions to streamutil.h and
> streamutil.c make it kind of natural to have it sit in pg_basebackup/.
> There's pg_recvlogical already there.  I am wondering about two
> things, though:
> - Should the subdirectory pg_basebackup be renamed into something more
> generic at this point?  All these things are frontend tools that deal
> in some way with the replication protocol to do their work.  Say
> a replication_tools?

It is a good fit for this tool since it is another replication tool. I also
agree with the directory renaming; it seems confusing that the directory has
the same name as one binary but also contains other related binaries in it.

> - And if it would be better to refactor some of the code generic to
> all these streaming tools to fe_utils.  What makes streamutil.h a bit
> less pluggable are all its extern variables to control the connection,
> but perhaps that can be an advantage, as well, in some cases.

I like it. There are common functions such as GetConnection(),
CreateReplicationSlot(), DropReplicationSlot() and RunIdentifySystem() that are
used by all of these replication tools. We can turn the extern variables into
parameters to make streamutil.h pluggable.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2023-11-01 Thread Euler Taveira
On Tue, Oct 31, 2023, at 11:46 PM, shihao zhong wrote:
> I think this is duplicate with https://commitfest.postgresql.org/45/4637/
> 
> The new status of this patch is: Waiting on Author
> 

I withdrew the other entry.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Allowing TRUNCATE of FK target when session_replication_role=replica

2023-10-31 Thread Euler Taveira


On Tue, Oct 31, 2023, at 3:21 PM, Hannu Krosing wrote:
> One thing though re:
> > The former is true but the latter is not. Logical replication requires
> > wal_level = logical. That's also true for skipping FSM.
> 
> wal_level=logical is only needed *at provider* side, at least when
> running pglogical.

It is not a requirement for the subscriber. However, it increases the
complexity of a real scenario (in which you set up backups and sometimes
additional physical replicas) because key GUCs require a restart.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Allowing TRUNCATE of FK target when session_replication_role=replica

2023-10-31 Thread Euler Taveira
On Tue, Oct 31, 2023, at 5:09 AM, Hannu Krosing wrote:
> Currently we do not allow TRUNCATE of a table when any Foreign Keys
> point to that table.

It is allowed iff you *also* truncate all tables referencing it.

> At the same time we do allow one to delete all rows when
> session_replication_role=replica

That's true.

> This causes all kinds of pain when trying to copy in large amounts of
> data, especially at the start of logical replication set-up, as many
> optimisations to COPY require the table to be TRUNCATEd .
> 
> The main two are ability to FREEZE while copying and the skipping of
> WAL generation in case of wal_level=minimal, both of which can achieve
> significant benefits when data amounts are large.

The former is true but the latter is not. Logical replication requires
wal_level = logical. That's also true for skipping FSM.

> Is there any reason to not allow TRUNCATE when
> session_replication_role=replica ?

That's basically the same proposal as [1]. That patch was rejected because it
was implemented in a different way, one that doesn't require
session_replication_role = replica to bypass the FK checks.

There are at least 3 cases that can benefit from this feature:

1) if your scenario includes an additional table, only on the subscriber side,
that contains a foreign key to a replicated table, then replication will break
with an error like

ERROR:  cannot truncate a table referenced in a foreign key constraint
DETAIL:  Table "foo" references "bar".
HINT:  Truncate table "foo" at the same time, or use TRUNCATE ... CASCADE.
CONTEXT:  processing remote data for replication origin "pg_16406" during
message type "TRUNCATE" in transaction 12880, finished at 0/297FE08

and you have to fix your replication manually. If we allowed
session_replication_role = replica to bypass the FK check for TRUNCATE
commands, there would be no error. I'm not saying that it is a safe operation
for logical replication scenarios. Maybe it is not, because table foo will
contain invalid references to table bar and someone should fix it on the
subscriber side. However, the current implementation already allows such orphan
rows due to the session_replication_role behavior.

2) truncate a table on the subscriber side during the initial copy. As you
mentioned, this feature should take advantage of the FREEZE and FSM
optimizations. There was a proposal a few years ago [2].

3) resynchronize a table. Same advantages as item 2.

> Unless there are any serious objections, I will send a patch to also
> allow TRUNCATE in this case.
> 

You should start checking the previous proposal [1].


[1] 
https://www.postgresql.org/message-id/ff835f71-3c6c-335e-4c7b-b9e1646cf3d7%402ndquadrant.it
[2] 
https://www.postgresql.org/message-id/CF3B6672-2A43-4204-A60A-68F359218A9B%40endpoint.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2023-10-22 Thread Euler Taveira
On Mon, Feb 21, 2022, at 9:09 AM, Euler Taveira wrote:
> A new tool called pg_subscriber does this conversion and is tightly integrated
> with Postgres.

After a long period of inactivity, I'm back to this client tool. As suggested
by Andres, I added a new helper function to change the system identifier as the
last step. I also thought about including pg_basebackup support but decided
to keep it simple (at least for this version). The user can always execute
pg_basebackup as a preliminary step to create a standby replica, and it will
work. (I will post a separate patch that adds pg_basebackup support on top of
this one.)

Amit asked if an extra replication slot is required. It is not. The reason I
keep it is to remember that at least the latest replication slot needs to be
created after pg_basebackup finishes (after the pg_backup_stop() call).
Regarding the atexit() routine, it does its best to remove all the objects it
created; however, there is no guarantee it can remove them because that depends
on external resources such as connectivity and authorization. I added a new
warning message for the case in which it cannot drop the transient replication
slot. It is probably a good idea to add such a warning message to the cleanup
routine too. Going further, another feature could check for and remove all
leftover objects. The transient replication slot is fine because it should
always be removed at the end. However, the big question is how to detect that
you are not removing objects (publications, subscriptions, replication slots)
from a successful conversion.

Amit also asked about setting up a logical replica with m databases where m is
less than the total number of databases. One option is to remove the "extra"
databases on the target server after promoting the physical replica, or in one
of the last steps. Maybe it is time to propose a partial physical replica that
contains only a subset of the databases on the primary. (I'm not volunteering
for it.) pg_basebackup could then offer an option to remove these "extra"
databases so this tool can take advantage of it.

Let's continue with the bike shedding... I agree with Peter E that this name
does not express what this tool is. At the moment, it only has one action:
create. If I had to suggest other actions, I would say that it could support a
switchover option too (which removes the infrastructure created by this tool).
If we decide to keep this name, it would be a good idea to add an option to
indicate which action it is executing (similar to pg_recvlogical), as suggested
by Peter.

I included the documentation cleanups that Peter E shared. I also did small
adjustments into the documentation. It probably deserves a warning section that
advertises about the cleanup.

I refactored the transient replication slot code and decided to use a permanent
(instead of temporary) slot to avoid keeping a replication connection open for
a long time until the target server catches up.

The system identifier functions (get_control_from_datadir() and
get_sysid_from_conn()) now returns uint64 as suggested by Peter.

After reflection, the --verbose option should be renamed to --progress. There
are also some messages that should be converted to debug messages.

I fixed the useless malloc. I rearranged the code a bit but main still has
~370 lines (without options/validation, ~255 lines). I'm trying to rearrange
the code to make it easier to read and at the same time reduce the size of
main. I already have a few candidates in mind, such as the code that stops the
standby and the part that writes the recovery parameters. I removed the
refactoring I proposed in the previous patch and the current code relies on the
pg_ctl --wait behavior. Are there issues with this choice? Well, one annoying
situation is that pg_ctl does not have a "wait forever" option. If one of the
pg_ctl calls fails, you probably have to start again (unless you understand the
pg_subscriber internals and fix the setup yourself). You have to choose an
arbitrary timeout value and expect that pg_ctl *does* perform the action within
that timeout.

Real tests are included. The cleanup code does not have coverage because a
simple reproducible case isn't easy. I'm also not sure if it is worth it. We
can explain it in the warning section that was proposed.

It is still a WIP but I would like to share it and get some feedback.


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From 0aca46ed57c15bce1972c9db6459c04bcc5cf73d Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Mon, 5 Jun 2023 14:39:40 -0400
Subject: [PATCH v2] Creates a new logical replica from a standby server

A new tool called pg_subscriber can convert a physical replica into a
logical replica. It runs on the target server and should be able to
connect to the source server (publisher) and the target server
(subscriber).

The conversion requires a few steps. Check if the target data directory
has the same system identifier 

Re: Is it possible to change wal_level online

2023-09-14 Thread Euler Taveira
On Thu, Sep 14, 2023, at 7:05 AM, Andy Fan wrote:
> Currently it is complained that wal_level changes require an instance
> restart,  I'm not familiar with this stuff so far and I didn't get any good
> information from searching the email archive.  So I want to gather 
> some feedbacks from experts to  see if it is possible and if not, why
> it would be the key blocker for this.  Basically I agree that changing 
> the wal_level  online will be a good experience for users. 
> 

This topic was already discussed. See this thread [1], which requested changing
the wal_level default value. There might be other threads but I didn't try hard
to find them.


[1] 
https://www.postgresql.org/message-id/20200608213215.mgk3cctlzvfuaqm6%40alap3.anarazel.de


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Support prepared statement invalidation when result types change

2023-09-13 Thread Euler Taveira
On Tue, Sep 12, 2023, at 10:17 AM, Jelte Fennema wrote:
> When running the Postgres JDBC tests with this patchset I found dumb
> mistake in my last patch where I didn't initialize the contents of
> orig_params correctly. This new patchset  fixes that.
> 
0001:

Don't you want to execute this code path only for the EXECUTE command?
PORTAL_UTIL_SELECT includes other commands such as CALL, FETCH, SHOW, and
EXPLAIN. If so, check whether commandTag is CMDTAG_EXECUTE.

Regarding tupDesc, you don't need to call FreeTupleDesc; instead, you can
modify PortalStart as follows:

case PORTAL_UTIL_SELECT:

    /*
     * We don't set snapshot here, because PortalRunUtility will
     * take care of it if needed.
     */
    {
        PlannedStmt *pstmt = PortalGetPrimaryStmt(portal);

        Assert(pstmt->commandType == CMD_UTILITY);

        /*
         * tupDesc will be filled by FillPortalStore later because
         * it might change due to replanning when ExecuteQuery calls
         * GetCachedPlan.
         */
        if (portal->commandTag != CMDTAG_EXECUTE)
            portal->tupDesc = UtilityTupleDescriptor(pstmt->utilityStmt);
    }

Regarding the commit message, ...if the the result... should be fixed. The
sentence "it's actually not needed..." could be "It doesn't need to be an error
as long as it sends the RowDescription...". The sentence "This patch starts to
allow a prepared ..." could be "This patch allows a prepared ...".

0002:

You should remove this comment because it refers to the option you are
removing.

-  plan->cursor_options,
-  false);  /* not fixed result */
+  plan->cursor_options);   /* not fixed result */

You should also remove the sentence that refers to fixed_result in
CompleteCachedPlan.

* cursor_options: options bitmask to pass to planner
* fixed_result: true to disallow future changes in query's result tupdesc
*/
void
CompleteCachedPlan(CachedPlanSource *plansource,
   List *querytree_list,
   MemoryContext querytree_context,

0003:

You should initialize the new parameters (orig_param_types and orig_num_params)
in CreateCachedPlan. One suggestion is to move the following code to
CompleteCachedPlan because plansource->param_types are assigned there.

@@ -108,6 +108,10 @@ PrepareQuery(ParseState *pstate, PrepareStmt *stmt,

argtypes[i++] = toid;
}
+
+   plansource->orig_num_params = nargs;
+   plansource->orig_param_types = MemoryContextAlloc(plansource->context, 
nargs * sizeof(Oid));
+   memcpy(plansource->orig_param_types, argtypes, nargs * sizeof(Oid));
}

This comment is confusing. Since the new function
(GetCachedPlanFromRevalidated) contains almost all of the code from
GetCachedPlan, its content is the same as the *previous* GetCachedPlan
function. You could expand this comment a bit to make it clear that it contains
the logic that decides between a generic and a custom plan. I don't like the
function name but have only a not-so-good suggestion:
GetCachedPlanAfterRevalidate. I also don't like revalidationResult as a
variable name. Why don't you keep qlist? Or use a name close to query-tree
list (query_list? qtlist?). s/te caller/the caller/

+ * GetCachedPlanFromRevalidated: is the same as get GetCachedPlan, but requires
+ * te caller to first revalidate the query. This is needed for callers that
+ * need to use the revalidated plan to generate boundParams.
+ */
+CachedPlan *
+GetCachedPlanFromRevalidated(CachedPlanSource *plansource,
+ParamListInfo boundParams,
+ResourceOwner owner,
+QueryEnvironment *queryEnv,
+List *revalidationResult)


Are these names accurate? The "original" ones hold the current values; the new
ones are "runtime" data. It would be good to explain why a new array is
required.

Oid*param_types;/* array of parameter type OIDs, or NULL */
int num_params; /* length of param_types array */
+   Oid*orig_param_types;   /* array of original parameter type OIDs,
+* or NULL */
+   int orig_num_params;/* length of orig_param_types array */

You should expand the commit message a bit: explain this feature and describe
the behavior change.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Pgoutput not capturing the generated columns

2023-08-01 Thread Euler Taveira
On Tue, Aug 1, 2023, at 3:47 AM, Rajendra Kumar Dangwal wrote:
> With decoderbufs and wal2json the connector is able to capture the generated 
> column `full_name` in above example. But with pgoutput the generated column 
> was not captured. 

wal2json materializes the generated columns before delivering the output. I
decided to materialize the generated columns in the output plugin because the
target consumers expect a complete row.

> Is this a known limitation of pgoutput plugin? If yes, where can we request 
> to add support for this feature?

I wouldn't say limitation but a design decision.

The logical replication design computes the generated columns on the
subscriber side. It was a wise decision aimed at optimization (it doesn't
overload the publisher, which is *already* in charge of logical decoding).

Should pgoutput provide a complete row? Probably, provided it is an option
that defaults to false and doesn't impact performance.

The request for features should be done in this mailing list.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: CDC/ETL system on top of logical replication with pgoutput, custom client

2023-07-31 Thread Euler Taveira
t;value":22}]}
0/36A05658 | 454542 | 
{"action":"I","schema":"public","table":"foo","columns":[{"name":"a","value":23}]}
0/36A057C0 | 454542 | {"action":"C"}
0/36A05258 | 454541 | {"action":"B"}
0/36A05258 | 454541 | 
{"action":"I","schema":"public","table":"foo","columns":[{"name":"a","value":11}]}
0/36A052D8 | 454541 | 
{"action":"I","schema":"public","table":"foo","columns":[{"name":"a","value":12}]}
0/36A05598 | 454541 | 
{"action":"I","schema":"public","table":"foo","columns":[{"name":"a","value":13}]}
0/36A057F0 | 454541 | {"action":"C"}
0/36A050C0 | 454540 | {"action":"B"}
0/36A050C0 | 454540 | 
{"action":"I","schema":"public","table":"foo","columns":[{"name":"a","value":1}]}
0/36A051A0 | 454540 | 
{"action":"I","schema":"public","table":"foo","columns":[{"name":"a","value":2}]}
0/36A054D8 | 454540 | 
{"action":"I","schema":"public","table":"foo","columns":[{"name":"a","value":3}]}
0/36A05820 | 454540 | {"action":"C"}
(17 rows)

Since session C committed first, it is the first transaction available to the
output plugin (wal2json). Transaction 454541 is next because it committed after
session C (transaction 454542). The first transaction to start (session A, xid
454540) committed last, so it is the last one available.

Your consumer cannot rely on LSN position or xid to track the progress.
Instead, Postgres provides a replication progress mechanism [1] to do it.
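
A toy model of the ordering above (plain Python; the xids match the example,
but the LSN numbers are made up for illustration): logical decoding emits
whole transactions ordered by commit, not by when they began.

```python
# Each dict is a transaction; "begin" and "commit" are illustrative
# positions in the WAL, not real LSNs.
txns = [
    {"xid": 454540, "begin": 1, "commit": 9},  # began first, committed last
    {"xid": 454541, "begin": 2, "commit": 7},
    {"xid": 454542, "begin": 3, "commit": 5},  # began last, committed first
]

# Output plugins see transactions in commit order:
emitted = [t["xid"] for t in sorted(txns, key=lambda t: t["commit"])]
# emitted == [454542, 454541, 454540], matching the wal2json output above
```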


[1] https://www.postgresql.org/docs/current/replication-origins.html


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: logicalrep_message_type throws an error

2023-07-15 Thread Euler Taveira
On Sat, Jul 15, 2023, at 4:27 AM, Amit Kapila wrote:
> Do you have something like attached in mind?

WFM. I would change the comment that says

This function is called to provide context in the error ...

to

This message provides context in the error ...

because this comment is not at the beginning of the function but *before* the
message.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Allow specifying a dbname in pg_basebackup connection string

2023-07-05 Thread Euler Taveira
On Wed, Jul 5, 2023, at 9:43 AM, Thom Brown wrote:
> I guess my immediate question is, should backups be taken through
> PgBouncer?  It seems beyond PgBouncer's remit.

One of PgBouncer's missions is to be a transparent proxy.

Sometimes you cannot reach the database directly due to a security policy.
I've heard this backup question a few times. IMO, if dbname doesn't matter for
reaching the server directly, I don't see a problem relaxing this restriction
to support this use case. We just need to document that dbname will be ignored
if specified. Other connection poolers might also benefit from it.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: logicalrep_message_type throws an error

2023-07-05 Thread Euler Taveira
On Wed, Jul 5, 2023, at 7:56 AM, Alvaro Herrera wrote:
> On 2023-Jul-05, Amit Kapila wrote:
> 
> > I think after returning "???" from logicalrep_message_type(), the
> > above is possible when we get the error: "invalid logical replication
> > message type "X"" from apply_dispatch(), right? If so, then what about
> > the case when we forgot to handle some message in
> > logicalrep_message_type() but handled it in apply_dispatch()? Isn't it
> > better to return the 'action' from the function
> > logicalrep_message_type() for unknown type? That way the information
> > could be a bit better and we can easily catch the code bug as well.
> 
> Are you suggesting that logicalrep_message_type should include the
> numerical value of 'action' in the ??? message? Something like this:
> 
> ERROR:  invalid logical replication message type "X"
> CONTEXT:  processing remote data for replication origin "pg_16638" during 
> message type "??? (123)" in transaction 796, finished at 0/16266F8

Isn't this numerical value already exposed in the error message (X = 88)?
In this example, it is:

ERROR:  invalid logical replication message type "X"
CONTEXT:  processing remote data for replication origin "pg_16638" during 
message type "??? (88)" in transaction 796, finished at 0/1626698

IMO it could be confusing if we provide two representations of the same data (X
and 88). Since we already provide "X", I don't think we need "88". Another
option is to remove "X" from apply_dispatch() and add "??? (88)" to
logicalrep_message_type().
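
A toy sketch of that second option (Python rather than the real C code; the
enum members and names here are hypothetical): known message types map to
names, and an unknown on-the-wire byte falls through to "??? (<code>)" so the
error context stays usable instead of raising a second error.

```python
from enum import Enum

class MessageType(Enum):
    BEGIN = "B"
    COMMIT = "C"
    INSERT = "I"

_NAMES = {
    MessageType.BEGIN: "BEGIN",
    MessageType.COMMIT: "COMMIT",
    MessageType.INSERT: "INSERT",
}

def message_type_name(code: str) -> str:
    try:
        msg = MessageType(code)
    except ValueError:
        # Unknown message type: report the raw byte value instead of
        # suppressing the original error with a new one.
        return f"??? ({ord(code)})"
    return _NAMES[msg]
```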

Opinions?


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: logicalrep_message_type throws an error

2023-07-03 Thread Euler Taveira
On Mon, Jul 3, 2023, at 10:57 AM, Ashutosh Bapat wrote:
> On Mon, Jul 3, 2023 at 6:52 PM Ashutosh Bapat
>  wrote:
> >
> > The switch is on action which is an enum. So without default it will
> > provide a compilation warning for missing enums. Adding "default" case
> > defeats that purpose. I think we should just return "???" from outside
> > switch block.
> >

Yeah. I don't think any of the gcc -Wswitch options have an effect if a default
case is used, so your suggestion is good for catching wrong/missing message
types in the future.
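
As a stand-alone illustration of that pattern (hypothetical enum, not
PostgreSQL code): with no default case, gcc -Wswitch can still warn when a new
enum member is missed, while the return after the switch keeps out-of-range
values handled at runtime.

```c
#include <assert.h>
#include <string.h>

typedef enum
{
	DEMO_STATE_IDLE,
	DEMO_STATE_BUSY
} DemoState;

/*
 * No default case: if DemoState gains a new member and this switch is not
 * updated, gcc -Wswitch (enabled by -Wall) reports the unhandled value at
 * compile time.  The fallback return lives outside the switch block, so
 * values outside the enum range are still handled at runtime.
 */
static const char *
demo_state_name(DemoState s)
{
	switch (s)
	{
		case DEMO_STATE_IDLE:
			return "idle";
		case DEMO_STATE_BUSY:
			return "busy";
	}
	return "???";
}
```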

> PFA patch.

WFM.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: logicalrep_message_type throws an error

2023-07-03 Thread Euler Taveira
On Mon, Jul 3, 2023, at 7:30 AM, Ashutosh Bapat wrote:
> logicalrep_message_type() is used to convert logical message type code
> into name while prepared error context or details. Thus when this
> function is called probably an ERROR is already raised. If
> logicalrep_message_type() gets an unknown message type, it will throw
> an error, which will suppress the error for which we are building
> context or details. That's not useful. I think instead
> logicalrep_message_type() should return "unknown" when it encounters
> an unknown message type and let the original error message be thrown
> as is.

Hmm. Good catch. The current behavior is:

ERROR:  invalid logical replication message type "X" 
LOG:  background worker "logical replication worker" (PID 71800) exited with 
exit code 1

... that hides the details. After providing a default message type:

ERROR:  invalid logical replication message type "X"
CONTEXT:  processing remote data for replication origin "pg_16638" during 
message type "???" in transaction 796, finished at 0/16266F8

Masahiko, since abc0910e2e0 is your patch maybe you want to take a look at it.


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From aaeb2d7474a30a363b69441397b2d7dd91bfba30 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Mon, 3 Jul 2023 08:54:10 -0300
Subject: [PATCH] uncover logical change details

The commit abc0910e2e0 adds logical change details to error context.
However, the function logicalrep_message_type() introduces an
elog(ERROR) that can hide these details. Instead, avoid elog() and use
??? (that is a synonym for unknown). Spotted by Ashutosh Bapat.

Discussion: https://postgr.es/m/CAExHW5suAEDW-mBZt_qu4RVxWZ1vL54-L%2Bci2zreYWebpzxYsA%40mail.gmail.com
---
 src/backend/replication/logical/proto.c | 8 +++-
 src/include/replication/logicalproto.h  | 2 +-
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index f308713275..572ef0a1aa 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -1213,7 +1213,7 @@ logicalrep_read_stream_abort(StringInfo in,
 /*
  * Get string representing LogicalRepMsgType.
  */
-char *
+const char *
 logicalrep_message_type(LogicalRepMsgType action)
 {
 	switch (action)
@@ -1256,9 +1256,7 @@ logicalrep_message_type(LogicalRepMsgType action)
 			return "STREAM ABORT";
 		case LOGICAL_REP_MSG_STREAM_PREPARE:
 			return "STREAM PREPARE";
+		default:
+			return "???";
 	}
-
-	elog(ERROR, "invalid logical replication message type \"%c\"", action);
-
-	return NULL;/* keep compiler quiet */
 }
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index 0ea2df5088..c5be981eae 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -269,6 +269,6 @@ extern void logicalrep_write_stream_abort(StringInfo out, TransactionId xid,
 extern void logicalrep_read_stream_abort(StringInfo in,
 		 LogicalRepStreamAbortData *abort_data,
 		 bool read_abort_info);
-extern char *logicalrep_message_type(LogicalRepMsgType action);
+extern const char *logicalrep_message_type(LogicalRepMsgType action);
 
 #endif			/* LOGICAL_PROTO_H */
-- 
2.30.2



Re: drop table in transaction

2023-05-09 Thread Euler Taveira
On Tue, May 9, 2023, at 7:42 AM, Fabrice Chapuis wrote:
> Where in the code is written the mechanism used for isolation when drop table 
> is executed in a transaction 

RemoveRelations() in src/backend/commands/tablecmds.c

If you are looking for a previous layer, check ExecDropStmt().


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Initial Schema Sync for Logical Replication

2023-03-24 Thread Euler Taveira
On Fri, Mar 24, 2023, at 8:57 AM, houzj.f...@fujitsu.com wrote:
> First, I think the current publisher doesn't know the version number of
> client(subscriber) so we need to check the feasibility of same. Also, having
> client's version number checks doesn't seem to be a good idea.

walrcv_server_version().

> Besides, I thought about the problems that will happen if we try to support
> replicating New PG to older PG. The following examples assume that we support 
> the
> DDL replication in the mentioned PG.
> 
> 1) Assume we want to replicate from a newer PG to a older PG where partition
>table has not been introduced. I think even if the publisher is aware of
>that, it doesn't have a good way to transform the partition related 
> command,
>maybe one could say we can transform that to inherit table, but I feel that
>introduces too much complexity.
> 
> 2) Another example is generated column. To replicate the newer PG which has
>this feature to a older PG without this. I am concerned that is there a way
>to transform this without causing inconsistent behavior.
> 
> Even if we decide to simply skip sending such unsupported commands or skip
> applying them, then it's likely that the following dml replication will cause
> data inconsistency.

As I mentioned in a previous email [1], the publisher can contain code to
decide if it can proceed or not, in case you are doing a downgrade. I said
downgrade but it can also happen if we decide to deprecate a syntax. For
example, when WITH OIDS was deprecated, pg_dump treated it as an acceptable
removal. The transformation can be (dis)allowed by the protocol version or
another constant [2].

> So, it seems we cannot completely support this use case, there would be some
> limitations. Personally, I am not sure if it's worth introducing complexity to
> support it partially.

Limitations are fine; the servers have different versions, after all. I
wouldn't like to forbid downgrades just because I don't want to maintain
compatibility with previous versions. IMO it is important to be able to
downgrade in case of any incompatibility with an application. You might argue
that this isn't possible due to time or patch size and that there is a
workaround for it, but I wouldn't want to close the door on downgrades in the
future.

[1] 
https://www.postgresql.org/message-id/fb7894e4-b44e-4ae3-a74d-7c5650f69f1a%40app.fastmail.com
[2] 
https://www.postgresql.org/message-id/78149fa6-4c77-4128-8518-197a631c29c3%40app.fastmail.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

2023-03-23 Thread Euler Taveira
On Thu, Mar 23, 2023, at 2:33 PM, Bharath Rupireddy wrote:
> On Thu, Mar 23, 2023 at 3:37 PM Alvaro Herrera  
> wrote:
> >
> > On 2023-Mar-22, Amit Kapila wrote:
> >
> > > I see that you have modified the patch to address the comments from
> > > Alvaro. Personally, I feel it would be better to add such a message at
> > > a centralized location instead of spreading these in different callers
> > > of slot acquire/release functionality to avoid getting these missed in
> > > the new callers in the future. However, if Alvaro and others think
> > > that the current style is better then we should go ahead and do it
> > > that way. I hope that we should be able to decide on this and get it
> > > into PG16. Anyone else would like to weigh in here?
> >
> > I like Peter Smith's suggestion downthread.
> 
> +1. Please review the attached v8 patch further.
If you are adding separate functions as suggested, you should add a comment at
the top of the ReplicationSlotAcquire() and ReplicationSlotRelease() functions
saying that LogReplicationSlotAquired() and LogReplicationSlotReleased(),
respectively, should be called after them.

My suggestion is that the functions should have the same name with a "Log"
prefix. One of them has a typo ("Aquired") in its name. Hence,
LogReplicationSlotAcquire() and LogReplicationSlotRelease() as names. They are
easier to find if someone is grepping for the original function.

I prefer a sentence that includes a verb.

  physical replication slot \"%s\" is acquired
  logical replication slot \"%s\" is released

Isn't the PID important for this use case? If so, of course, you can rely on
log_line_prefix (%p) but if the PID is crucial for an investigation then it
should also be included in the message.
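
A hypothetical sketch of one such helper, combining the suggested naming, the
verb-based wording, and an explicit PID (none of this is committed code; the
real slot machinery lives in src/backend/replication/slot.c and would use
ereport rather than a caller-supplied buffer):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical: format the suggested log message for slot acquisition.
 * A real implementation would call ereport(LOG, ...) directly; writing
 * into a buffer here just keeps the sketch self-contained.
 */
static void
LogReplicationSlotAcquire(const char *slot_name, bool is_logical, int pid,
						  char *buf, size_t buflen)
{
	snprintf(buf, buflen, "%s replication slot \"%s\" is acquired by PID %d",
			 is_logical ? "logical" : "physical", slot_name, pid);
}
```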


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Initial Schema Sync for Logical Replication

2023-03-23 Thread Euler Taveira
On Thu, Mar 23, 2023, at 8:44 AM, Amit Kapila wrote:
> On Thu, Mar 23, 2023 at 2:48 AM Euler Taveira  wrote:
> >
> > On Tue, Mar 21, 2023, at 8:18 AM, Amit Kapila wrote:
> >
> > Now, how do we avoid these problems even if we have our own version of
> > functionality similar to pg_dump for selected objects? I guess we will
> > face similar problems. If so, we may need to deny schema sync in any
> > such case.
> >
> > There are 2 approaches for initial DDL synchronization:
> >
> > 1) generate the DDL command on the publisher, stream it and apply it as-is 
> > on
> > the subscriber;
> > 2) generate a DDL representation (JSON, for example) on the publisher, 
> > stream
> > it, transform it into a DDL command on subscriber and apply it.
> >
> > The option (1) is simpler and faster than option (2) because it does not
> > require an additional step (transformation). However, option (2) is more
> > flexible than option (1) because it allow you to create a DDL command even 
> > if a
> > feature was removed from the subscriber and the publisher version is less 
> > than
> > the subscriber version or a feature was added to the publisher and the
> > publisher version is greater than the subscriber version.
> >
> 
> Is this practically possible? Say the publisher has a higher version
> that has introduced a new object type corresponding to which it has
> either a new catalog or some new columns in the existing catalog. Now,
> I don't think the older version of the subscriber can modify the
> command received from the publisher so that the same can be applied to
> the subscriber because it won't have any knowledge of the new feature.
> In the other case where the subscriber is of a newer version, we
> anyway should be able to support it with pg_dump as there doesn't
> appear to be any restriction with that, am, I missing something?
I think so (with some limitations). Since the publisher knows the subscriber
version, it knows that the subscriber does not contain the new object type, so
it can decide whether this case is critical (and reject the replication) or
optional (and silently not include feature X -- because it is not essential
for logical replication). If required, the transformation should be done on
the publisher.

> Even if we decide to use deparse approach, it would still need to
> mimic stuff from pg_dump to construct commands based on only catalog
> contents. I am not against using this approach but we shouldn't ignore
> the duplicity required in this approach.
It is fine to share code between pg_dump and this new infrastructure. However,
the old code should coexist to support older versions because the new set of
functions doesn't exist in older server versions. Hence, the duplicity would
have to persist for a long time (if you consider that the current policy is to
allow dumping from 9.2, we are talking about 10 years or so). There are some
threads [1][2] that discussed this topic: providing a SQL command based on the
catalog representation. You can probably find other discussions searching for
"pg_dump library" or "getddl".


[1] 
https://www.postgresql.org/message-id/flat/82EFF560-2A09-4C3D-81CC-A2A5EC438CE5%40eggerapps.at
[2] 
https://www.postgresql.org/message-id/71e01949.2e16b.13df4707405.coremail.shuai900...@126.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Initial Schema Sync for Logical Replication

2023-03-22 Thread Euler Taveira
On Tue, Mar 21, 2023, at 8:18 AM, Amit Kapila wrote:
> Now, how do we avoid these problems even if we have our own version of
> functionality similar to pg_dump for selected objects? I guess we will
> face similar problems. If so, we may need to deny schema sync in any
> such case.
There are 2 approaches for initial DDL synchronization:

1) generate the DDL command on the publisher, stream it and apply it as-is on
the subscriber;
2) generate a DDL representation (JSON, for example) on the publisher, stream
it, transform it into a DDL command on subscriber and apply it.

Option (1) is simpler and faster than option (2) because it does not require an
additional step (transformation). However, option (2) is more flexible than
option (1) because it allows you to create a DDL command even if a feature was
removed from the subscriber and the publisher version is less than the
subscriber version, or a feature was added to the publisher and the publisher
version is greater than the subscriber version. Of course, there are exceptions
in which the transformation should be forbidden (and that can be controlled by
the protocol version -- LOGICALREP_PROTO_FOOBAR_VERSION_NUM). A decision must
be made: simple/restrictive vs complex/flexible.
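
In code, the gate could be as small as this sketch (the constant name is made
up for illustration; the real protocol constants follow the
LOGICALREP_PROTO_*_VERSION_NUM pattern in
src/include/replication/logicalproto.h):

```c
#include <assert.h>
#include <stdbool.h>

/* Made-up protocol version at which the DDL representation is understood. */
#define DEMO_PROTO_DDL_VERSION_NUM 5

/*
 * Sketch: the publisher only attempts the transformation when the negotiated
 * protocol version says the subscriber can consume it; otherwise it must
 * reject (or simplify) the command.
 */
static bool
demo_can_transform_ddl(int negotiated_proto_version)
{
	return negotiated_proto_version >= DEMO_PROTO_DDL_VERSION_NUM;
}
```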

One of the main use cases for logical replication is migration (X -> Y where X
< Y). Postgres generally does not remove features, but it can happen (the WITH
OIDS syntax, for example), and that would break DDL replication under option
(1). In the downgrade case (X -> Y where X > Y), it might break DDL
replication if a new syntax is introduced in X. Having said that, IMO option
(1) is fragile if we want to support DDL replication between different
Postgres versions. It might eventually work but there is no guarantee.

Per discussion [1], I think if we agree that the Alvaro's DDL deparse patch is
the way to go with DDL replication, it seems wise that it should be used for
initial DDL synchronization as well.


[1] 
https://www.postgresql.org/message-id/CAA4eK1%2Bw_dFytBiv3RxbOL76_noMzmX0QGTc8uS%3Dbc2WaPVoow%40mail.gmail.com



--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Initial Schema Sync for Logical Replication

2023-03-20 Thread Euler Taveira
On Mon, Mar 20, 2023, at 10:10 PM, Kumar, Sachin wrote:
> > From: Alvaro Herrera 
> > Subject: RE: [EXTERNAL]Initial Schema Sync for Logical Replication
> > On 2023-Mar-15, Kumar, Sachin wrote:
> > 
> > > 1. In  CreateSubscription()  when we create replication
> > > slot(walrcv_create_slot()), should use CRS_EXPORT_SNAPSHOT, So that we
> > can use this snapshot later in the pg_dump.
> > >
> > > 2.  Now we can call pg_dump with above snapshot from CreateSubscription.
> > 
> > Overall I'm not on board with the idea that logical replication would 
> > depend on
> > pg_dump; that seems like it could run into all sorts of trouble (what if 
> > calling
> > external binaries requires additional security setup?  what about pg_hba
> > connection requirements? what about max_connections in tight
> > circumstances?).
> > what if calling external binaries requires additional security setup
> I am not sure what kind of security restriction would apply in this case, 
> maybe pg_dump
> binary can be changed ? 
Using pg_dump as part of this implementation is not acceptable because we
expect the backend to be decoupled from the client. Besides that, pg_dump
provides all table dependencies (such as tablespaces, privileges, security
labels, comments); not all of these dependencies should be replicated. You
would have to exclude them by removing these objects from the TOC before
running pg_restore, or by adding a few pg_dump options to exclude these
objects. Another issue is related to different versions. Let's say the
publisher's version is ahead of the subscriber's; a new table syntax can
easily break your logical replication setup. IMO pg_dump doesn't seem like a
good solution for initial synchronization.

Instead, the backend should provide infrastructure to obtain the required DDL
commands for the specific (set of) tables. This can work around the issues from
the previous paragraph:

* you can selectively choose dependencies;
* don't require additional client packages;
* don't need to worry about different versions.

This infrastructure can also be useful for other use cases such as:

* client tools that provide create commands (such as psql, pgAdmin);
* other logical replication solutions;
* other logical backup solutions.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Allow logical replication to copy tables in binary format

2023-03-15 Thread Euler Taveira
On Wed, Mar 8, 2023, at 11:50 PM, Amit Kapila wrote:
> On Wed, Mar 8, 2023 at 6:17 PM Melih Mutlu  wrote:
> >
> > On 7 Mar 2023 Tue at 04:10 Amit Kapila  wrote:
> >>
> >> As per what I could read in this thread, most people prefer to use the
> >> existing binary option rather than inventing a new way (option) to
> >> binary copy in the initial sync phase. Do you agree?
> >
> >
> > I agree.
> > What do you think about the version checks? I removed any kind of check 
> > since it’s currently a different option. Should we check publisher version 
> > before doing binary copy to ensure that the publisher node supports binary 
> > option of COPY command?
> >
> 
> It is not clear to me which version check you wanted to add because we
> seem to have a binary option in COPY from the time prior to logical
> replication. I feel we need a publisher version 14 check as that is
> where we start to support binary mode transfer in logical replication.
> See the check in function libpqrcv_startstreaming().
... then you are breaking existing cases. Even if you have a convincing
argument, you are introducing a behavior change in prior versions (commit
messages should always indicate that you are breaking backward compatibility).
+
+   /*
+* The binary option for replication is supported since v14
+*/
+   if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 14 &&
+   MySubscription->binary)
+   {
+   appendStringInfo(, " WITH (FORMAT binary)");
+   options = lappend(options, makeDefElem("format", (Node *) 
makeString("binary"), -1));
+   }
+

What are the arguments for supporting this since v14 instead of the
to-be-released version? I read the thread but it is not clear. Something was
said about the restrictive nature of this feature, and it would be frustrating
to see that the same setup (with the same commands) works with v14 and v15 but
not with v16. IMO it should be >= 16, and the documentation should explain that
v14/v15 use the text format during initial table synchronization even if
binary = true.

Should there be a fallback mode (text) if initial table synchronization fails
because of the binary option? Maybe a different setting (auto) to support such
behavior.
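
A sketch of that suggestion (the version cutoff and the fallback logic are
assumptions for illustration, not the patch's behavior):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Sketch of an "auto"-style format choice: use binary COPY only when the
 * subscription asks for it and the publisher is new enough, otherwise fall
 * back to text.  160000 stands for the hypothetical ">= 16" cutoff in
 * server_version_num terms.
 */
static const char *
demo_copy_format(int server_version_num, bool binary_requested)
{
	if (binary_requested && server_version_num >= 160000)
		return "binary";
	return "text";
}
```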


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: zstd compression for pg_dump

2023-02-25 Thread Euler Taveira
On Sat, Feb 25, 2023, at 7:31 AM, Tomas Vondra wrote:
> On 2/24/23 20:18, Justin Pryzby wrote:
> > This is a draft patch - review is welcome and would help to get this
> > ready to be considererd for v16, if desired.
> > 
> > I'm going to add this thread to the old CF entry.
> > https://commitfest.postgresql.org/31/2888/
> > 
> 
> Thanks. Sadly cfbot is unhappy - the windows and cplusplus builds failed
> because of some issue in pg_backup_archiver.h. But it's a bit bizarre
> because the patch does not modify that file at all ...
cpluspluscheck says

# pg_dump is not C++-clean because it uses "public" and "namespace"
# as field names, which is unfortunate but we won't change it now.

Hence, the patch should exclude the new header file from it.

--- a/src/tools/pginclude/cpluspluscheck
+++ b/src/tools/pginclude/cpluspluscheck
@@ -153,6 +153,7 @@ do
test "$f" = src/bin/pg_dump/compress_gzip.h && continue
test "$f" = src/bin/pg_dump/compress_io.h && continue
test "$f" = src/bin/pg_dump/compress_lz4.h && continue
+   test "$f" = src/bin/pg_dump/compress_zstd.h && continue
test "$f" = src/bin/pg_dump/compress_none.h && continue
test "$f" = src/bin/pg_dump/parallel.h && continue
test "$f" = src/bin/pg_dump/pg_backup_archiver.h && continue


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-23 Thread Euler Taveira
ious code because it provides a unit for external representation. I
understand that using the same representation as recovery_min_apply_delay is
good, but the current code does not handle the external representation
accordingly. (recovery_min_apply_delay uses the GUC machinery to add the unit,
but min_apply_delay doesn't.)

# Setup for streaming case
-$node_publisher->append_conf('postgres.conf',
+$node_publisher->append_conf('postgresql.conf',
'logical_decoding_mode = immediate');
$node_publisher->reload;

Fix configuration file name.

Maybe the tests should do a better job. I think check_apply_delay_time is
fragile because it does not guarantee that time is not shifted. Time-delayed
replication is a subscriber feature, and to check its correctness the tests
should check the logs.

# Note that we cannot call check_apply_delay_log() here because there is a
# possibility that the delay is skipped. The event happens when the WAL
# replication between publisher and subscriber is delayed due to a mechanical
# problem. The log output will be checked later - substantial delay-time case.

If you cannot use the logs for it, shouldn't it adjust min_apply_delay?

It does not exercise the min_apply_delay vs parallel streaming mode.

+   /*
+* The combination of parallel streaming mode and
+* min_apply_delay is not allowed.
+*/
+   if (opts.streaming == LOGICALREP_STREAM_PARALLEL)
+   if ((IsSet(opts.specified_opts, SUBOPT_MIN_APPLY_DELAY) 
&& opts.min_apply_delay > 0) ||
+   (!IsSet(opts.specified_opts, 
SUBOPT_MIN_APPLY_DELAY) && sub->minapplydelay > 0))
+   ereport(ERROR,
+   
errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+   errmsg("cannot enable %s mode for 
subscription with %s",
+  "streaming = parallel", 
"min_apply_delay"));
+

Is this code correct? I also didn't like this message. "cannot enable streaming
= parallel mode for subscription with min_apply_delay" is far from a good error
message. How about referring to parallelism as "parallel streaming mode"?


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From 5024325284ee3b4a4dc0a6a1cc6457ed5608cb46 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Mon, 23 Jan 2023 15:52:55 -0300
Subject: [PATCH] Euler's review

---
 doc/src/sgml/catalogs.sgml |   2 +-
 doc/src/sgml/config.sgml   |  20 ++-
 doc/src/sgml/logical-replication.sgml  |   7 +-
 doc/src/sgml/ref/create_subscription.sgml  |  13 +-
 src/backend/commands/subscriptioncmds.c|  13 +-
 src/backend/replication/logical/worker.c   |  40 +++---
 src/bin/psql/describe.c|   2 +-
 src/test/regress/expected/subscription.out | 160 ++---
 src/test/subscription/t/032_apply_delay.pl |   8 +-
 9 files changed, 133 insertions(+), 132 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index bf3c05241c..0bdb683296 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7878,7 +7878,7 @@ SCRAM-SHA-256$iteration count:
subminapplydelay int8
   
   
-   The length of time (ms) to delay the application of changes.
+   Total time spent delaying the application of changes, in milliseconds
   
  
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 39244bf64a..a15723d74f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4788,17 +4788,15 @@ ANY num_sync (  for details.
+   For time-delayed logical replication, the apply worker sends a feedback
+   message to the publisher every
+   wal_receiver_status_interval milliseconds. Make sure
+   to set wal_receiver_status_interval less than the
+   wal_sender_timeout on the publisher, otherwise, the
+   walsender will repeatedly terminate due to timeout
+   error. If wal_receiver_status_interval is set to
+   zero, the apply worker doesn't send any feedback messages during the
+   min_apply_delay interval.
   
   
  
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 863af11a47..d8ae93f88d 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -248,10 +248,9 @@
   
 
   
-   The subscriber replication can be instructed to lag behind the publisher
-   side changes by specifying the min_apply_delay
-   subscription parameter. See  for
-   details.
+   A logical replication subscription can delay the application of changes by
+   specifying the min_apply_delay subscription parameter.
+   See  for details.
   
 
   
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subsc
