On Fri, Feb 23, 2024 at 3:05 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Wed, Feb 21, 2024 at 7:46 AM Hayato Kuroda (Fujitsu) > <kuroda.hay...@fujitsu.com> wrote: > > > > > > Just FYI - here is an extreme case. And note that I have applied > > > > proposed patch. > > > > > > > > When `pg_basebackup -D data_N2 -R` is used: > > > > ``` > > > > primary_conninfo = 'user=hayato ... dbname=hayato ... > > > > ``` > > > > > > > > But when `pg_basebackup -d "" -D data_N2 -R` is used: > > > > ``` > > > > primary_conninfo = 'user=hayato ... dbname=replication > > > > ``` > > > > > > It seems like maybe somebody should look into why this is happening, > > > and perhaps fix it. > > > > I think this caused from below part [1] in GetConnection(). > > > > If both dbname and connection_string are the NULL, we will enter the else > > part > > and NULL would be substituted - {"dbnmae", NULL} key-value pair is generated > > only here. > > > > Then, in PQconnectdbParams()->PQconnectStartParams->pqConnectOptions2(), > > the strange part would be found and replaced to the username [2]. > > > > I think if both the connection string and the dbname are NULL, it should be > > considered as the physical replication connection. here is a patch to fix > > it. > > > > When dbname is NULL or not given, it defaults to username. This > follows the specs of the connection string.
This fact makes me think that the slotsync worker might be able to accept the primary_conninfo value even if there is no dbname in the value. That is, if there is no dbname in the primary_conninfo, it uses the username in accordance with the specs of the connection string. Currently, the slotsync worker connects to the local database first and then establishes the connection to the primary server. But if we can reverse the two steps, it can get the dbname that has actually been used to establish the remote connection and use it for the local connection too. That way, the primary_conninfo generated by pg_basebackup could work even without the patch. For example, if the OS user executing pg_basebackup is 'postgres', the slotsync worker would connect to the postgres database. Given the 'postgres' database is created by default and 'postgres' OS user is used in common, I guess it could cover many cases in practice actually. Having said that, even with (or without) the above change, we might want to change the pg_basebackup so that it writes the dbname to the primary_conninfo if -R option is specified. Since the database where the slotsync worker connects cannot be dropped while the slotsync worker is running, the user might want to change the database to connect, and it would be useful if they can do that using pg_basebackup instead of modifying the configuration file manually. While the current approach makes sense to me, I'm a bit concerned that we might end up having the pg_basebackup search the actual database name (e.g. 'dbname=template1') from the .pgpass file instead of 'dbname=replication'. As far as I tested on my environment, suppose that I execute: pg_basebackup -D tmp -d "dbname=testdb" -R The pg_basebackup established a replication connection but looked for the password of the 'testdb' database. This could be another inconvenience for the existing users who want to use the slot synchronization. A random idea I came up with is, we add a new option to the pg_basebackup to overwrite the full or some portion of the connection string that is eventually written in the primary_conninfo in postgresql.auto.conf. For example, the command: pg_basebackup -D tmp -d "host=1.1.1.1 port=5555" -R --primary-coninfo-ext "host=2.2.2.2 dbname=postgres" will produce the connection string that is based on -d option value but is overwritten by --primary-conninfo-ext option value, which will be like: host=2.2.2.2 dbname=postgres port=5555 This option might help not only for users who want to use the slotsync worker but also for users who want to take a basebackup from a standby but have the new standby connect to the primary. But it's still just an idea and I might be missing something. And given we're getting closer to the feature freeze, it would be a PG18 item. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com