Re: Synchronizing slots from primary to standby

2024-05-22 Thread Peter Smith
Here are some review comments for the docs patch v3-0001. == Commit message 1. This patch adds detailed documentation for the slot sync feature including examples to guide users on how to verify that all slots have been successfully synchronized to the standby server and how to confirm

Re: Synchronizing slots from primary to standby

2024-05-08 Thread Bertrand Drouvot
Hi, On Mon, Apr 29, 2024 at 11:58:09AM +, Zhijie Hou (Fujitsu) wrote: > On Monday, April 29, 2024 5:11 PM shveta malik wrote: > > > > On Mon, Apr 29, 2024 at 11:38 AM shveta malik > > wrote: > > > > > > On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu) > > > wrote: > > > > > > > > On

Re: Synchronizing slots from primary to standby

2024-04-29 Thread shveta malik
On Mon, Apr 29, 2024 at 5:28 PM Zhijie Hou (Fujitsu) wrote: > > On Monday, April 29, 2024 5:11 PM shveta malik wrote: > > > > On Mon, Apr 29, 2024 at 11:38 AM shveta malik > > wrote: > > > > > > On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu) > > > wrote: > > > > > > > > On Friday, March

RE: Synchronizing slots from primary to standby

2024-04-29 Thread Zhijie Hou (Fujitsu)
On Monday, April 29, 2024 5:11 PM shveta malik wrote: > > On Mon, Apr 29, 2024 at 11:38 AM shveta malik > wrote: > > > > On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu) > > wrote: > > > > > > On Friday, March 15, 2024 10:45 PM Bertrand Drouvot > wrote: > > > > > > > > Hi, > > > > > > >

Re: Synchronizing slots from primary to standby

2024-04-29 Thread shveta malik
On Mon, Apr 29, 2024 at 11:38 AM shveta malik wrote: > > On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu) > wrote: > > > > On Friday, March 15, 2024 10:45 PM Bertrand Drouvot > > wrote: > > > > > > Hi, > > > > > > On Thu, Mar 14, 2024 at 02:22:44AM +, Zhijie Hou (Fujitsu) wrote: > > >

Re: Synchronizing slots from primary to standby

2024-04-29 Thread shveta malik
On Mon, Apr 29, 2024 at 11:38 AM shveta malik wrote: > > On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu) > wrote: > > > > On Friday, March 15, 2024 10:45 PM Bertrand Drouvot > > wrote: > > > > > > Hi, > > > > > > On Thu, Mar 14, 2024 at 02:22:44AM +, Zhijie Hou (Fujitsu) wrote: > > >

Re: Synchronizing slots from primary to standby

2024-04-29 Thread shveta malik
On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu) wrote: > > On Friday, March 15, 2024 10:45 PM Bertrand Drouvot > wrote: > > > > Hi, > > > > On Thu, Mar 14, 2024 at 02:22:44AM +, Zhijie Hou (Fujitsu) wrote: > > > Hi, > > > > > > Since the standby_slot_names patch has been committed, I

RE: Synchronizing slots from primary to standby

2024-04-28 Thread Zhijie Hou (Fujitsu)
On Friday, March 15, 2024 10:45 PM Bertrand Drouvot wrote: > > Hi, > > On Thu, Mar 14, 2024 at 02:22:44AM +, Zhijie Hou (Fujitsu) wrote: > > Hi, > > > > Since the standby_slot_names patch has been committed, I am attaching > > the last doc patch for review. > > > > Thanks! > > 1 === > >

RE: Synchronizing slots from primary to standby

2024-04-12 Thread Zhijie Hou (Fujitsu)
On Friday, April 12, 2024 11:31 AM Amit Kapila wrote: > > On Thu, Apr 11, 2024 at 5:04 PM Zhijie Hou (Fujitsu) > wrote: > > > > On Thursday, April 11, 2024 12:11 PM Amit Kapila > wrote: > > > > > > > > 2. > > > - if (remote_slot->restart_lsn < slot->data.restart_lsn) > > > + if

Re: Synchronizing slots from primary to standby

2024-04-11 Thread Amit Kapila
On Thu, Apr 11, 2024 at 5:04 PM Zhijie Hou (Fujitsu) wrote: > > On Thursday, April 11, 2024 12:11 PM Amit Kapila > wrote: > > > > > 2. > > - if (remote_slot->restart_lsn < slot->data.restart_lsn) > > + if (remote_slot->confirmed_lsn < slot->data.confirmed_flush) > > elog(ERROR, > > "cannot

RE: Synchronizing slots from primary to standby

2024-04-11 Thread Zhijie Hou (Fujitsu)
On Thursday, April 11, 2024 12:11 PM Amit Kapila wrote: > > On Wed, Apr 10, 2024 at 5:28 PM Zhijie Hou (Fujitsu) > wrote: > > > > On Thursday, April 4, 2024 5:37 PM Amit Kapila > wrote: > > > > > > BTW, while thinking on this one, I > > > noticed that in the function

Re: Synchronizing slots from primary to standby

2024-04-10 Thread Amit Kapila
On Wed, Apr 10, 2024 at 5:28 PM Zhijie Hou (Fujitsu) wrote: > > On Thursday, April 4, 2024 5:37 PM Amit Kapila > wrote: > > > > BTW, while thinking on this one, I > > noticed that in the function LogicalConfirmReceivedLocation(), we first > > update > > the disk copy, see comment [1] and then

RE: Synchronizing slots from primary to standby

2024-04-10 Thread Zhijie Hou (Fujitsu)
On Thursday, April 4, 2024 5:37 PM Amit Kapila wrote: > > BTW, while thinking on this one, I > noticed that in the function LogicalConfirmReceivedLocation(), we first update > the disk copy, see comment [1] and then in-memory whereas the same is not > true in > update_local_synced_slot() for

RE: Synchronizing slots from primary to standby

2024-04-09 Thread Zhijie Hou (Fujitsu)
On Thursday, April 4, 2024 4:25 PM Masahiko Sawada wrote: Hi, > On Wed, Apr 3, 2024 at 7:06 PM Amit Kapila > wrote: > > > > On Wed, Apr 3, 2024 at 11:13 AM Amit Kapila > wrote: > > > > > > On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy > > > wrote: > > > > > > > I quickly looked at v8,

Re: Synchronizing slots from primary to standby

2024-04-08 Thread Amit Kapila
On Mon, Apr 8, 2024 at 7:01 PM Zhijie Hou (Fujitsu) wrote: > > Thanks for pushing. > > I checked the BF status, and noticed one BF failure, which I think is related > to > a miss in the test code. > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder=2024-04-08%2012%3A04%3A27 > > From

Re: Synchronizing slots from primary to standby

2024-04-08 Thread Amit Kapila
On Mon, Apr 8, 2024 at 9:49 PM Andres Freund wrote: > > On 2024-04-08 16:01:41 +0530, Amit Kapila wrote: > > Pushed. > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder=2024-04-08%2012%3A04%3A27 > > This unfortunately is a commit after > Right, and thanks for the report. Hou-San

Re: Synchronizing slots from primary to standby

2024-04-08 Thread Andres Freund
Hi, On 2024-04-08 16:01:41 +0530, Amit Kapila wrote: > Pushed. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder=2024-04-08%2012%3A04%3A27 This unfortunately is a commit after commit 6f3d8d5e7cc Author: Amit Kapila Date: 2024-04-08 13:21:55 +0530 Fix the intermittent

RE: Synchronizing slots from primary to standby

2024-04-08 Thread Zhijie Hou (Fujitsu)
On Monday, April 8, 2024 6:32 PM Amit Kapila wrote: > > On Mon, Apr 8, 2024 at 12:19 PM Zhijie Hou (Fujitsu) > wrote: > > > > On Saturday, April 6, 2024 12:43 PM Amit Kapila > wrote: > > > On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot > > > wrote: > > > > > > Yeah, that could be the first

Re: Synchronizing slots from primary to standby

2024-04-08 Thread Amit Kapila
On Mon, Apr 8, 2024 at 12:19 PM Zhijie Hou (Fujitsu) wrote: > > On Saturday, April 6, 2024 12:43 PM Amit Kapila > wrote: > > On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot > > wrote: > > > > Yeah, that could be the first step. We can probably add an injection point > > to > > control the

RE: Synchronizing slots from primary to standby

2024-04-08 Thread Zhijie Hou (Fujitsu)
On Saturday, April 6, 2024 12:43 PM Amit Kapila wrote: > On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot > wrote: > > > > On Fri, Apr 05, 2024 at 06:23:10PM +0530, Amit Kapila wrote: > > > On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila > wrote: > > > Thinking more on this, it doesn't seem related

Re: Synchronizing slots from primary to standby

2024-04-07 Thread Amit Kapila
On Sun, Apr 7, 2024 at 3:06 AM Andres Freund wrote: > > On 2024-04-06 10:58:32 +0530, Amit Kapila wrote: > > On Sat, Apr 6, 2024 at 10:13 AM Amit Kapila wrote: > > > > > > > There are still a few pending issues to be fixed in this feature but > > otherwise, we have committed all the main

Re: Synchronizing slots from primary to standby

2024-04-06 Thread Andres Freund
Hi, On 2024-04-06 10:58:32 +0530, Amit Kapila wrote: > On Sat, Apr 6, 2024 at 10:13 AM Amit Kapila wrote: > > > > There are still a few pending issues to be fixed in this feature but > otherwise, we have committed all the main patches, so I marked the CF > entry corresponding to this work as

Re: Synchronizing slots from primary to standby

2024-04-06 Thread Bertrand Drouvot
Hi, On Sat, Apr 06, 2024 at 10:13:00AM +0530, Amit Kapila wrote: > On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot > wrote: > > I think the new LSN can be visible only when the corresponding WAL is > written by XLogWrite(). I don't know what in XLogSetAsyncXactLSN() can > make it visible. In

Re: Synchronizing slots from primary to standby

2024-04-05 Thread Amit Kapila
On Sat, Apr 6, 2024 at 10:13 AM Amit Kapila wrote: > There are still a few pending issues to be fixed in this feature but otherwise, we have committed all the main patches, so I marked the CF entry corresponding to this work as committed. -- With Regards, Amit Kapila.

Re: Synchronizing slots from primary to standby

2024-04-05 Thread Amit Kapila
On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot wrote: > > On Fri, Apr 05, 2024 at 06:23:10PM +0530, Amit Kapila wrote: > > On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila wrote: > > Thinking more on this, it doesn't seem related to > > c9920a9068eac2e6c8fb34988d18c0b42b9bf811 as that commit doesn't

Re: Synchronizing slots from primary to standby

2024-04-05 Thread Bertrand Drouvot
Hi, On Fri, Apr 05, 2024 at 02:35:42PM +, Bertrand Drouvot wrote: > I think that maybe as a first step we should move the "elog(DEBUG2," message > as > proposed above to help debugging (that could help to confirm the above > theory). If you agree and think that makes sense, please find

Re: Synchronizing slots from primary to standby

2024-04-05 Thread Bertrand Drouvot
Hi, On Fri, Apr 05, 2024 at 06:23:10PM +0530, Amit Kapila wrote: > On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila wrote: > Thinking more on this, it doesn't seem related to > c9920a9068eac2e6c8fb34988d18c0b42b9bf811 as that commit doesn't change > any locking or something like that which impacts

Re: Synchronizing slots from primary to standby

2024-04-05 Thread Amit Kapila
On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila wrote: > > On Thu, Apr 4, 2024 at 2:59 PM shveta malik wrote: > > > > There is an intermittent BF failure observed at [1] after this commit > > (2ec005b). > > > > Thanks for analyzing and providing the patch. I'll look into it. There > is another BF

Re: Synchronizing slots from primary to standby

2024-04-05 Thread Amit Kapila
On Thu, Apr 4, 2024 at 2:59 PM shveta malik wrote: > > There is an intermittent BF failure observed at [1] after this commit > (2ec005b). > Thanks for analyzing and providing the patch. I'll look into it. There is another BF failure [1] which I have analyzed. The main reason for failure is the

Re: Synchronizing slots from primary to standby

2024-04-05 Thread shveta malik
On Fri, Apr 5, 2024 at 4:31 PM Bertrand Drouvot wrote: > > BTW, I just realized that the LSN I used in my example in the > LSN_FORMAT_ARGS() > are not the right ones. Noted. Thanks. Please find v3 with the comments addressed. thanks Shveta

Re: Synchronizing slots from primary to standby

2024-04-05 Thread Bertrand Drouvot
Hi, On Fri, Apr 05, 2024 at 04:09:01PM +0530, shveta malik wrote: > On Fri, Apr 5, 2024 at 10:09 AM Bertrand Drouvot > wrote: > > > > What about something like? > > > > ereport(LOG, > > errmsg("synchronized confirmed_flush_lsn for slot \"%s\" differs > > from remote slot", > >

Re: Synchronizing slots from primary to standby

2024-04-05 Thread shveta malik
On Fri, Apr 5, 2024 at 10:09 AM Bertrand Drouvot wrote: > > What about something like? > > ereport(LOG, > errmsg("synchronized confirmed_flush_lsn for slot \"%s\" differs from > remote slot", > remote_slot->name), > errdetail("Remote slot has LSN %X/%X but local slot has
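For readers unfamiliar with the "%X/%X" notation in the proposed message: PostgreSQL prints an LSN (a 64-bit WAL position) as two hexadecimal halves, the upper 32 bits and the lower 32 bits. The following is only a minimal standalone sketch of that formatting and of a "remote is behind local" comparison. The names format_lsn and lsn_is_behind are hypothetical helpers made up for illustration; they are not functions from the patch or from the PostgreSQL source.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not PostgreSQL code): write a 64-bit LSN into buf
 * in the "hi/lo" hexadecimal form used in server messages.
 */
void
format_lsn(char *buf, size_t len, uint64_t lsn)
{
    /* Upper and lower 32-bit halves, each printed with %X. */
    snprintf(buf, len, "%X/%X",
             (unsigned int) (lsn >> 32),
             (unsigned int) (lsn & 0xFFFFFFFFu));
}

/*
 * Hypothetical helper: returns 1 when the remote position is strictly
 * behind the local one, 0 otherwise. LSNs compare as plain integers.
 */
int
lsn_is_behind(uint64_t remote_lsn, uint64_t local_lsn)
{
    return remote_lsn < local_lsn;
}
```

Under these assumptions, a caller would format both the remote and the local value with format_lsn and report them only when lsn_is_behind returns 1, which mirrors the shape of the check discussed in the thread without claiming to be its implementation.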

Re: Synchronizing slots from primary to standby

2024-04-04 Thread Bertrand Drouvot
Hi, On Fri, Apr 05, 2024 at 09:43:35AM +0530, shveta malik wrote: > On Fri, Apr 5, 2024 at 9:22 AM Bertrand Drouvot > wrote: > > > > Hi, > > > > On Thu, Apr 04, 2024 at 05:31:45PM +0530, shveta malik wrote: > > > On Thu, Apr 4, 2024 at 2:59 PM shveta malik > > > wrote: > > 2 === > > > > +

Re: Synchronizing slots from primary to standby

2024-04-04 Thread shveta malik
On Fri, Apr 5, 2024 at 9:22 AM Bertrand Drouvot wrote: > > Hi, > > On Thu, Apr 04, 2024 at 05:31:45PM +0530, shveta malik wrote: > > On Thu, Apr 4, 2024 at 2:59 PM shveta malik wrote: > > > > > > > > > Prior to commit 2ec005b, this check was okay, as we did not expect > > > restart_lsn of the

Re: Synchronizing slots from primary to standby

2024-04-04 Thread Bertrand Drouvot
Hi, On Thu, Apr 04, 2024 at 05:31:45PM +0530, shveta malik wrote: > On Thu, Apr 4, 2024 at 2:59 PM shveta malik wrote: > > > > > > Prior to commit 2ec005b, this check was okay, as we did not expect > > restart_lsn of the synced slot to be ahead of remote since we were > > directly copying the

Re: Synchronizing slots from primary to standby

2024-04-04 Thread shveta malik
On Thu, Apr 4, 2024 at 2:59 PM shveta malik wrote: > > > Prior to commit 2ec005b, this check was okay, as we did not expect > restart_lsn of the synced slot to be ahead of remote since we were > directly copying the lsns. But now when we use 'advance' to do logical > decoding on standby, there is

Re: Synchronizing slots from primary to standby

2024-04-04 Thread Amit Kapila
On Thu, Apr 4, 2024 at 1:55 PM Masahiko Sawada wrote: > > While testing this change, I realized that it could happen that the > server logs are flooded with the following logical decoding logs that > are written every 200 ms: > > 2024-04-04 16:15:19.270 JST [3838739] LOG: starting logical

Re: Synchronizing slots from primary to standby

2024-04-04 Thread shveta malik
On Wed, Apr 3, 2024 at 3:36 PM Amit Kapila wrote: > > On Wed, Apr 3, 2024 at 11:13 AM Amit Kapila wrote: > > > > On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy > > wrote: > > > > > I quickly looked at v8, and have a nit, rest all looks good. > > > > > > +if (DecodingContextReady(ctx)

Re: Synchronizing slots from primary to standby

2024-04-04 Thread Masahiko Sawada
On Wed, Apr 3, 2024 at 7:06 PM Amit Kapila wrote: > > On Wed, Apr 3, 2024 at 11:13 AM Amit Kapila wrote: > > > > On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy > > wrote: > > > > > I quickly looked at v8, and have a nit, rest all looks good. > > > > > > +if (DecodingContextReady(ctx)

Re: Synchronizing slots from primary to standby

2024-04-03 Thread Amit Kapila
On Wed, Apr 3, 2024 at 11:13 AM Amit Kapila wrote: > > On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy > wrote: > > > I quickly looked at v8, and have a nit, rest all looks good. > > > > +if (DecodingContextReady(ctx) && found_consistent_snapshot) > > +

Re: Synchronizing slots from primary to standby

2024-04-02 Thread Amit Kapila
On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy wrote: > > On Wed, Apr 3, 2024 at 9:04 AM Amit Kapila wrote: > > > > > I'd just rename LogicalSlotAdvanceAndCheckSnapState(XLogRecPtr > > > moveto, bool *found_consistent_snapshot) to > > > pg_logical_replication_slot_advance(XLogRecPtr moveto,

Re: Synchronizing slots from primary to standby

2024-04-02 Thread Bharath Rupireddy
On Wed, Apr 3, 2024 at 9:04 AM Amit Kapila wrote: > > > I'd just rename LogicalSlotAdvanceAndCheckSnapState(XLogRecPtr > > moveto, bool *found_consistent_snapshot) to > > pg_logical_replication_slot_advance(XLogRecPtr moveto, bool > > *found_consistent_snapshot) and use it. If others don't like

Re: Synchronizing slots from primary to standby

2024-04-02 Thread Amit Kapila
On Tue, Apr 2, 2024 at 7:42 PM Bharath Rupireddy wrote: > > On Tue, Apr 2, 2024 at 7:25 PM Zhijie Hou (Fujitsu) > wrote: > > > > > 1. Can we just remove pg_logical_replication_slot_advance and use > > > LogicalSlotAdvanceAndCheckSnapState instead? If worried about the > > > function naming,

Re: Synchronizing slots from primary to standby

2024-04-02 Thread Bharath Rupireddy
On Tue, Apr 2, 2024 at 7:25 PM Zhijie Hou (Fujitsu) wrote: > > > 1. Can we just remove pg_logical_replication_slot_advance and use > > LogicalSlotAdvanceAndCheckSnapState instead? If worried about the > > function naming, LogicalSlotAdvanceAndCheckSnapState can be renamed to > >

RE: Synchronizing slots from primary to standby

2024-04-02 Thread Zhijie Hou (Fujitsu)
On Tuesday, April 2, 2024 8:49 PM Bharath Rupireddy wrote: > > On Tue, Apr 2, 2024 at 2:11 PM Zhijie Hou (Fujitsu) > wrote: > > > > CFbot[1] complained about one query result's order in the tap-test, so I am > > attaching a V7 patch set which fixed this. There are no changes in 0001. > > > >

Re: Synchronizing slots from primary to standby

2024-04-02 Thread Bharath Rupireddy
On Tue, Apr 2, 2024 at 2:11 PM Zhijie Hou (Fujitsu) wrote: > > CFbot[1] complained about one query result's order in the tap-test, so I am > attaching a V7 patch set which fixed this. There are no changes in 0001. > > [1] https://cirrus-ci.com/task/6375962162495488 Thanks. Here are some

Re: Synchronizing slots from primary to standby

2024-04-02 Thread Bertrand Drouvot
Hi, On Tue, Apr 02, 2024 at 02:19:30PM +0530, Amit Kapila wrote: > On Tue, Apr 2, 2024 at 1:54 PM Bertrand Drouvot > wrote: > > What about adding a "wait" injection point in LogStandbySnapshot() to > > prevent > > checkpointer/bgwriter to log a standby snapshot? Something among those > >

Re: Synchronizing slots from primary to standby

2024-04-02 Thread Amit Kapila
On Tue, Apr 2, 2024 at 1:54 PM Bertrand Drouvot wrote: > > On Tue, Apr 02, 2024 at 07:20:46AM +, Zhijie Hou (Fujitsu) wrote: > > I added one test in 040_standby_failover_slots_sync.pl in 0002 patch, which > > can > > reproduce the data loss issue consistently on my machine. > > Thanks! > > >

RE: Synchronizing slots from primary to standby

2024-04-02 Thread Zhijie Hou (Fujitsu)
On Tuesday, April 2, 2024 3:21 PM Zhijie Hou (Fujitsu) wrote: > On Tuesday, April 2, 2024 8:35 AM Zhijie Hou (Fujitsu) > wrote: > > > > On Monday, April 1, 2024 7:30 PM Amit Kapila > > wrote: > > > > > > On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu) > > > > > > wrote: > > > > > > > > On

Re: Synchronizing slots from primary to standby

2024-04-02 Thread Bertrand Drouvot
Hi, On Tue, Apr 02, 2024 at 07:20:46AM +, Zhijie Hou (Fujitsu) wrote: > I added one test in 040_standby_failover_slots_sync.pl in 0002 patch, which > can > reproduce the data loss issue consistently on my machine. Thanks! > It may not reproduce > in some rare cases if concurrent

RE: Synchronizing slots from primary to standby

2024-04-02 Thread Zhijie Hou (Fujitsu)
On Tuesday, April 2, 2024 8:35 AM Zhijie Hou (Fujitsu) wrote: > > On Monday, April 1, 2024 7:30 PM Amit Kapila > wrote: > > > > On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu) > > > > wrote: > > > > > > On Friday, March 29, 2024 2:50 PM Amit Kapila > > > > > wrote: > > > > > > > > > > >

Re: Synchronizing slots from primary to standby

2024-04-01 Thread Bertrand Drouvot
Hi, On Tue, Apr 02, 2024 at 04:24:49AM +, Zhijie Hou (Fujitsu) wrote: > On Monday, April 1, 2024 9:28 PM Bertrand Drouvot > wrote: > > > > On Mon, Apr 01, 2024 at 05:04:53PM +0530, Amit Kapila wrote: > > > On Mon, Apr 1, 2024 at 2:51 PM Bertrand Drouvot > > > > > > > > > 2 === > > > > > >

Re: Synchronizing slots from primary to standby

2024-04-01 Thread shveta malik
On Mon, Apr 1, 2024 at 5:05 PM Amit Kapila wrote: > > > 2 === > > > > + { > > + if (SnapBuildSnapshotExists(remote_slot->restart_lsn)) > > + { > > > > That could call SnapBuildSnapshotExists() multiple times for the same > > "restart_lsn" (for example in case of

RE: Synchronizing slots from primary to standby

2024-04-01 Thread Zhijie Hou (Fujitsu)
On Tuesday, April 2, 2024 8:43 AM Bharath Rupireddy wrote: > > On Mon, Apr 1, 2024 at 11:36 AM Zhijie Hou (Fujitsu) > wrote: > > > > Attach the V4 patch which includes the optimization to skip the > > decoding if the snapshot at the syncing restart_lsn is already > > serialized. It can avoid

Re: Synchronizing slots from primary to standby

2024-04-01 Thread Amit Kapila
On Mon, Apr 1, 2024 at 6:58 PM Bertrand Drouvot wrote: > > On Mon, Apr 01, 2024 at 05:04:53PM +0530, Amit Kapila wrote: > > On Mon, Apr 1, 2024 at 2:51 PM Bertrand Drouvot > > wrote: > > > Then there is no need to call WaitForStandbyConfirmation() as it could go > > > until > > > the

Re: Synchronizing slots from primary to standby

2024-04-01 Thread Bharath Rupireddy
On Mon, Apr 1, 2024 at 11:36 AM Zhijie Hou (Fujitsu) wrote: > > Attach the V4 patch which includes the optimization to skip the decoding if > the snapshot at the syncing restart_lsn is already serialized. It can avoid > most > of the duplicate decoding in my test, and I am doing some more tests

RE: Synchronizing slots from primary to standby

2024-04-01 Thread Zhijie Hou (Fujitsu)
On Monday, April 1, 2024 7:30 PM Amit Kapila wrote: > > On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu) > wrote: > > > > On Friday, March 29, 2024 2:50 PM Amit Kapila > wrote: > > > > > > > > > > > > > > 2. > > > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr > moveto, >

Re: Synchronizing slots from primary to standby

2024-04-01 Thread Bertrand Drouvot
Hi, On Mon, Apr 01, 2024 at 05:04:53PM +0530, Amit Kapila wrote: > On Mon, Apr 1, 2024 at 2:51 PM Bertrand Drouvot > wrote: > > Then there is no need to call WaitForStandbyConfirmation() as it could go > > until > > the RecoveryInProgress() in StandbySlotsHaveCaughtup() for nothing (as we > >

Re: Synchronizing slots from primary to standby

2024-04-01 Thread Bharath Rupireddy
On Mon, Apr 1, 2024 at 10:40 AM Amit Kapila wrote: > > After this step and before the next, did you ensure that the slot sync > has synced the latest confirmed_flush/restart LSNs? You can query: > "select slot_name,restart_lsn, confirmed_flush_lsn from > pg_replication_slots;" to ensure the same

Re: Synchronizing slots from primary to standby

2024-04-01 Thread Amit Kapila
On Mon, Apr 1, 2024 at 2:51 PM Bertrand Drouvot wrote: > > Hi, > > On Mon, Apr 01, 2024 at 06:05:34AM +, Zhijie Hou (Fujitsu) wrote: > > On Monday, April 1, 2024 8:56 AM Zhijie Hou (Fujitsu) > > wrote: > > Attach the V4 patch which includes the optimization to skip the decoding if > > the

Re: Synchronizing slots from primary to standby

2024-04-01 Thread Amit Kapila
On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu) wrote: > > On Friday, March 29, 2024 2:50 PM Amit Kapila wrote: > > > > > > > > > 2. > > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr moveto, > > + bool *found_consistent_point); > > + > > > > This API looks a bit awkward

Re: Synchronizing slots from primary to standby

2024-04-01 Thread Nisha Moond
Did performance test on optimization patch (v2-0001-optimize-the-slot-advancement.patch). Please find the results: Setup: - One primary node with 100 failover-enabled logical slots - 20 DBs, each having 5 failover-enabled logical replication slots - One physical standby node with

Re: Synchronizing slots from primary to standby

2024-04-01 Thread Bertrand Drouvot
Hi, On Mon, Apr 01, 2024 at 06:05:34AM +, Zhijie Hou (Fujitsu) wrote: > On Monday, April 1, 2024 8:56 AM Zhijie Hou (Fujitsu) > wrote: > Attach the V4 patch which includes the optimization to skip the decoding if > the snapshot at the syncing restart_lsn is already serialized. It can avoid

Re: Synchronizing slots from primary to standby

2024-04-01 Thread shveta malik
On Mon, Apr 1, 2024 at 10:40 AM Amit Kapila wrote: > > On Mon, Apr 1, 2024 at 10:01 AM Bharath Rupireddy > wrote: > > > > On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu) > > wrote: > > > > > > [2] The steps to reproduce the data miss issue on a primary->standby > > > setup: > > > > I'm

RE: Synchronizing slots from primary to standby

2024-04-01 Thread Zhijie Hou (Fujitsu)
On Monday, April 1, 2024 8:56 AM Zhijie Hou (Fujitsu) wrote: > > On Friday, March 29, 2024 2:50 PM Amit Kapila > wrote: > > > > On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu) > > > > wrote: > > > > > > > > > Attach a new version patch which fixed an un-initialized variable > > > issue

Re: Synchronizing slots from primary to standby

2024-03-31 Thread Amit Kapila
On Mon, Apr 1, 2024 at 10:01 AM Bharath Rupireddy wrote: > > On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu) > wrote: > > > > [2] The steps to reproduce the data miss issue on a primary->standby setup: > > I'm trying to reproduce the problem with [1], but I can see the > changes after the

Re: Synchronizing slots from primary to standby

2024-03-31 Thread Bharath Rupireddy
On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu) wrote: > > [2] The steps to reproduce the data miss issue on a primary->standby setup: I'm trying to reproduce the problem with [1], but I can see the changes after the standby is promoted. Am I missing anything here?

RE: Synchronizing slots from primary to standby

2024-03-31 Thread Zhijie Hou (Fujitsu)
On Friday, March 29, 2024 2:50 PM Amit Kapila wrote: > > On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu) > wrote: > > > > > > Attach a new version patch which fixed an un-initialized variable > > issue and added some comments. > > > > The other approach to fix this issue could be that the

Re: Synchronizing slots from primary to standby

2024-03-29 Thread Bertrand Drouvot
Hi, On Fri, Mar 29, 2024 at 02:35:22PM +0530, Amit Kapila wrote: > On Fri, Mar 29, 2024 at 1:08 PM Bertrand Drouvot > wrote: > > > > On Fri, Mar 29, 2024 at 07:23:11AM +, Zhijie Hou (Fujitsu) wrote: > > > On Friday, March 29, 2024 2:48 PM Bertrand Drouvot > > > wrote: > > > > > > > > Hi, >

Re: Synchronizing slots from primary to standby

2024-03-29 Thread Amit Kapila
On Fri, Mar 29, 2024 at 1:08 PM Bertrand Drouvot wrote: > > On Fri, Mar 29, 2024 at 07:23:11AM +, Zhijie Hou (Fujitsu) wrote: > > On Friday, March 29, 2024 2:48 PM Bertrand Drouvot > > wrote: > > > > > > Hi, > > > > > > On Fri, Mar 29, 2024 at 01:06:15AM +, Zhijie Hou (Fujitsu) wrote: >

Re: Synchronizing slots from primary to standby

2024-03-29 Thread Amit Kapila
On Fri, Mar 29, 2024 at 9:34 AM Hayato Kuroda (Fujitsu) wrote: > > Thanks for updating the patch! Here is a comment for it. > > ``` > +/* > + * By advancing the restart_lsn, confirmed_lsn, and xmin using > + * fast-forward logical decoding, we can verify whether a

Re: Synchronizing slots from primary to standby

2024-03-29 Thread Bertrand Drouvot
Hi, On Fri, Mar 29, 2024 at 07:23:11AM +, Zhijie Hou (Fujitsu) wrote: > On Friday, March 29, 2024 2:48 PM Bertrand Drouvot > wrote: > > > > Hi, > > > > On Fri, Mar 29, 2024 at 01:06:15AM +, Zhijie Hou (Fujitsu) wrote: > > > Attach a new version patch which fixed an un-initialized

RE: Synchronizing slots from primary to standby

2024-03-29 Thread Zhijie Hou (Fujitsu)
On Friday, March 29, 2024 2:48 PM Bertrand Drouvot wrote: > > Hi, > > On Fri, Mar 29, 2024 at 01:06:15AM +, Zhijie Hou (Fujitsu) wrote: > > Attach a new version patch which fixed an un-initialized variable > > issue and added some comments. Also, temporarily enable DEBUG2 for the > > 040

Re: Synchronizing slots from primary to standby

2024-03-29 Thread Amit Kapila
On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu) wrote: > > > Attach a new version patch which fixed an un-initialized variable issue and > added some comments. > The other approach to fix this issue could be that the slotsync worker get the serialized snapshot using pg_read_binary_file()

Re: Synchronizing slots from primary to standby

2024-03-29 Thread Bertrand Drouvot
Hi, On Fri, Mar 29, 2024 at 01:06:15AM +, Zhijie Hou (Fujitsu) wrote: > Attach a new version patch which fixed an un-initialized variable issue and > added some comments. Also, temporarily enable DEBUG2 for the 040 tap-test so > that > we can analyze the possible CFbot failures easily. >

RE: Synchronizing slots from primary to standby

2024-03-28 Thread Hayato Kuroda (Fujitsu)
Dear Hou, Thanks for updating the patch! Here is a comment for it. ``` +/* + * By advancing the restart_lsn, confirmed_lsn, and xmin using + * fast-forward logical decoding, we can verify whether a consistent + * snapshot can be built. This process also involves

Re: Synchronizing slots from primary to standby

2024-03-28 Thread shveta malik
On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu) wrote: > > Attach a new version patch which fixed an un-initialized variable issue and > added some comments. Also, temporarily enable DEBUG2 for the 040 tap-test so > that > we can analyze the possible CFbot failures easily. As suggested by

RE: Synchronizing slots from primary to standby

2024-03-28 Thread Zhijie Hou (Fujitsu)
On Thursday, March 28, 2024 10:02 PM Zhijie Hou (Fujitsu) wrote: > > On Thursday, March 28, 2024 7:32 PM Amit Kapila > wrote: > > > > On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu) > > > > wrote: > > > > > > When analyzing one BF error[1], we find an issue of slotsync: Since > > > we

RE: Synchronizing slots from primary to standby

2024-03-28 Thread Zhijie Hou (Fujitsu)
On Thursday, March 28, 2024 7:32 PM Amit Kapila wrote: > > On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu) > wrote: > > > > When analyzing one BF error[1], we find an issue of slotsync: Since we > > don't perform logical decoding for the synced slots when syncing the > > lsn/xmin of slot,

Re: Synchronizing slots from primary to standby

2024-03-28 Thread Bertrand Drouvot
Hi, On Thu, Mar 28, 2024 at 05:05:35PM +0530, Amit Kapila wrote: > On Thu, Mar 28, 2024 at 3:34 PM Bertrand Drouvot > wrote: > > > > On Thu, Mar 28, 2024 at 04:38:19AM +, Zhijie Hou (Fujitsu) wrote: > > > > > To fix this, we could use the fast forward logical decoding to advance > > > the

Re: Synchronizing slots from primary to standby

2024-03-28 Thread Amit Kapila
On Thu, Mar 28, 2024 at 3:34 PM Bertrand Drouvot wrote: > > On Thu, Mar 28, 2024 at 04:38:19AM +, Zhijie Hou (Fujitsu) wrote: > > > To fix this, we could use the fast forward logical decoding to advance the > > synced > > slot's lsn/xmin when syncing these values instead of directly updating

Re: Synchronizing slots from primary to standby

2024-03-28 Thread Amit Kapila
On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu) wrote: > > When analyzing one BF error[1], we find an issue of slotsync: Since we don't > perform logical decoding for the synced slots when syncing the lsn/xmin of > slot, no logical snapshots will be serialized to disk. So, when user starts

Re: Synchronizing slots from primary to standby

2024-03-28 Thread Bertrand Drouvot
Hi, On Thu, Mar 28, 2024 at 04:38:19AM +, Zhijie Hou (Fujitsu) wrote: > Hi, > > When analyzing one BF error[1], we find an issue of slotsync: Since we don't > perform logical decoding for the synced slots when syncing the lsn/xmin of > slot, no logical snapshots will be serialized to disk.

RE: Synchronizing slots from primary to standby

2024-03-27 Thread Zhijie Hou (Fujitsu)
Hi, When analyzing one BF error[1], we find an issue of slotsync: Since we don't perform logical decoding for the synced slots when syncing the lsn/xmin of slot, no logical snapshots will be serialized to disk. So, when user starts to use these synced slots after promotion, it needs to re-build

Re: Synchronizing slots from primary to standby

2024-03-15 Thread Bertrand Drouvot
Hi, On Thu, Mar 14, 2024 at 02:22:44AM +, Zhijie Hou (Fujitsu) wrote: > Hi, > > Since the standby_slot_names patch has been committed, I am attaching the last > doc patch for review. > Thanks! 1 === + continue subscribing to publications now on the new primary server without + any

RE: Synchronizing slots from primary to standby

2024-03-13 Thread Zhijie Hou (Fujitsu)
Hi, Since the standby_slot_names patch has been committed, I am attaching the last doc patch for review. Best Regards, Hou zj v109-0001-Document-the-steps-to-check-if-the-standby-is-r.patch Description: v109-0001-Document-the-steps-to-check-if-the-standby-is-r.patch

Re: Synchronizing slots from primary to standby

2024-03-07 Thread shveta malik
On Fri, Mar 8, 2024 at 9:56 AM Ajin Cherian wrote: > >> Pushed with minor modifications. I'll keep an eye on BF. >> >> BTW, one thing that we should try to evaluate a bit more is the >> traversal of slots in StandbySlotsHaveCaughtup() where we verify if >> all the slots mentioned in

Re: Synchronizing slots from primary to standby

2024-03-07 Thread Ajin Cherian
On Fri, Mar 8, 2024 at 2:33 PM Amit Kapila wrote: > On Thu, Mar 7, 2024 at 12:00 PM Zhijie Hou (Fujitsu) > wrote: > > > > > > Attach the V108 patch set which addressed above and Peter's comments. > > I also removed the check for "*" in guc check hook. > > > > > Pushed with minor modifications.

Re: Synchronizing slots from primary to standby

2024-03-07 Thread Amit Kapila
On Thu, Mar 7, 2024 at 12:00 PM Zhijie Hou (Fujitsu) wrote: > > > Attach the V108 patch set which addressed above and Peter's comments. > I also removed the check for "*" in guc check hook. > Pushed with minor modifications. I'll keep an eye on BF. BTW, one thing that we should try to evaluate

RE: Synchronizing slots from primary to standby

2024-03-06 Thread Zhijie Hou (Fujitsu)
On Thursday, March 7, 2024 12:46 PM Amit Kapila wrote: > > On Thu, Mar 7, 2024 at 7:35 AM Peter Smith > wrote: > > > > Here are some review comments for v107-0001 > > > > == > > src/backend/replication/slot.c > > > > 1. > > +/* > > + * Struct for the configuration of standby_slot_names. > >

RE: Synchronizing slots from primary to standby

2024-03-06 Thread Zhijie Hou (Fujitsu)
On Thursday, March 7, 2024 10:05 AM Peter Smith wrote: > > Here are some review comments for v107-0001 Thanks for the comments. > > == > src/backend/replication/slot.c > > 1. > +/* > + * Struct for the configuration of standby_slot_names. > + * > + * Note: this must be a flat

Re: Synchronizing slots from primary to standby

2024-03-06 Thread Amit Kapila
On Thu, Mar 7, 2024 at 8:37 AM shveta malik wrote: > I thought about whether we can make standby_slot_names as USERSET instead of SIGHUP and it doesn't sound like a good idea as that can lead to inconsistent standby replicas even after configuring the correct value of standby_slot_names. One can

Re: Synchronizing slots from primary to standby

2024-03-06 Thread Amit Kapila
On Thu, Mar 7, 2024 at 7:35 AM Peter Smith wrote: > > Here are some review comments for v107-0001 > > == > src/backend/replication/slot.c > > 1. > +/* > + * Struct for the configuration of standby_slot_names. > + * > + * Note: this must be a flat representation that can be held in a single >

Re: Synchronizing slots from primary to standby

2024-03-06 Thread Masahiko Sawada
On Wed, Mar 6, 2024 at 5:53 PM Amit Kapila wrote: > > On Wed, Mar 6, 2024 at 12:07 PM Masahiko Sawada wrote: > > > > On Fri, Mar 1, 2024 at 3:22 PM Peter Smith wrote: > > > > > > On Fri, Mar 1, 2024 at 5:11 PM Masahiko Sawada > > > wrote: > > > > > > > ... > > > > +/* > > > > +

Re: Synchronizing slots from primary to standby

2024-03-06 Thread shveta malik
On Wed, Mar 6, 2024 at 6:54 PM Zhijie Hou (Fujitsu) wrote: > > On Wednesday, March 6, 2024 9:13 PM Zhijie Hou (Fujitsu) > wrote: > > > > On Wednesday, March 6, 2024 11:04 AM Zhijie Hou (Fujitsu) > > wrote: > > > > > > On Wednesday, March 6, 2024 9:30 AM Masahiko Sawada > > > wrote: > > > > >

Re: Synchronizing slots from primary to standby

2024-03-06 Thread Peter Smith
Here are some review comments for v107-0001 == src/backend/replication/slot.c 1. +/* + * Struct for the configuration of standby_slot_names. + * + * Note: this must be a flat representation that can be held in a single chunk + * of guc_malloc'd memory, so that it can be stored as the "extra"

RE: Synchronizing slots from primary to standby

2024-03-06 Thread Zhijie Hou (Fujitsu)
On Wednesday, March 6, 2024 9:13 PM Zhijie Hou (Fujitsu) wrote: > > On Wednesday, March 6, 2024 11:04 AM Zhijie Hou (Fujitsu) > wrote: > > > > On Wednesday, March 6, 2024 9:30 AM Masahiko Sawada > > wrote: > > > > Hi, > > > > > On Fri, Mar 1, 2024 at 4:21 PM Zhijie Hou (Fujitsu) > > > > > >

RE: Synchronizing slots from primary to standby

2024-03-06 Thread Zhijie Hou (Fujitsu)
On Wednesday, March 6, 2024 11:04 AM Zhijie Hou (Fujitsu) wrote: > > On Wednesday, March 6, 2024 9:30 AM Masahiko Sawada > wrote: > > Hi, > > > On Fri, Mar 1, 2024 at 4:21 PM Zhijie Hou (Fujitsu) > > > > wrote: > > > > > > On Friday, March 1, 2024 2:11 PM Masahiko Sawada > > wrote: > > > >

Re: Synchronizing slots from primary to standby

2024-03-06 Thread Amit Kapila
On Wed, Mar 6, 2024 at 12:07 PM Masahiko Sawada wrote: > > On Fri, Mar 1, 2024 at 3:22 PM Peter Smith wrote: > > > > On Fri, Mar 1, 2024 at 5:11 PM Masahiko Sawada > > wrote: > > > > > ... > > > +/* > > > + * "*" is not accepted as in that case primary will not be able > > >

Re: Synchronizing slots from primary to standby

2024-03-05 Thread Masahiko Sawada
On Fri, Mar 1, 2024 at 3:22 PM Peter Smith wrote: > > On Fri, Mar 1, 2024 at 5:11 PM Masahiko Sawada wrote: > > > ... > > +/* > > + * "*" is not accepted as in that case primary will not be able to > > know > > + * for which all standbys to wait for. Even if we have
