Here are some review comments for the docs patch v3-0001.
==
Commit message
1.
This patch adds detailed documentation for the slot sync feature
including examples to guide users on how to verify that all slots have
been successfully synchronized to the standby server and how to
confirm
Hi,
On Mon, Apr 29, 2024 at 11:58:09AM +, Zhijie Hou (Fujitsu) wrote:
> On Monday, April 29, 2024 5:11 PM shveta malik wrote:
> >
> > On Mon, Apr 29, 2024 at 11:38 AM shveta malik
> > wrote:
> > >
> > > On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu)
> > > wrote:
> > > >
> > > > On
On Mon, Apr 29, 2024 at 5:28 PM Zhijie Hou (Fujitsu)
wrote:
>
> On Monday, April 29, 2024 5:11 PM shveta malik wrote:
> >
> > On Mon, Apr 29, 2024 at 11:38 AM shveta malik
> > wrote:
> > >
> > > On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu)
> > > wrote:
> > > >
> > > > On Friday, March
On Monday, April 29, 2024 5:11 PM shveta malik wrote:
>
> On Mon, Apr 29, 2024 at 11:38 AM shveta malik
> wrote:
> >
> > On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu)
> > wrote:
> > >
> > > On Friday, March 15, 2024 10:45 PM Bertrand Drouvot
> wrote:
> > > >
> > > > Hi,
> > > >
> > >
On Mon, Apr 29, 2024 at 11:38 AM shveta malik wrote:
>
> On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu)
> wrote:
> >
> > On Friday, March 15, 2024 10:45 PM Bertrand Drouvot
> > wrote:
> > >
> > > Hi,
> > >
> > > On Thu, Mar 14, 2024 at 02:22:44AM +, Zhijie Hou (Fujitsu) wrote:
> > >
On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu)
wrote:
>
> On Friday, March 15, 2024 10:45 PM Bertrand Drouvot
> wrote:
> >
> > Hi,
> >
> > On Thu, Mar 14, 2024 at 02:22:44AM +, Zhijie Hou (Fujitsu) wrote:
> > > Hi,
> > >
> > > Since the standby_slot_names patch has been committed, I
On Friday, March 15, 2024 10:45 PM Bertrand Drouvot
wrote:
>
> Hi,
>
> On Thu, Mar 14, 2024 at 02:22:44AM +, Zhijie Hou (Fujitsu) wrote:
> > Hi,
> >
> > Since the standby_slot_names patch has been committed, I am attaching
> > the last doc patch for review.
> >
>
> Thanks!
>
> 1 ===
>
>
On Friday, April 12, 2024 11:31 AM Amit Kapila wrote:
>
> On Thu, Apr 11, 2024 at 5:04 PM Zhijie Hou (Fujitsu)
> wrote:
> >
> > On Thursday, April 11, 2024 12:11 PM Amit Kapila
> wrote:
> >
> > >
> > > 2.
> > > - if (remote_slot->restart_lsn < slot->data.restart_lsn)
> > > + if
On Thu, Apr 11, 2024 at 5:04 PM Zhijie Hou (Fujitsu)
wrote:
>
> On Thursday, April 11, 2024 12:11 PM Amit Kapila
> wrote:
>
> >
> > 2.
> > - if (remote_slot->restart_lsn < slot->data.restart_lsn)
> > + if (remote_slot->confirmed_lsn < slot->data.confirmed_flush)
> > elog(ERROR,
> > "cannot
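The check quoted above compares the remote slot's confirmed LSN against the local synced slot's value and raises an error when the remote one is behind. A minimal standalone sketch of that comparison follows. The field names `confirmed_lsn` and `confirmed_flush` come from the quoted diff; the struct names, the helper `remote_slot_is_behind`, and the `uint64_t` stand-in for `XLogRecPtr` are assumptions made only so the example compiles outside the PostgreSQL tree.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr, a 64-bit WAL position. */
typedef uint64_t XLogRecPtr;

/* Hypothetical, trimmed-down view of the slot state fetched from the primary. */
typedef struct RemoteSlotInfo
{
    XLogRecPtr confirmed_lsn;
} RemoteSlotInfo;

/* Hypothetical, trimmed-down view of the local synced slot's data. */
typedef struct LocalSlotData
{
    XLogRecPtr confirmed_flush;
} LocalSlotData;

/*
 * Returns true when the remote slot's confirmed position is behind the
 * local synced slot's position, which is the condition the quoted code
 * treats as an error.
 */
bool
remote_slot_is_behind(const RemoteSlotInfo *remote, const LocalSlotData *local)
{
    return remote->confirmed_lsn < local->confirmed_flush;
}
```

Equal positions are not reported as "behind", since the quoted condition uses a strict `<`.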
On Thursday, April 11, 2024 12:11 PM Amit Kapila
wrote:
>
> On Wed, Apr 10, 2024 at 5:28 PM Zhijie Hou (Fujitsu)
> wrote:
> >
> > On Thursday, April 4, 2024 5:37 PM Amit Kapila
> wrote:
> > >
> > > BTW, while thinking on this one, I
> > > noticed that in the function
On Wed, Apr 10, 2024 at 5:28 PM Zhijie Hou (Fujitsu)
wrote:
>
> On Thursday, April 4, 2024 5:37 PM Amit Kapila
> wrote:
> >
> > BTW, while thinking on this one, I
> > noticed that in the function LogicalConfirmReceivedLocation(), we first
> > update
> > the disk copy, see comment [1] and then
On Thursday, April 4, 2024 5:37 PM Amit Kapila wrote:
>
> BTW, while thinking on this one, I
> noticed that in the function LogicalConfirmReceivedLocation(), we first update
> the disk copy, see comment [1] and then in-memory whereas the same is not
> true in
> update_local_synced_slot() for
On Thursday, April 4, 2024 4:25 PM Masahiko Sawada
wrote:
Hi,
> On Wed, Apr 3, 2024 at 7:06 PM Amit Kapila
> wrote:
> >
> > On Wed, Apr 3, 2024 at 11:13 AM Amit Kapila
> wrote:
> > >
> > > On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy
> > > wrote:
> > >
> > > > I quickly looked at v8,
On Mon, Apr 8, 2024 at 7:01 PM Zhijie Hou (Fujitsu)
wrote:
>
> Thanks for pushing.
>
> I checked the BF status, and noticed one BF failure, which I think is related
> to
> a miss in the test code.
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-04-08%2012%3A04%3A27
>
> From
On Mon, Apr 8, 2024 at 9:49 PM Andres Freund wrote:
>
> On 2024-04-08 16:01:41 +0530, Amit Kapila wrote:
> > Pushed.
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-04-08%2012%3A04%3A27
>
> This unfortunately is a commit after
>
Right, and thanks for the report. Hou-San
Hi,
On 2024-04-08 16:01:41 +0530, Amit Kapila wrote:
> Pushed.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-04-08%2012%3A04%3A27
This unfortunately is a commit after
commit 6f3d8d5e7cc
Author: Amit Kapila
Date: 2024-04-08 13:21:55 +0530
Fix the intermittent
On Monday, April 8, 2024 6:32 PM Amit Kapila wrote:
>
> On Mon, Apr 8, 2024 at 12:19 PM Zhijie Hou (Fujitsu)
> wrote:
> >
> > On Saturday, April 6, 2024 12:43 PM Amit Kapila
> wrote:
> > > On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot
> > > wrote:
> > >
> > > Yeah, that could be the first
On Mon, Apr 8, 2024 at 12:19 PM Zhijie Hou (Fujitsu)
wrote:
>
> On Saturday, April 6, 2024 12:43 PM Amit Kapila
> wrote:
> > On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot
> > wrote:
> >
> > Yeah, that could be the first step. We can probably add an injection point
> > to
> > control the
On Saturday, April 6, 2024 12:43 PM Amit Kapila wrote:
> On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot
> wrote:
> >
> > On Fri, Apr 05, 2024 at 06:23:10PM +0530, Amit Kapila wrote:
> > > On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila
> wrote:
> > > Thinking more on this, it doesn't seem related
On Sun, Apr 7, 2024 at 3:06 AM Andres Freund wrote:
>
> On 2024-04-06 10:58:32 +0530, Amit Kapila wrote:
> > On Sat, Apr 6, 2024 at 10:13 AM Amit Kapila wrote:
> > >
> >
> > There are still a few pending issues to be fixed in this feature but
> > otherwise, we have committed all the main
Hi,
On 2024-04-06 10:58:32 +0530, Amit Kapila wrote:
> On Sat, Apr 6, 2024 at 10:13 AM Amit Kapila wrote:
> >
>
> There are still a few pending issues to be fixed in this feature but
> otherwise, we have committed all the main patches, so I marked the CF
> entry corresponding to this work as
Hi,
On Sat, Apr 06, 2024 at 10:13:00AM +0530, Amit Kapila wrote:
> On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot
> wrote:
>
> I think the new LSN can be visible only when the corresponding WAL is
> written by XLogWrite(). I don't know what in XLogSetAsyncXactLSN() can
> make it visible. In
On Sat, Apr 6, 2024 at 10:13 AM Amit Kapila wrote:
>
There are still a few pending issues to be fixed in this feature but
otherwise, we have committed all the main patches, so I marked the CF
entry corresponding to this work as committed.
--
With Regards,
Amit Kapila.
On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot
wrote:
>
> On Fri, Apr 05, 2024 at 06:23:10PM +0530, Amit Kapila wrote:
> > On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila wrote:
> > Thinking more on this, it doesn't seem related to
> > c9920a9068eac2e6c8fb34988d18c0b42b9bf811 as that commit doesn't
Hi,
On Fri, Apr 05, 2024 at 02:35:42PM +, Bertrand Drouvot wrote:
> I think that maybe as a first step we should move the "elog(DEBUG2," message
> as
> proposed above to help debugging (that could help to confirm the above
> theory).
If you agree and think that makes sense, please find
Hi,
On Fri, Apr 05, 2024 at 06:23:10PM +0530, Amit Kapila wrote:
> On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila wrote:
> Thinking more on this, it doesn't seem related to
> c9920a9068eac2e6c8fb34988d18c0b42b9bf811 as that commit doesn't change
> any locking or something like that which impacts
On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila wrote:
>
> On Thu, Apr 4, 2024 at 2:59 PM shveta malik wrote:
> >
> > There is an intermittent BF failure observed at [1] after this commit
> > (2ec005b).
> >
>
> Thanks for analyzing and providing the patch. I'll look into it. There
> is another BF
On Thu, Apr 4, 2024 at 2:59 PM shveta malik wrote:
>
> There is an intermittent BF failure observed at [1] after this commit
> (2ec005b).
>
Thanks for analyzing and providing the patch. I'll look into it. There
is another BF failure [1] which I have analyzed. The main reason for
failure is the
On Fri, Apr 5, 2024 at 4:31 PM Bertrand Drouvot
wrote:
>
> BTW, I just realized that the LSN I used in my example in the
> LSN_FORMAT_ARGS()
> are not the right ones.
Noted. Thanks.
Please find v3 with the comments addressed.
thanks
Shveta
Hi,
On Fri, Apr 05, 2024 at 04:09:01PM +0530, shveta malik wrote:
> On Fri, Apr 5, 2024 at 10:09 AM Bertrand Drouvot
> wrote:
> >
> > What about something like?
> >
> > ereport(LOG,
> > errmsg("synchronized confirmed_flush_lsn for slot \"%s\" differs
> > from remote slot",
> >
On Fri, Apr 5, 2024 at 10:09 AM Bertrand Drouvot
wrote:
>
> What about something like?
>
> ereport(LOG,
> errmsg("synchronized confirmed_flush_lsn for slot \"%s\" differs from
> remote slot",
> remote_slot->name),
> errdetail("Remote slot has LSN %X/%X but local slot has
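The `%X/%X` placeholders in the message quoted above expect an LSN split into two 32-bit halves, which is what `LSN_FORMAT_ARGS()` supplies. A minimal standalone sketch of that convention follows. The helpers `lsn_high`, `lsn_low`, and `format_lsn` are hypothetical names used only for illustration, and `XLogRecPtr` is re-declared as a plain `uint64_t` so the example compiles outside the PostgreSQL tree.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for PostgreSQL's XLogRecPtr, a 64-bit WAL position. */
typedef uint64_t XLogRecPtr;

/* Upper 32 bits of the LSN: the part printed before the slash. */
uint32_t
lsn_high(XLogRecPtr lsn)
{
    return (uint32_t) (lsn >> 32);
}

/* Lower 32 bits of the LSN: the part printed after the slash. */
uint32_t
lsn_low(XLogRecPtr lsn)
{
    return (uint32_t) lsn;
}

/*
 * Hypothetical helper: write the LSN as "HIGH/LOW" in uppercase hex.
 * Returns the value of snprintf(), i.e. the length of the formatted text.
 */
int
format_lsn(char *buf, size_t len, XLogRecPtr lsn)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%X/%X",
                    (unsigned int) lsn_high(lsn),
                    (unsigned int) lsn_low(lsn));
}
```

For instance, `format_lsn(buf, sizeof buf, 0x16B374D848ULL)` writes `16/B374D848` into `buf`, so a message that prints both the remote and the local LSN needs two such pairs of arguments, one per `%X/%X`.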
Hi,
On Fri, Apr 05, 2024 at 09:43:35AM +0530, shveta malik wrote:
> On Fri, Apr 5, 2024 at 9:22 AM Bertrand Drouvot
> wrote:
> >
> > Hi,
> >
> > On Thu, Apr 04, 2024 at 05:31:45PM +0530, shveta malik wrote:
> > > On Thu, Apr 4, 2024 at 2:59 PM shveta malik
> > > wrote:
> > 2 ===
> >
> > +
On Fri, Apr 5, 2024 at 9:22 AM Bertrand Drouvot
wrote:
>
> Hi,
>
> On Thu, Apr 04, 2024 at 05:31:45PM +0530, shveta malik wrote:
> > On Thu, Apr 4, 2024 at 2:59 PM shveta malik wrote:
> > >
> > >
> > > Prior to commit 2ec005b, this check was okay, as we did not expect
> > > restart_lsn of the
Hi,
On Thu, Apr 04, 2024 at 05:31:45PM +0530, shveta malik wrote:
> On Thu, Apr 4, 2024 at 2:59 PM shveta malik wrote:
> >
> >
> > Prior to commit 2ec005b, this check was okay, as we did not expect
> > restart_lsn of the synced slot to be ahead of remote since we were
> > directly copying the
On Thu, Apr 4, 2024 at 2:59 PM shveta malik wrote:
>
>
> Prior to commit 2ec005b, this check was okay, as we did not expect
> restart_lsn of the synced slot to be ahead of remote since we were
> directly copying the lsns. But now when we use 'advance' to do logical
> decoding on standby, there is
On Thu, Apr 4, 2024 at 1:55 PM Masahiko Sawada wrote:
>
> While testing this change, I realized that it could happen that the
> server logs are flooded with the following logical decoding logs that
> are written every 200 ms:
>
> 2024-04-04 16:15:19.270 JST [3838739] LOG: starting logical
On Wed, Apr 3, 2024 at 3:36 PM Amit Kapila wrote:
>
> On Wed, Apr 3, 2024 at 11:13 AM Amit Kapila wrote:
> >
> > On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy
> > wrote:
> >
> > > I quickly looked at v8, and have a nit, rest all looks good.
> > >
> > > +if (DecodingContextReady(ctx)
On Wed, Apr 3, 2024 at 7:06 PM Amit Kapila wrote:
>
> On Wed, Apr 3, 2024 at 11:13 AM Amit Kapila wrote:
> >
> > On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy
> > wrote:
> >
> > > I quickly looked at v8, and have a nit, rest all looks good.
> > >
> > > +if (DecodingContextReady(ctx)
On Wed, Apr 3, 2024 at 11:13 AM Amit Kapila wrote:
>
> On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy
> wrote:
>
> > I quickly looked at v8, and have a nit, rest all looks good.
> >
> > +if (DecodingContextReady(ctx) && found_consistent_snapshot)
> > +
On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy
wrote:
>
> On Wed, Apr 3, 2024 at 9:04 AM Amit Kapila wrote:
> >
> > > I'd just rename LogicalSlotAdvanceAndCheckSnapState(XLogRecPtr
> > > moveto, bool *found_consistent_snapshot) to
> > > pg_logical_replication_slot_advance(XLogRecPtr moveto,
On Wed, Apr 3, 2024 at 9:04 AM Amit Kapila wrote:
>
> > I'd just rename LogicalSlotAdvanceAndCheckSnapState(XLogRecPtr
> > moveto, bool *found_consistent_snapshot) to
> > pg_logical_replication_slot_advance(XLogRecPtr moveto, bool
> > *found_consistent_snapshot) and use it. If others don't like
On Tue, Apr 2, 2024 at 7:42 PM Bharath Rupireddy
wrote:
>
> On Tue, Apr 2, 2024 at 7:25 PM Zhijie Hou (Fujitsu)
> wrote:
> >
> > > 1. Can we just remove pg_logical_replication_slot_advance and use
> > > LogicalSlotAdvanceAndCheckSnapState instead? If worried about the
> > > function naming,
On Tue, Apr 2, 2024 at 7:25 PM Zhijie Hou (Fujitsu)
wrote:
>
> > 1. Can we just remove pg_logical_replication_slot_advance and use
> > LogicalSlotAdvanceAndCheckSnapState instead? If worried about the
> > function naming, LogicalSlotAdvanceAndCheckSnapState can be renamed to
> >
On Tuesday, April 2, 2024 8:49 PM Bharath Rupireddy
wrote:
>
> On Tue, Apr 2, 2024 at 2:11 PM Zhijie Hou (Fujitsu)
> wrote:
> >
> > CFbot[1] complained about one query result's order in the tap-test, so I am
> > attaching a V7 patch set which fixed this. There are no changes in 0001.
> >
> >
On Tue, Apr 2, 2024 at 2:11 PM Zhijie Hou (Fujitsu)
wrote:
>
> CFbot[1] complained about one query result's order in the tap-test, so I am
> attaching a V7 patch set which fixed this. There are no changes in 0001.
>
> [1] https://cirrus-ci.com/task/6375962162495488
Thanks. Here are some
Hi,
On Tue, Apr 02, 2024 at 02:19:30PM +0530, Amit Kapila wrote:
> On Tue, Apr 2, 2024 at 1:54 PM Bertrand Drouvot
> wrote:
> > What about adding a "wait" injection point in LogStandbySnapshot() to
> > prevent
> > checkpointer/bgwriter to log a standby snapshot? Something among those
> >
On Tue, Apr 2, 2024 at 1:54 PM Bertrand Drouvot
wrote:
>
> On Tue, Apr 02, 2024 at 07:20:46AM +, Zhijie Hou (Fujitsu) wrote:
> > I added one test in 040_standby_failover_slots_sync.pl in 0002 patch, which
> > can
> > reproduce the data loss issue consistently on my machine.
>
> Thanks!
>
> >
On Tuesday, April 2, 2024 3:21 PM Zhijie Hou (Fujitsu)
wrote:
> On Tuesday, April 2, 2024 8:35 AM Zhijie Hou (Fujitsu)
> wrote:
> >
> > On Monday, April 1, 2024 7:30 PM Amit Kapila
> > wrote:
> > >
> > > On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu)
> > >
> > > wrote:
> > > >
> > > > On
Hi,
On Tue, Apr 02, 2024 at 07:20:46AM +, Zhijie Hou (Fujitsu) wrote:
> I added one test in 040_standby_failover_slots_sync.pl in 0002 patch, which
> can
> reproduce the data loss issue consistently on my machine.
Thanks!
> It may not reproduce
> in some rare cases if concurrent
On Tuesday, April 2, 2024 8:35 AM Zhijie Hou (Fujitsu)
wrote:
>
> On Monday, April 1, 2024 7:30 PM Amit Kapila
> wrote:
> >
> > On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu)
> >
> > wrote:
> > >
> > > On Friday, March 29, 2024 2:50 PM Amit Kapila
> > >
> > wrote:
> > > >
> > >
> > > >
Hi,
On Tue, Apr 02, 2024 at 04:24:49AM +, Zhijie Hou (Fujitsu) wrote:
> On Monday, April 1, 2024 9:28 PM Bertrand Drouvot
> wrote:
> >
> > On Mon, Apr 01, 2024 at 05:04:53PM +0530, Amit Kapila wrote:
> > > On Mon, Apr 1, 2024 at 2:51 PM Bertrand Drouvot
> >
> > >
> > > > 2 ===
> > > >
> >
On Mon, Apr 1, 2024 at 5:05 PM Amit Kapila wrote:
>
> > 2 ===
> >
> > + {
> > + if (SnapBuildSnapshotExists(remote_slot->restart_lsn))
> > + {
> >
> > That could call SnapBuildSnapshotExists() multiple times for the same
> > "restart_lsn" (for example in case of
On Tuesday, April 2, 2024 8:43 AM Bharath Rupireddy
wrote:
>
> On Mon, Apr 1, 2024 at 11:36 AM Zhijie Hou (Fujitsu)
> wrote:
> >
> > Attach the V4 patch which includes the optimization to skip the
> > decoding if the snapshot at the syncing restart_lsn is already
> > serialized. It can avoid
On Mon, Apr 1, 2024 at 6:58 PM Bertrand Drouvot
wrote:
>
> On Mon, Apr 01, 2024 at 05:04:53PM +0530, Amit Kapila wrote:
> > On Mon, Apr 1, 2024 at 2:51 PM Bertrand Drouvot
> > wrote:
> > > Then there is no need to call WaitForStandbyConfirmation() as it could go
> > > until
> > > the
On Mon, Apr 1, 2024 at 11:36 AM Zhijie Hou (Fujitsu)
wrote:
>
> Attach the V4 patch which includes the optimization to skip the decoding if
> the snapshot at the syncing restart_lsn is already serialized. It can avoid
> most
> of the duplicate decoding in my test, and I am doing some more tests
On Monday, April 1, 2024 7:30 PM Amit Kapila wrote:
>
> On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu)
> wrote:
> >
> > On Friday, March 29, 2024 2:50 PM Amit Kapila
> wrote:
> > >
> >
> > >
> > >
> > > 2.
> > > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr
> moveto,
>
Hi,
On Mon, Apr 01, 2024 at 05:04:53PM +0530, Amit Kapila wrote:
> On Mon, Apr 1, 2024 at 2:51 PM Bertrand Drouvot
> wrote:
> > Then there is no need to call WaitForStandbyConfirmation() as it could go
> > until
> > the RecoveryInProgress() in StandbySlotsHaveCaughtup() for nothing (as we
> >
On Mon, Apr 1, 2024 at 10:40 AM Amit Kapila wrote:
>
> After this step and before the next, did you ensure that the slot sync
> has synced the latest confirmed_flush/restart LSNs? You can query:
> "select slot_name,restart_lsn, confirmed_flush_lsn from
> pg_replication_slots;" to ensure the same
On Mon, Apr 1, 2024 at 2:51 PM Bertrand Drouvot
wrote:
>
> Hi,
>
> On Mon, Apr 01, 2024 at 06:05:34AM +, Zhijie Hou (Fujitsu) wrote:
> > On Monday, April 1, 2024 8:56 AM Zhijie Hou (Fujitsu)
> > wrote:
> > Attach the V4 patch which includes the optimization to skip the decoding if
> > the
On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu)
wrote:
>
> On Friday, March 29, 2024 2:50 PM Amit Kapila wrote:
> >
>
> >
> >
> > 2.
> > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr moveto,
> > + bool *found_consistent_point);
> > +
> >
> > This API looks a bit awkward
I ran a performance test on the optimization patch
(v2-0001-optimize-the-slot-advancement.patch). Please find the
results:
Setup:
- One primary node with 100 failover-enabled logical slots
- 20 DBs, each having 5 failover-enabled logical replication slots
- One physical standby node with
Hi,
On Mon, Apr 01, 2024 at 06:05:34AM +, Zhijie Hou (Fujitsu) wrote:
> On Monday, April 1, 2024 8:56 AM Zhijie Hou (Fujitsu)
> wrote:
> Attach the V4 patch which includes the optimization to skip the decoding if
> the snapshot at the syncing restart_lsn is already serialized. It can avoid
On Mon, Apr 1, 2024 at 10:40 AM Amit Kapila wrote:
>
> On Mon, Apr 1, 2024 at 10:01 AM Bharath Rupireddy
> wrote:
> >
> > On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu)
> > wrote:
> > >
> > > [2] The steps to reproduce the data miss issue on a primary->standby
> > > setup:
> >
> > I'm
On Monday, April 1, 2024 8:56 AM Zhijie Hou (Fujitsu)
wrote:
>
> On Friday, March 29, 2024 2:50 PM Amit Kapila
> wrote:
> >
> > On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu)
> >
> > wrote:
> > >
> > >
> > > Attach a new version patch which fixed an un-initialized variable
> > > issue
On Mon, Apr 1, 2024 at 10:01 AM Bharath Rupireddy
wrote:
>
> On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu)
> wrote:
> >
> > [2] The steps to reproduce the data miss issue on a primary->standby setup:
>
> I'm trying to reproduce the problem with [1], but I can see the
> changes after the
On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu)
wrote:
>
> [2] The steps to reproduce the data miss issue on a primary->standby setup:
I'm trying to reproduce the problem with [1], but I can see the
changes after the standby is promoted. Am I missing anything here?
On Friday, March 29, 2024 2:50 PM Amit Kapila wrote:
>
> On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu)
> wrote:
> >
> >
> > Attach a new version patch which fixed an un-initialized variable
> > issue and added some comments.
> >
>
> The other approach to fix this issue could be that the
Hi,
On Fri, Mar 29, 2024 at 02:35:22PM +0530, Amit Kapila wrote:
> On Fri, Mar 29, 2024 at 1:08 PM Bertrand Drouvot
> wrote:
> >
> > On Fri, Mar 29, 2024 at 07:23:11AM +, Zhijie Hou (Fujitsu) wrote:
> > > On Friday, March 29, 2024 2:48 PM Bertrand Drouvot
> > > wrote:
> > > >
> > > > Hi,
>
On Fri, Mar 29, 2024 at 1:08 PM Bertrand Drouvot
wrote:
>
> On Fri, Mar 29, 2024 at 07:23:11AM +, Zhijie Hou (Fujitsu) wrote:
> > On Friday, March 29, 2024 2:48 PM Bertrand Drouvot
> > wrote:
> > >
> > > Hi,
> > >
> > > On Fri, Mar 29, 2024 at 01:06:15AM +, Zhijie Hou (Fujitsu) wrote:
>
On Fri, Mar 29, 2024 at 9:34 AM Hayato Kuroda (Fujitsu)
wrote:
>
> Thanks for updating the patch! Here is a comment for it.
>
> ```
> +/*
> + * By advancing the restart_lsn, confirmed_lsn, and xmin using
> + * fast-forward logical decoding, we can verify whether a
Hi,
On Fri, Mar 29, 2024 at 07:23:11AM +, Zhijie Hou (Fujitsu) wrote:
> On Friday, March 29, 2024 2:48 PM Bertrand Drouvot
> wrote:
> >
> > Hi,
> >
> > On Fri, Mar 29, 2024 at 01:06:15AM +, Zhijie Hou (Fujitsu) wrote:
> > > Attach a new version patch which fixed an un-initialized
On Friday, March 29, 2024 2:48 PM Bertrand Drouvot
wrote:
>
> Hi,
>
> On Fri, Mar 29, 2024 at 01:06:15AM +, Zhijie Hou (Fujitsu) wrote:
> > Attach a new version patch which fixed an un-initialized variable
> > issue and added some comments. Also, temporarily enable DEBUG2 for the
> > 040
On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu)
wrote:
>
>
> Attach a new version patch which fixed an un-initialized variable issue and
> added some comments.
>
The other approach to fix this issue could be that the slotsync worker
get the serialized snapshot using pg_read_binary_file()
Hi,
On Fri, Mar 29, 2024 at 01:06:15AM +, Zhijie Hou (Fujitsu) wrote:
> Attach a new version patch which fixed an un-initialized variable issue and
> added some comments. Also, temporarily enable DEBUG2 for the 040 tap-test so
> that
> we can analyze the possible CFbot failures easily.
>
Dear Hou,
Thanks for updating the patch! Here is a comment for it.
```
+/*
+ * By advancing the restart_lsn, confirmed_lsn, and xmin using
+ * fast-forward logical decoding, we can verify whether a consistent
+ * snapshot can be built. This process also involves
On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu)
wrote:
>
> Attach a new version patch which fixed an un-initialized variable issue and
> added some comments. Also, temporarily enable DEBUG2 for the 040 tap-test so
> that
> we can analyze the possible CFbot failures easily.
As suggested by
On Thursday, March 28, 2024 10:02 PM Zhijie Hou (Fujitsu)
wrote:
>
> On Thursday, March 28, 2024 7:32 PM Amit Kapila
> wrote:
> >
> > On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu)
> >
> > wrote:
> > >
> > > When analyzing one BF error[1], we find an issue of slotsync: Since
> > > we
On Thursday, March 28, 2024 7:32 PM Amit Kapila wrote:
>
> On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu)
> wrote:
> >
> > When analyzing one BF error[1], we find an issue of slotsync: Since we
> > don't perform logical decoding for the synced slots when syncing the
> > lsn/xmin of slot,
Hi,
On Thu, Mar 28, 2024 at 05:05:35PM +0530, Amit Kapila wrote:
> On Thu, Mar 28, 2024 at 3:34 PM Bertrand Drouvot
> wrote:
> >
> > On Thu, Mar 28, 2024 at 04:38:19AM +, Zhijie Hou (Fujitsu) wrote:
> >
> > > To fix this, we could use the fast forward logical decoding to advance
> > > the
On Thu, Mar 28, 2024 at 3:34 PM Bertrand Drouvot
wrote:
>
> On Thu, Mar 28, 2024 at 04:38:19AM +, Zhijie Hou (Fujitsu) wrote:
>
> > To fix this, we could use the fast forward logical decoding to advance the
> > synced
> > slot's lsn/xmin when syncing these values instead of directly updating
On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu)
wrote:
>
> When analyzing one BF error[1], we find an issue of slotsync: Since we don't
> perform logical decoding for the synced slots when syncing the lsn/xmin of
> slot, no logical snapshots will be serialized to disk. So, when user starts
Hi,
On Thu, Mar 28, 2024 at 04:38:19AM +, Zhijie Hou (Fujitsu) wrote:
> Hi,
>
> When analyzing one BF error[1], we find an issue of slotsync: Since we don't
> perform logical decoding for the synced slots when syncing the lsn/xmin of
> slot, no logical snapshots will be serialized to disk.
Hi,
When analyzing one BF error[1], we find an issue of slotsync: Since we don't
perform logical decoding for the synced slots when syncing the lsn/xmin of
slot, no logical snapshots will be serialized to disk. So, when user starts to
use these synced slots after promotion, it needs to re-build
Hi,
On Thu, Mar 14, 2024 at 02:22:44AM +, Zhijie Hou (Fujitsu) wrote:
> Hi,
>
> Since the standby_slot_names patch has been committed, I am attaching the last
> doc patch for review.
>
Thanks!
1 ===
+ continue subscribing to publications now on the new primary server without
+ any
Hi,
Since the standby_slot_names patch has been committed, I am attaching the last
doc patch for review.
Best Regards,
Hou zj
v109-0001-Document-the-steps-to-check-if-the-standby-is-r.patch
Description: v109-0001-Document-the-steps-to-check-if-the-standby-is-r.patch
On Fri, Mar 8, 2024 at 9:56 AM Ajin Cherian wrote:
>
>> Pushed with minor modifications. I'll keep an eye on BF.
>>
>> BTW, one thing that we should try to evaluate a bit more is the
>> traversal of slots in StandbySlotsHaveCaughtup() where we verify if
>> all the slots mentioned in
On Fri, Mar 8, 2024 at 2:33 PM Amit Kapila wrote:
> On Thu, Mar 7, 2024 at 12:00 PM Zhijie Hou (Fujitsu)
> wrote:
> >
> >
> > Attach the V108 patch set which addressed above and Peter's comments.
> > I also removed the check for "*" in guc check hook.
> >
>
>
> Pushed with minor modifications.
On Thu, Mar 7, 2024 at 12:00 PM Zhijie Hou (Fujitsu)
wrote:
>
>
> Attach the V108 patch set which addressed above and Peter's comments.
> I also removed the check for "*" in guc check hook.
>
Pushed with minor modifications. I'll keep an eye on BF.
BTW, one thing that we should try to evaluate
On Thursday, March 7, 2024 12:46 PM Amit Kapila wrote:
>
> On Thu, Mar 7, 2024 at 7:35 AM Peter Smith
> wrote:
> >
> > Here are some review comments for v107-0001
> >
> > ==
> > src/backend/replication/slot.c
> >
> > 1.
> > +/*
> > + * Struct for the configuration of standby_slot_names.
> >
On Thursday, March 7, 2024 10:05 AM Peter Smith wrote:
>
> Here are some review comments for v107-0001
Thanks for the comments.
>
> ==
> src/backend/replication/slot.c
>
> 1.
> +/*
> + * Struct for the configuration of standby_slot_names.
> + *
> + * Note: this must be a flat
On Thu, Mar 7, 2024 at 8:37 AM shveta malik wrote:
>
I thought about whether we can make standby_slot_names as USERSET
instead of SIGHUP and it doesn't sound like a good idea as that can
lead to inconsistent standby replicas even after configuring the
correct value of standby_slot_names. One can
On Thu, Mar 7, 2024 at 7:35 AM Peter Smith wrote:
>
> Here are some review comments for v107-0001
>
> ==
> src/backend/replication/slot.c
>
> 1.
> +/*
> + * Struct for the configuration of standby_slot_names.
> + *
> + * Note: this must be a flat representation that can be held in a single
>
On Wed, Mar 6, 2024 at 5:53 PM Amit Kapila wrote:
>
> On Wed, Mar 6, 2024 at 12:07 PM Masahiko Sawada wrote:
> >
> > On Fri, Mar 1, 2024 at 3:22 PM Peter Smith wrote:
> > >
> > > On Fri, Mar 1, 2024 at 5:11 PM Masahiko Sawada
> > > wrote:
> > > >
> > > ...
> > > > +/*
> > > > +
On Wed, Mar 6, 2024 at 6:54 PM Zhijie Hou (Fujitsu)
wrote:
>
> On Wednesday, March 6, 2024 9:13 PM Zhijie Hou (Fujitsu)
> wrote:
> >
> > On Wednesday, March 6, 2024 11:04 AM Zhijie Hou (Fujitsu)
> > wrote:
> > >
> > > On Wednesday, March 6, 2024 9:30 AM Masahiko Sawada
> > > wrote:
> > >
> >
Here are some review comments for v107-0001
==
src/backend/replication/slot.c
1.
+/*
+ * Struct for the configuration of standby_slot_names.
+ *
+ * Note: this must be a flat representation that can be held in a single chunk
+ * of guc_malloc'd memory, so that it can be stored as the "extra"
On Wednesday, March 6, 2024 9:13 PM Zhijie Hou (Fujitsu)
wrote:
>
> On Wednesday, March 6, 2024 11:04 AM Zhijie Hou (Fujitsu)
> wrote:
> >
> > On Wednesday, March 6, 2024 9:30 AM Masahiko Sawada
> > wrote:
> >
> > Hi,
> >
> > > On Fri, Mar 1, 2024 at 4:21 PM Zhijie Hou (Fujitsu)
> > >
> > >
On Wednesday, March 6, 2024 11:04 AM Zhijie Hou (Fujitsu)
wrote:
>
> On Wednesday, March 6, 2024 9:30 AM Masahiko Sawada
> wrote:
>
> Hi,
>
> > On Fri, Mar 1, 2024 at 4:21 PM Zhijie Hou (Fujitsu)
> >
> > wrote:
> > >
> > > On Friday, March 1, 2024 2:11 PM Masahiko Sawada
> > wrote:
> > > >
On Wed, Mar 6, 2024 at 12:07 PM Masahiko Sawada wrote:
>
> On Fri, Mar 1, 2024 at 3:22 PM Peter Smith wrote:
> >
> > On Fri, Mar 1, 2024 at 5:11 PM Masahiko Sawada
> > wrote:
> > >
> > ...
> > > +/*
> > > + * "*" is not accepted as in that case primary will not be able
> > >
On Fri, Mar 1, 2024 at 3:22 PM Peter Smith wrote:
>
> On Fri, Mar 1, 2024 at 5:11 PM Masahiko Sawada wrote:
> >
> ...
> > +/*
> > + * "*" is not accepted as in that case primary will not be able to
> > know
> > + * for which all standbys to wait for. Even if we have