Hi,

On Fri, Mar 29, 2024 at 02:35:22PM +0530, Amit Kapila wrote:
> On Fri, Mar 29, 2024 at 1:08 PM Bertrand Drouvot
> <bertranddrouvot...@gmail.com> wrote:
> >
> > On Fri, Mar 29, 2024 at 07:23:11AM +0000, Zhijie Hou (Fujitsu) wrote:
> > > On Friday, March 29, 2024 2:48 PM Bertrand Drouvot 
> > > <bertranddrouvot...@gmail.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Fri, Mar 29, 2024 at 01:06:15AM +0000, Zhijie Hou (Fujitsu) wrote:
> > > > > Attach a new version patch which fixed an un-initialized variable
> > > > > issue and added some comments. Also, temporarily enable DEBUG2 for the
> > > > > 040 tap-test so that we can analyze the possible CFbot failures easily.
> > > > >
> > > >
> > > > Thanks!
> > > >
> > > > +       if (remote_slot->confirmed_lsn != slot->data.confirmed_flush)
> > > > +       {
> > > > +               /*
> > > > +                * By advancing the restart_lsn, confirmed_lsn, and xmin using
> > > > +                * fast-forward logical decoding, we ensure that the required snapshots
> > > > +                * are saved to disk. This enables logical decoding to quickly reach a
> > > > +                * consistent point at the restart_lsn, eliminating the risk of missing
> > > > +                * data during snapshot creation.
> > > > +                */
> > > > +               pg_logical_replication_slot_advance(remote_slot->confirmed_lsn,
> > > > +                                                   found_consistent_point);
> > > > +               ReplicationSlotsComputeRequiredLSN();
> > > > +               updated_lsn = true;
> > > > +       }
> > > >
> > > > Instead of using pg_logical_replication_slot_advance() for each synced slot
> > > > and during sync cycles what about?:
> > > >
> > > > - keep sync slot synchronization as it is currently (not using
> > > > pg_logical_replication_slot_advance())
> > > > - create "an hidden" logical slot if sync slot feature is on
> > > > - at the time of promotion use pg_logical_replication_slot_advance() on this
> > > > hidden slot only to advance to the max lsn of the synced slots
> > > >
> > > > I'm not sure that would be enough, just asking your thoughts on this (benefits
> > > > would be to avoid calling pg_logical_replication_slot_advance() on each sync
> > > > slots and during the sync cycles).
> > >
> > > Thanks for the idea !
> > >
> > > I considered about this. I think advancing the "hidden" slot on promotion may
> > > be a bit late, because if we cannot reach the consistent point after advancing
> > > the "hidden" slot, then it means we may need to remove all the synced slots as
> > > we are not sure if they are usable(will not loss data) after promotion.
> >
> > What about advancing the hidden slot during the sync cycles then?
> >
> > > The current approach is to mark such un-consistent slot as temp and persist
> > > them once it reaches consistent point, so that user can ensure the slot can be
> > > used after promotion once persisted.
> >
> > Right, but do we need to do so for all the sync slots? Would a single hidden
> > slot be enough?
> >
> 
> Even if we mark one of the synced slots as persistent without reaching
> a consistent state, it could create a problem after promotion. And,
> how a single hidden slot would serve the purpose, different synced
> slots will have different restart/confirmed_flush LSN and we won't be
> able to perform advancing for those using a single slot. For example,
> say for first synced slot, it has not reached a consistent state and
> then how can it try for the second slot? This sounds quite tricky to
> make work. We should go with something simple where the chances of
> introducing bugs are lesser.

Yeah, better to go with something simple.

+       if (remote_slot->confirmed_lsn != slot->data.confirmed_flush)
+       {
+               /*
+                * By advancing the restart_lsn, confirmed_lsn, and xmin using
+                * fast-forward logical decoding, we ensure that the required snapshots
+                * are saved to disk. This enables logical decoding to quickly reach a
+                * consistent point at the restart_lsn, eliminating the risk of missing
+                * data during snapshot creation.
+                */
+               pg_logical_replication_slot_advance(remote_slot->confirmed_lsn,
+                                                   found_consistent_point);

In our case (slot sync), what about skipping WaitForStandbyConfirmation() in
pg_logical_replication_slot_advance()? (If we don't skip it, the call could go
as far as the RecoveryInProgress() check in StandbySlotsHaveCaughtup().)

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

