subject:"Re\: \[HACKERS\] make async slave to wait for lsn to be replayed"

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-11 Thread Heikki Linnakangas


On 11/04/2024 18:09, Alexander Korotkov wrote:

On Thu, Apr 11, 2024 at 1:46 AM Heikki Linnakangas  wrote:

On 07/04/2024 00:52, Alexander Korotkov wrote:

* At first, we check that pg_wal_replay_wait() is called in a non-atomic
* context.  That is, a procedure call isn't wrapped into a transaction,
* another procedure call, or a function call.
*


It's pretty unfortunate to have all these restrictions. It would be nice
to do:

select pg_wal_replay_wait('0/1234'); select * from foo;


This works for me, except that it needs "call" not "select".

# call pg_wal_replay_wait('0/1234'); select * from foo;
CALL
  i
---
(0 rows)


If you do that from psql prompt, it works because psql parses and sends 
it as two separate round-trips. Try:


psql postgres -p 5433 -c "call pg_wal_replay_wait('0/4101BBB8'); select 1"
ERROR:  pg_wal_replay_wait() must be only called in non-atomic context
DETAIL:  Make sure pg_wal_replay_wait() isn't called within a 
transaction, another procedure, or a function.



This assumption that PortalRunUtility() can tolerate us popping the
snapshot sounds very fishy. I haven't looked at what's going on there,
but doesn't sound like a great assumption.


This is what PortalRunUtility() says about this.

/*
  * Some utility commands (e.g., VACUUM) pop the ActiveSnapshot stack from
  * under us, so don't complain if it's now empty.  Otherwise, our snapshot
  * should be the top one; pop it.  Note that this could be a different
  * snapshot from the one we made above; see EnsurePortalSnapshotExists.
  */

So, if the vacuum pops a snapshot when it needs to run without a
snapshot, then it's probably OK for other utilities.  But I agree this
decision needs some consensus.


Ok, so it's at least somewhat documented that it's fine.


Overall, this feature doesn't feel quite ready for v17, and IMHO should
be reverted. It's a nice feature, so I'd love to have it fixed and
reviewed early in the v18 cycle.


Thank you for your review.  I've reverted this. Will repost this for early v18.


Thanks Alexander for working on this.

--
Heikki Linnakangas
Neon (https://neon.tech)

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-11 Thread Alexander Korotkov

Hi, Heikki!

Thank you for your interest in the subject.

On Thu, Apr 11, 2024 at 1:46 AM Heikki Linnakangas  wrote:
> On 07/04/2024 00:52, Alexander Korotkov wrote:
> > On Fri, Apr 5, 2024 at 9:15 PM Alvaro Herrera  
> > wrote:
> >> I'm still concerned that WaitLSNCleanup is only called in ProcKill.
> >> Does this mean that if a process throws an error while waiting, it'll
> >> not get cleaned up until it exits?  Maybe this is not a big deal, but it
> >> seems odd.
> >
> > I've added WaitLSNCleanup() to the AbortTransaction().  Just pushed
> > that together with the improvements upthread.
>
> Race condition:
>
> 1. backend: pg_wal_replay_wait('0/1234') is called. It calls WaitForLSN
> 2. backend: WaitForLSN calls addLSNWaiter('0/1234'). It adds the backend
> process to the LSN heap and returns
> 3. replay:  rm_redo record '0/1234'
> 4. backend: WaitForLSN enters for-loop, calls GetXLogReplayRecPtr()
> 5. backend: current replay LSN location is '0/1234', so we exit the loop
> 6. replay: calls WaitLSNSetLatches()
>
> In a nutshell, it's possible for the loop in WaitForLSN to exit without
> cleaning up the process from the heap. I was able to hit that by adding
> a delay after the addLSNWaiter() call:
>
> > TRAP: failed Assert("!procInfo->inHeap"), File: 
> > "../src/backend/commands/waitlsn.c", Line: 114, PID: 1936152
> > postgres: heikki postgres [local] 
> > CALL(ExceptionalCondition+0xab)[0x55da1f68787b]
> > postgres: heikki postgres [local] CALL(+0x331ec8)[0x55da1f204ec8]
> > postgres: heikki postgres [local] CALL(WaitForLSN+0x139)[0x55da1f2052cc]
> > postgres: heikki postgres [local] 
> > CALL(pg_wal_replay_wait+0x18b)[0x55da1f2056e5]
> > postgres: heikki postgres [local] 
> > CALL(ExecuteCallStmt+0x46e)[0x55da1f18031a]
> > postgres: heikki postgres [local] 
> > CALL(standard_ProcessUtility+0x8cf)[0x55da1f4b26c9]
>
> I think there's a similar race condition if the timeout is reached at
> the same time that the startup process wakes up the process.

Thank you for catching this.  I think WaitForLSN() just needs to call
deleteLSNWaiter() unconditionally after exit from the loop.

> >* At first, we check that pg_wal_replay_wait() is called in a 
> > non-atomic
> >* context.  That is, a procedure call isn't wrapped into a 
> > transaction,
> >* another procedure call, or a function call.
> >*
>
> It's pretty unfortunate to have all these restrictions. It would be nice
> to do:
>
> select pg_wal_replay_wait('0/1234'); select * from foo;

This works for me, except that it needs "call" not "select".

# call pg_wal_replay_wait('0/1234'); select * from foo;
CALL
 i
---
(0 rows)

> in a single multi-query call, to avoid the round-trip to the client. You
> can avoid it with libpq or protocol level pipelining, too, but it's more
> complicated.
>
> >* Secondly, according to PlannedStmtRequiresSnapshot(), even in an 
> > atomic
> >* context, CallStmt is processed with a snapshot.  Thankfully, we 
> > can pop
> >* this snapshot, because PortalRunUtility() can tolerate this.
>
> This assumption that PortalRunUtility() can tolerate us popping the
> snapshot sounds very fishy. I haven't looked at what's going on there,
> but doesn't sound like a great assumption.

This is what PortalRunUtility() says about this.

/*
 * Some utility commands (e.g., VACUUM) pop the ActiveSnapshot stack from
 * under us, so don't complain if it's now empty.  Otherwise, our snapshot
 * should be the top one; pop it.  Note that this could be a different
 * snapshot from the one we made above; see EnsurePortalSnapshotExists.
 */

So, if the vacuum pops a snapshot when it needs to run without a
snapshot, then it's probably OK for other utilities.  But I agree this
decision needs some consensus.

> If recovery ends while a process is waiting for an LSN to arrive, does
> it ever get woken up?

If the recovery target is promote, then the user will get an error.
If the recovery target is shutdown, then connection will get
interrupted.  If the recovery target is pause, then waiting will
continue during the pause.  Not sure about the latter case.

> The docs could use some-copy-editing, but just to point out one issue:
>
> > There are also procedures to control the progress of recovery.
>
> That's copy-pasted from an earlier sentence at the table that lists
> functions like pg_promote(), pg_wal_replay_pause(), and
> pg_is_wal_replay_paused(). The pg_wal_replay_wait() doesn't control the
> progress of recovery like those functions do, it only causes the calling
> backend to wait.
>
> Overall, this feature doesn't feel quite ready for v17, and IMHO should
> be reverted. It's a nice feature, so I'd love to have it fixed and
> reviewed early in the v18 cycle.

Thank you for your review.  I've reverted this. Will repost this for early v18.

--
Regards,
Alexander Korotkov

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-10 Thread Heikki Linnakangas


On 07/04/2024 00:52, Alexander Korotkov wrote:

On Fri, Apr 5, 2024 at 9:15 PM Alvaro Herrera  wrote:

I'm still concerned that WaitLSNCleanup is only called in ProcKill.
Does this mean that if a process throws an error while waiting, it'll
not get cleaned up until it exits?  Maybe this is not a big deal, but it
seems odd.


I've added WaitLSNCleanup() to the AbortTransaction().  Just pushed
that together with the improvements upthread.


Race condition:

1. backend: pg_wal_replay_wait('0/1234') is called. It calls WaitForLSN
2. backend: WaitForLSN calls addLSNWaiter('0/1234'). It adds the backend 
process to the LSN heap and returns

3. replay:  rm_redo record '0/1234'
4. backend: WaitForLSN enters for-loop, calls GetXLogReplayRecPtr()
5. backend: current replay LSN location is '0/1234', so we exit the loop
6. replay: calls WaitLSNSetLatches()

In a nutshell, it's possible for the loop in WaitForLSN to exit without 
cleaning up the process from the heap. I was able to hit that by adding 
a delay after the addLSNWaiter() call:



TRAP: failed Assert("!procInfo->inHeap"), File: 
"../src/backend/commands/waitlsn.c", Line: 114, PID: 1936152
postgres: heikki postgres [local] 
CALL(ExceptionalCondition+0xab)[0x55da1f68787b]
postgres: heikki postgres [local] CALL(+0x331ec8)[0x55da1f204ec8]
postgres: heikki postgres [local] CALL(WaitForLSN+0x139)[0x55da1f2052cc]
postgres: heikki postgres [local] CALL(pg_wal_replay_wait+0x18b)[0x55da1f2056e5]
postgres: heikki postgres [local] CALL(ExecuteCallStmt+0x46e)[0x55da1f18031a]
postgres: heikki postgres [local] 
CALL(standard_ProcessUtility+0x8cf)[0x55da1f4b26c9]


I think there's a similar race condition if the timeout is reached at 
the same time that the startup process wakes up the process.



 * At first, we check that pg_wal_replay_wait() is called in a 
non-atomic
 * context.  That is, a procedure call isn't wrapped into a transaction,
 * another procedure call, or a function call.
 *


It's pretty unfortunate to have all these restrictions. It would be nice 
to do:


select pg_wal_replay_wait('0/1234'); select * from foo;

in a single multi-query call, to avoid the round-trip to the client. You 
can avoid it with libpq or protocol level pipelining, too, but it's more 
complicated.



 * Secondly, according to PlannedStmtRequiresSnapshot(), even in an 
atomic
 * context, CallStmt is processed with a snapshot.  Thankfully, we can 
pop
 * this snapshot, because PortalRunUtility() can tolerate this.


This assumption that PortalRunUtility() can tolerate us popping the 
snapshot sounds very fishy. I haven't looked at what's going on there, 
but doesn't sound like a great assumption.


If recovery ends while a process is waiting for an LSN to arrive, does 
it ever get woken up?


The docs could use some-copy-editing, but just to point out one issue:


There are also procedures to control the progress of recovery.


That's copy-pasted from an earlier sentence at the table that lists 
functions like pg_promote(), pg_wal_replay_pause(), and 
pg_is_wal_replay_paused(). The pg_wal_replay_wait() doesn't control the 
progress of recovery like those functions do, it only causes the calling 
backend to wait.


Overall, this feature doesn't feel quite ready for v17, and IMHO should 
be reverted. It's a nice feature, so I'd love to have it fixed and 
reviewed early in the v18 cycle.


--
Heikki Linnakangas
Neon (https://neon.tech)

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-10 Thread Kartyshov Ivan


I did some experiments over synchronous replications and
got that cascade replication can`t be synchronous. And 
pg_wal_replay_wait() allows us to read your writes
consistency on cascade replication.
Beyond that, I added more tests on multi-standby replication
and cascade replications.

--
Ivan Kartyshov
Postgres Professional: www.postgrespro.comdiff --git a/src/test/recovery/t/043_wal_replay_wait.pl b/src/test/recovery/t/043_wal_replay_wait.pl
index bbd64aa67b..867c3dd94a 100644
--- a/src/test/recovery/t/043_wal_replay_wait.pl
+++ b/src/test/recovery/t/043_wal_replay_wait.pl
@@ -19,17 +19,17 @@ my $backup_name = 'my_backup';
 $node_primary->backup($backup_name);
 
 # Create a streaming standby with a 1 second delay from the backup
-my $node_standby = PostgreSQL::Test::Cluster->new('standby');
+my $node_standby1 = PostgreSQL::Test::Cluster->new('standby');
 my $delay = 1;
-$node_standby->init_from_backup($node_primary, $backup_name,
+$node_standby1->init_from_backup($node_primary, $backup_name,
 	has_streaming => 1);
-$node_standby->append_conf(
+$node_standby1->append_conf(
 	'postgresql.conf', qq[
 	recovery_min_apply_delay = '${delay}s'
 ]);
-$node_standby->start;
-
+$node_standby1->start;
 
+# I
 # Make sure that pg_wal_replay_wait() works: add new content to
 # primary and memorize primary's insert LSN, then wait for that LSN to be
 # replayed on standby.
@@ -37,7 +37,7 @@ $node_primary->safe_psql('postgres',
 	"INSERT INTO wait_test VALUES (generate_series(11, 20))");
 my $lsn1 =
   $node_primary->safe_psql('postgres', "SELECT pg_current_wal_insert_lsn()");
-my $output = $node_standby->safe_psql(
+my $output = $node_standby1->safe_psql(
 	'postgres', qq[
 	CALL pg_wal_replay_wait('${lsn1}', 100);
 	SELECT pg_lsn_cmp(pg_last_wal_replay_lsn(), '${lsn1}'::pg_lsn);
@@ -48,12 +48,13 @@ my $output = $node_standby->safe_psql(
 ok($output >= 0,
 	"standby reached the same LSN as primary after pg_wal_replay_wait()");
 
+# II
 # Check that new data is visible after calling pg_wal_replay_wait()
 $node_primary->safe_psql('postgres',
 	"INSERT INTO wait_test VALUES (generate_series(21, 30))");
 my $lsn2 =
   $node_primary->safe_psql('postgres', "SELECT pg_current_wal_insert_lsn()");
-$output = $node_standby->safe_psql(
+$output = $node_standby1->safe_psql(
 	'postgres', qq[
 	CALL pg_wal_replay_wait('${lsn2}');
 	SELECT count(*) FROM wait_test;
@@ -62,6 +63,82 @@ $output = $node_standby->safe_psql(
 # Make sure the current LSN on standby and is the same as primary's LSN
 ok($output eq 30, "standby reached the same LSN as primary");
 
+# III
+# Check two standby waiting LSN
+# Create a streaming second standby with a 1 second delay from the backup
+my $node_standby2 = PostgreSQL::Test::Cluster->new('standby2');
+$node_standby2->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+$node_standby2->append_conf(
+	'postgresql.conf', qq[
+	recovery_min_apply_delay = '${delay}s'
+]);
+$node_standby2->start;
+
+# Check that new data is visible after calling pg_wal_replay_wait()
+$node_primary->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(31, 40))");
+$lsn2 =
+  $node_primary->safe_psql('postgres', "SELECT pg_current_wal_insert_lsn()");
+
+$output = $node_standby1->safe_psql(
+	'postgres', qq[
+	CALL pg_wal_replay_wait('${lsn2}');
+	SELECT count(*) FROM wait_test;
+]);
+my $output2 = $node_standby1->safe_psql(
+	'postgres', qq[
+	CALL pg_wal_replay_wait('${lsn2}');
+	SELECT count(*) FROM wait_test;
+]);
+
+# Make sure the current LSN on standby and standby2 are the same as
+# primary's LSN
+ok($output eq 40, "standby1 reached the same LSN as primary");
+ok($output2 eq 40, "standby2 reached the same LSN as primary");
+
+# IV
+# Create a cascading standby waiting LSN
+$backup_name = 'cas_backup';
+$node_standby1->backup($backup_name);
+
+my $cascading_standby = PostgreSQL::Test::Cluster->new('cascading_standby');
+$cascading_standby->init_from_backup($node_standby1, $backup_name,
+	has_streaming => 1,
+	has_restoring => 1);
+
+my $cascading_connstr = $node_standby1->connstr;
+$cascading_standby->append_conf(
+	'postgresql.conf', qq(
+	hot_standby_feedback = on
+	recovery_min_apply_delay = '${delay}s'
+));
+
+$cascading_standby->start;
+
+# Check that new data is visible after calling pg_wal_replay_wait()
+$node_primary->safe_psql('postgres',
+	"INSERT INTO wait_test VALUES (generate_series(41, 50))");
+$lsn2 =
+  $node_primary->safe_psql('postgres', "SELECT pg_current_wal_insert_lsn()");
+
+$output = $node_standby1->safe_psql(
+	'postgres', qq[
+	CALL pg_wal_replay_wait('${lsn2}');
+	SELECT count(*) FROM wait_test;
+]);
+$output2 = $cascading_standby->safe_psql(
+	'postgres', qq[
+	CALL pg_wal_replay_wait('${lsn2}');
+	SELECT count(*) FROM wait_test;
+]);
+
+# Make sure the current LSN on standby and standby2 are the same as
+# primary's LSN
+ok($output eq 50, "standby1 reached the same LSN as primary");
+ok($output2 eq 50, "cascading_standby reached the same LSN as

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-06 Thread Alexander Korotkov

Hi, Alvaro!

Thank you for your care on this matter.

On Fri, Apr 5, 2024 at 9:15 PM Alvaro Herrera  wrote:
> BTW I noticed that
> https://coverage.postgresql.org/src/backend/commands/waitlsn.c.gcov.html
> says that lsn_cmp is not covered by the tests.  This probably indicates
> that the tests are a little too light, but I'm not sure how much extra
> effort we want to spend.

I'm aware of this.  Ivan promised to send a patch to improve the test.
If he doesn't, I'll care about it.

> I'm still concerned that WaitLSNCleanup is only called in ProcKill.
> Does this mean that if a process throws an error while waiting, it'll
> not get cleaned up until it exits?  Maybe this is not a big deal, but it
> seems odd.

I've added WaitLSNCleanup() to the AbortTransaction().  Just pushed
that together with the improvements upthread.

--
Regards,
Alexander Korotkov

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-05 Thread Alvaro Herrera

Hello,

BTW I noticed that 
https://coverage.postgresql.org/src/backend/commands/waitlsn.c.gcov.html
says that lsn_cmp is not covered by the tests.  This probably indicates
that the tests are a little too light, but I'm not sure how much extra
effort we want to spend.

I'm still concerned that WaitLSNCleanup is only called in ProcKill.
Does this mean that if a process throws an error while waiting, it'll
not get cleaned up until it exits?  Maybe this is not a big deal, but it
seems odd.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Now I have my system running, not a byte was off the shelf;
It rarely breaks and when it does I fix the code myself.
It's stable, clean and elegant, and lightning fast as well,
And it doesn't cost a nickel, so Bill Gates can go to hell."

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-03 Thread Alexander Korotkov

On Wed, Apr 3, 2024 at 10:04 PM Pavel Borisov  wrote:
> On Wed, 3 Apr 2024 at 22:18, Alexander Korotkov  wrote:
>>
>> On Wed, Apr 3, 2024 at 7:55 PM Alvaro Herrera  
>> wrote:
>> >
>> > On 2024-Apr-03, Alexander Korotkov wrote:
>> >
>> > > Regarding the shmem data structure for LSN waiters.  I didn't pick
>> > > LWLock or ConditionVariable, because I needed the ability to wake up
>> > > only those waiters whose LSN is already replayed.  In my experience
>> > > waking up a process is way slower than scanning a short flat array.
>> >
>> > I agree, but I think that's unrelated to what I was saying, which is
>> > just the patch I attach here.
>>
>> Oh, sorry for the confusion.  I'd re-read your message.  Indeed you
>> meant this very clearly!
>>
>> I'm good with the patch.  Attached revision contains a bit of a commit 
>> message.
>>
>> > > However, I agree that when the number of waiters is very high and flat
>> > > array may become a problem.  It seems that the pairing heap is not
>> > > hard to use for shmem structures.  The only memory allocation call in
>> > > paritingheap.c is in pairingheap_allocate().  So, it's only needed to
>> > > be able to initialize the pairing heap in-place, and it will be fine
>> > > for shmem.
>> >
>> > Ok.
>> >
>> > With the code as it stands today, everything in WaitLSNState apart from
>> > the pairing heap is accessed without any locking.  I think this is at
>> > least partly OK because each backend only accesses its own entry; but it
>> > deserves a comment.  Or maybe something more, because WaitLSNSetLatches
>> > does modify the entry for other backends.  (Admittedly, this could only
>> > happens for backends that are already sleeping, and it only happens
>> > with the lock acquired, so it's probably okay.  But clearly it deserves
>> > a comment.)
>>
>> Please, check 0002 patch attached.  I found it easier to move two
>> assignments we previously moved out of lock, into the lock; then claim
>> WaitLSNState.procInfos is also protected by WaitLSNLock.
>
> Could you re-attach 0002. Seems it failed to attach to the previous message.

I actually forgot both!

--
Regards,
Alexander Korotkov


v2-0001-Use-an-LWLock-instead-of-a-spinlock-in-waitlsn.c.patch
Description: Binary data


v2-0002-Clarify-what-is-protected-by-WaitLSNLock.patch
Description: Binary data

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-03 Thread Pavel Borisov

Hi, Alexander!

On Wed, 3 Apr 2024 at 22:18, Alexander Korotkov 
wrote:

> On Wed, Apr 3, 2024 at 7:55 PM Alvaro Herrera 
> wrote:
> >
> > On 2024-Apr-03, Alexander Korotkov wrote:
> >
> > > Regarding the shmem data structure for LSN waiters.  I didn't pick
> > > LWLock or ConditionVariable, because I needed the ability to wake up
> > > only those waiters whose LSN is already replayed.  In my experience
> > > waking up a process is way slower than scanning a short flat array.
> >
> > I agree, but I think that's unrelated to what I was saying, which is
> > just the patch I attach here.
>
> Oh, sorry for the confusion.  I'd re-read your message.  Indeed you
> meant this very clearly!
>
> I'm good with the patch.  Attached revision contains a bit of a commit
> message.
>
> > > However, I agree that when the number of waiters is very high and flat
> > > array may become a problem.  It seems that the pairing heap is not
> > > hard to use for shmem structures.  The only memory allocation call in
> > > paritingheap.c is in pairingheap_allocate().  So, it's only needed to
> > > be able to initialize the pairing heap in-place, and it will be fine
> > > for shmem.
> >
> > Ok.
> >
> > With the code as it stands today, everything in WaitLSNState apart from
> > the pairing heap is accessed without any locking.  I think this is at
> > least partly OK because each backend only accesses its own entry; but it
> > deserves a comment.  Or maybe something more, because WaitLSNSetLatches
> > does modify the entry for other backends.  (Admittedly, this could only
> > happens for backends that are already sleeping, and it only happens
> > with the lock acquired, so it's probably okay.  But clearly it deserves
> > a comment.)
>
> Please, check 0002 patch attached.  I found it easier to move two
> assignments we previously moved out of lock, into the lock; then claim
> WaitLSNState.procInfos is also protected by WaitLSNLock.
>
Could you re-attach 0002. Seems it failed to attach to the previous
message.

Regards,
Pavel

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-03 Thread Alexander Korotkov

On Wed, Apr 3, 2024 at 7:55 PM Alvaro Herrera  wrote:
>
> On 2024-Apr-03, Alexander Korotkov wrote:
>
> > Regarding the shmem data structure for LSN waiters.  I didn't pick
> > LWLock or ConditionVariable, because I needed the ability to wake up
> > only those waiters whose LSN is already replayed.  In my experience
> > waking up a process is way slower than scanning a short flat array.
>
> I agree, but I think that's unrelated to what I was saying, which is
> just the patch I attach here.

Oh, sorry for the confusion.  I'd re-read your message.  Indeed you
meant this very clearly!

I'm good with the patch.  Attached revision contains a bit of a commit message.

> > However, I agree that when the number of waiters is very high and flat
> > array may become a problem.  It seems that the pairing heap is not
> > hard to use for shmem structures.  The only memory allocation call in
> > paritingheap.c is in pairingheap_allocate().  So, it's only needed to
> > be able to initialize the pairing heap in-place, and it will be fine
> > for shmem.
>
> Ok.
>
> With the code as it stands today, everything in WaitLSNState apart from
> the pairing heap is accessed without any locking.  I think this is at
> least partly OK because each backend only accesses its own entry; but it
> deserves a comment.  Or maybe something more, because WaitLSNSetLatches
> does modify the entry for other backends.  (Admittedly, this could only
> happens for backends that are already sleeping, and it only happens
> with the lock acquired, so it's probably okay.  But clearly it deserves
> a comment.)

Please, check 0002 patch attached.  I found it easier to move two
assignments we previously moved out of lock, into the lock; then claim
WaitLSNState.procInfos is also protected by WaitLSNLock.

> Don't we need to WaitLSNCleanup() during error recovery or something?

Yes, there is WaitLSNCleanup(). It's currently only called from one
place, given that waiting for LSN can't be invoked from background
workers or inside the transaction.

--
Regards,
Alexander Korotkov

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-03 Thread Alvaro Herrera

Hello Alexander,

On 2024-Apr-03, Alexander Korotkov wrote:

> Regarding the shmem data structure for LSN waiters.  I didn't pick
> LWLock or ConditionVariable, because I needed the ability to wake up
> only those waiters whose LSN is already replayed.  In my experience
> waking up a process is way slower than scanning a short flat array.

I agree, but I think that's unrelated to what I was saying, which is
just the patch I attach here.

> However, I agree that when the number of waiters is very high and flat
> array may become a problem.  It seems that the pairing heap is not
> hard to use for shmem structures.  The only memory allocation call in
> paritingheap.c is in pairingheap_allocate().  So, it's only needed to
> be able to initialize the pairing heap in-place, and it will be fine
> for shmem.

Ok.

With the code as it stands today, everything in WaitLSNState apart from
the pairing heap is accessed without any locking.  I think this is at
least partly OK because each backend only accesses its own entry; but it
deserves a comment.  Or maybe something more, because WaitLSNSetLatches
does modify the entry for other backends.  (Admittedly, this could only
happens for backends that are already sleeping, and it only happens
with the lock acquired, so it's probably okay.  But clearly it deserves
a comment.)

Don't we need to WaitLSNCleanup() during error recovery or something?

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"World domination is proceeding according to plan"(Andrew Morton)
>From 4079dc6a6a6893055b32dee75f178b324bbaef77 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 3 Apr 2024 18:35:15 +0200
Subject: [PATCH] Use an LWLock instead of a spinlock in waitlsn.c

---
 src/backend/commands/waitlsn.c  | 15 +++
 src/backend/utils/activity/wait_event_names.txt |  1 +
 src/include/commands/waitlsn.h  |  5 +
 src/include/storage/lwlocklist.h|  1 +
 4 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/src/backend/commands/waitlsn.c b/src/backend/commands/waitlsn.c
index 51a34d422e..a57b818a2d 100644
--- a/src/backend/commands/waitlsn.c
+++ b/src/backend/commands/waitlsn.c
@@ -58,7 +58,6 @@ WaitLSNShmemInit(void)
 			   );
 	if (!found)
 	{
-		SpinLockInit(>waitersHeapMutex);
 		pg_atomic_init_u64(>minWaitedLSN, PG_UINT64_MAX);
 		pairingheap_initialize(>waitersHeap, lsn_cmp, NULL);
 		memset(>procInfos, 0, MaxBackends * sizeof(WaitLSNProcInfo));
@@ -115,13 +114,13 @@ addLSNWaiter(XLogRecPtr lsn)
 	procInfo->procnum = MyProcNumber;
 	procInfo->waitLSN = lsn;
 
-	SpinLockAcquire(>waitersHeapMutex);
+	LWLockAcquire(WaitLSNLock, LW_EXCLUSIVE);
 
 	pairingheap_add(>waitersHeap, >phNode);
 	procInfo->inHeap = true;
 	updateMinWaitedLSN();
 
-	SpinLockRelease(>waitersHeapMutex);
+	LWLockRelease(WaitLSNLock);
 }
 
 /*
@@ -132,11 +131,11 @@ deleteLSNWaiter(void)
 {
 	WaitLSNProcInfo *procInfo = >procInfos[MyProcNumber];
 
-	SpinLockAcquire(>waitersHeapMutex);
+	LWLockAcquire(WaitLSNLock, LW_EXCLUSIVE);
 
 	if (!procInfo->inHeap)
 	{
-		SpinLockRelease(>waitersHeapMutex);
+		LWLockRelease(WaitLSNLock);
 		return;
 	}
 
@@ -144,7 +143,7 @@ deleteLSNWaiter(void)
 	procInfo->inHeap = false;
 	updateMinWaitedLSN();
 
-	SpinLockRelease(>waitersHeapMutex);
+	LWLockRelease(WaitLSNLock);
 }
 
 /*
@@ -160,7 +159,7 @@ WaitLSNSetLatches(XLogRecPtr currentLSN)
 
 	wakeUpProcNums = palloc(sizeof(int) * MaxBackends);
 
-	SpinLockAcquire(>waitersHeapMutex);
+	LWLockAcquire(WaitLSNLock, LW_EXCLUSIVE);
 
 	/*
 	 * Iterate the pairing heap of waiting processes till we find LSN not yet
@@ -182,7 +181,7 @@ WaitLSNSetLatches(XLogRecPtr currentLSN)
 
 	updateMinWaitedLSN();
 
-	SpinLockRelease(>waitersHeapMutex);
+	LWLockRelease(WaitLSNLock);
 
 	/*
 	 * Set latches for processes, whose waited LSNs are already replayed. This
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 0d288d6b3d..ec43a78d60 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -330,6 +330,7 @@ WALSummarizer	"Waiting to read or update WAL summarization state."
 DSMRegistry	"Waiting to read or update the dynamic shared memory registry."
 InjectionPoint	"Waiting to read or update information related to injection points."
 SerialControl	"Waiting to read or update shared pg_serial state."
+WaitLSN	"Waiting to read or update shared Wait-for-LSN state."
 
 #
 # END OF PREDEFINED LWLOCKS (DO NOT CHANGE THIS LINE)
diff --git a/src/include/commands/waitlsn.h b/src/include/commands/waitlsn.h
index 0d80248682..b3d9eed64d 100644
--- a/src/include/commands/waitlsn.h
+++ b/src/include/commands/waitlsn.h
@@ -55,13 +55,10 @@ typedef struct WaitLSNState
 
 	/*
 	 * A pairing heap of waiting processes order by LSN values (least LSN is
-	 * on top).
+	 * on top).  Protected by WaitLSNLock.
 	 */
 	pairingheap

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-03 Thread Alexander Korotkov

Hi, Alvaro!

Thank you for your feedback.

On Wed, Apr 3, 2024 at 9:58 AM Alvaro Herrera  wrote:
> Hello, I noticed that commit 06c418e163e9 uses waitLSN->mutex (a
> spinlock) to protect the contents of waitLSN -- and it's used to walk an
> arbitrary long list of processes waiting ... and also, an arbitrary
> number of processes could be calling this code.  I think using a
> spinlock for this is unwise, as it'll cause busy-waiting whenever
> there's contention.  Wouldn't it be better to use an LWLock for this?
> Then the processes would sleep until the lock is freed.
>
> While nosing about the code, other things struck me:
>
> I think there should be more comments about WaitLSNProcInfo and
> WaitLSNState in waitlsn.h.
>
> In addLSNWaiter it'd be better to assign 'cur' before acquiring the
> lock.
>
> Is a plan array really the most efficient data structure for this,
> considering that you have to reorder each time you add an element?
> Maybe it is, but wouldn't it make sense to use memmove() when adding one
> element rather iterating all the remaining elements to the end of the
> queue?
>
> I think the include list in waitlsn.c could be tightened a bit:

I've just pushed commit, which shortens the include list and fixes the
order of 'cur' assigning and taking spinlock in addLSNWaiter().

Regarding the shmem data structure for LSN waiters.  I didn't pick
LWLock or ConditionVariable, because I needed the ability to wake up
only those waiters whose LSN is already replayed.  In my experience
waking up a process is way slower than scanning a short flat array.

However, I agree that when the number of waiters is very high and flat
array may become a problem.  It seems that the pairing heap is not
hard to use for shmem structures.  The only memory allocation call in
paritingheap.c is in pairingheap_allocate().  So, it's only needed to
be able to initialize the pairing heap in-place, and it will be fine
for shmem.

I'll come back with switching to the pairing heap shortly.

--
Regards,
Alexander Korotkov

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-03 Thread Andy Fan



> I also can confirm that a lot of users would be very happy to have
> "read your writes consistency" and ready to do something to achieve
> this at an application level.  However, they typically don't know what
> exactly they need.
>
> So, blogging about pg_wal_replay_wait() and spreading words about it
> at conferences would be highly appreciated.

Sure, once it is committed, I promise I can doing a knowledge sharing in
our organization and write a article in chinese language.  

-- 
Best Regards
Andy Fan

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-03 Thread Daniel Gustafsson

Buildfarm animal mamba (NetBSD 10.0 on macppc) started failing on this with
what seems like a bogus compiler warning:

waitlsn.c: In function 'WaitForLSN':
waitlsn.c:275:24: error: 'endtime' may be used uninitialized in this function 
[-Werror=maybe-uninitialized]
  275 |delay_ms = (endtime - GetCurrentTimestamp()) / 1000;
  |   ~^~~~
cc1: all warnings being treated as errors

endtime is indeed initialized further up, but initializing endtime at
declaration seems innocent enough and should make this compiler happy and the
buildfarm greener.

diff --git a/src/backend/commands/waitlsn.c b/src/backend/commands/waitlsn.c
index 6679378156..17ad0057ad 100644
--- a/src/backend/commands/waitlsn.c
+++ b/src/backend/commands/waitlsn.c
@@ -226,7 +226,7 @@ void
 WaitForLSN(XLogRecPtr targetLSN, int64 timeout)
 {
XLogRecPtr  currentLSN;
-   TimestampTz endtime;
+   TimestampTz endtime = 0;

/* Shouldn't be called when shmem isn't initialized */
Assert(waitLSN);

--
Daniel Gustafsson

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-03 Thread Alvaro Herrera

Hello, I noticed that commit 06c418e163e9 uses waitLSN->mutex (a
spinlock) to protect the contents of waitLSN -- and it's used to walk an
arbitrary long list of processes waiting ... and also, an arbitrary
number of processes could be calling this code.  I think using a
spinlock for this is unwise, as it'll cause busy-waiting whenever
there's contention.  Wouldn't it be better to use an LWLock for this?
Then the processes would sleep until the lock is freed.

While nosing about the code, other things struck me:

I think there should be more comments about WaitLSNProcInfo and
WaitLSNState in waitlsn.h.

In addLSNWaiter it'd be better to assign 'cur' before acquiring the
lock.

Is a plan array really the most efficient data structure for this,
considering that you have to reorder each time you add an element?
Maybe it is, but wouldn't it make sense to use memmove() when adding one
element rather iterating all the remaining elements to the end of the
queue?

I think the include list in waitlsn.c could be tightened a bit:

@@ -18,28 +18,18 @@
 #include 
 
 #include "pgstat.h"
-#include "fmgr.h"
-#include "access/transam.h"
-#include "access/xact.h"
 #include "access/xlog.h"
-#include "access/xlogdefs.h"
 #include "access/xlogrecovery.h"
-#include "catalog/pg_type.h"
 #include "commands/waitlsn.h"
-#include "executor/spi.h"
 #include "funcapi.h"
 #include "miscadmin.h"
-#include "storage/ipc.h"
 #include "storage/latch.h"
-#include "storage/pmsignal.h"
 #include "storage/proc.h"
 #include "storage/shmem.h"
-#include "storage/sinvaladt.h"
-#include "utils/builtins.h"
 #include "utils/pg_lsn.h"
 #include "utils/snapmgr.h"
-#include "utils/timestamp.h"
 #include "utils/fmgrprotos.h"
+#include "utils/wait_event_types.h"

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Las cosas son buenas o malas segun las hace nuestra opinión" (Lisias)

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-02 Thread Kartyshov Ivan


On 2024-04-02 13:15, Bharath Rupireddy wrote:

On Tue, Apr 2, 2024 at 3:41 PM Kartyshov Ivan
 wrote:


8 years, we tried to add this feature, and now we suggest the best way
for this feature is to commit the minimal version first.


Just curious, do you or anyone else have an immediate use for this
function? If yes, how are they achieving read-after-write-consistency
on streaming standbys in their application right now without a
function like this?


Just now, application runs pg_current_wal_lsn() after update and then
waits on loop pg_current_wal_flush_lsn() until updated.

Or use slow synchronous_commit.

--
Ivan Kartyshov
Postgres Professional: www.postgrespro.com

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-02 Thread Alexander Korotkov

On Tue, Apr 2, 2024 at 2:47 PM Andy Fan  wrote:
> Bharath Rupireddy  writes:
>
> > On Tue, Apr 2, 2024 at 3:41 PM Kartyshov Ivan
> >  wrote:
> >>
> >> 8 years, we tried to add this feature, and now we suggest the best way
> >> for this feature is to commit the minimal version first.
> >
> > Just curious, do you or anyone else have an immediate use for this
> > function? If yes, how are they achieving read-after-write-consistency
> > on streaming standbys in their application right now without a
> > function like this?
>
> The link [1] may be helpful and I think the reason there is reasonable
> to me.
>
> Actually we also disucss how to make sure the "read your writes
> consistency" internally, and the soluation here looks good to me.
>
> Glad to know that this patch will be committed very soon.
>
> [1]
> https://www.postgresql.org/message-id/CAPpHfdtuiL1x4APTs7u1fCmxkVp2-ZruXcdCfprDMdnOzvdC%2BA%40mail.gmail.com

Thank you for your feedback.

I also can confirm that a lot of users would be very happy to have
"read your writes consistency" and ready to do something to achieve
this at an application level.  However, they typically don't know what
exactly they need.

So, blogging about pg_wal_replay_wait() and spreading words about it
at conferences would be highly appreciated.

--
Regards,
Alexander Korotkov

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-02 Thread Alexander Korotkov

On Tue, Apr 2, 2024 at 1:11 PM Kartyshov Ivan
 wrote:
> On 2024-04-02 11:14, Andy Fan wrote:
> > If so, when users call pg_wal_replay_wait, they can get informed when
> > the wal is replaied to the target_lsn, but they can't know how long
> > time
> > it waits unless they check it in application side, I think such
> > information will be useful for monitor purpose sometimes.
> >
> > select pg_wal_replay_wait(lsn, 1000);  may just take 1ms in fact, in
> > this case, I want this function return 1.
>
> Hi Andy, to get timing we can use \time in psql.
> Here is an example.
> postgres=# \timing
> Timing is on.
> postgres=# select 1;
>   ?column?
> --
>  1
> (1 row)
>
> Time: 0.536 ms
>
>
> >void
> And returning VOID is the best option, rather than returning TRUE|FALSE
> or timing. It left the logic of the procedure very simple, we get an
> error if LSN is not reached.
>
> 8 years, we tried to add this feature, and now we suggest the best way
> for this feature is to commit the minimal version first.
>
> Let's discuss further improvements in future versions.

+1,
It seems there was not yet a precedent of builtin PostgreSQL function
returning its duration.  And I don't think we need to introduce such
precedent, at least now.  This seems like we're placing the
responsibility on monitoring resources usage to an application.

I'd also like to note that pg_wal_replay_wait() comes with a dedicated
wait event.  So, one could monitor the average duration of these waits
using sampling of wait events.

--
Regards,
Alexander Korotkov

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-02 Thread Andy Fan

Bharath Rupireddy  writes:

> On Tue, Apr 2, 2024 at 3:41 PM Kartyshov Ivan
>  wrote:
>>
>> 8 years, we tried to add this feature, and now we suggest the best way
>> for this feature is to commit the minimal version first.
>
> Just curious, do you or anyone else have an immediate use for this
> function? If yes, how are they achieving read-after-write-consistency
> on streaming standbys in their application right now without a
> function like this?

The link [1] may be helpful and I think the reason there is reasonable
to me.

Actually we also disucss how to make sure the "read your writes
consistency" internally, and the soluation here looks good to me.

Glad to know that this patch will be committed very soon. 

[1]
https://www.postgresql.org/message-id/CAPpHfdtuiL1x4APTs7u1fCmxkVp2-ZruXcdCfprDMdnOzvdC%2BA%40mail.gmail.com

-- 
Best Regards
Andy Fan

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-02 Thread Bharath Rupireddy

On Tue, Apr 2, 2024 at 3:41 PM Kartyshov Ivan
 wrote:
>
> 8 years, we tried to add this feature, and now we suggest the best way
> for this feature is to commit the minimal version first.

Just curious, do you or anyone else have an immediate use for this
function? If yes, how are they achieving read-after-write-consistency
on streaming standbys in their application right now without a
function like this?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-02 Thread Kartyshov Ivan


On 2024-04-02 11:14, Andy Fan wrote:

If so, when users call pg_wal_replay_wait, they can get informed when
the wal is replaied to the target_lsn, but they can't know how long 
time

it waits unless they check it in application side, I think such
information will be useful for monitor purpose sometimes.

select pg_wal_replay_wait(lsn, 1000);  may just take 1ms in fact, in
this case, I want this function return 1.


Hi Andy, to get timing we can use \time in psql.
Here is an example.
postgres=# \timing
Timing is on.
postgres=# select 1;
 ?column?
--
1
(1 row)

Time: 0.536 ms



   void

And returning VOID is the best option, rather than returning TRUE|FALSE
or timing. It left the logic of the procedure very simple, we get an
error if LSN is not reached.

8 years, we tried to add this feature, and now we suggest the best way
for this feature is to commit the minimal version first.

Let's discuss further improvements in future versions.

--
Ivan Kartyshov
Postgres Professional: www.postgrespro.com

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-02 Thread Andy Fan



Hi Alexander!

>> +
>> +pg_wal_replay_wait (
>> +  target_lsn pg_lsn,
>> +  timeout bigint 
>> DEFAULT 0)
>> +void
>> +   
>>
>> Should we return the millseconds of waiting time?  I think this
>> information may be useful for customer if they want to know how long
>> time it waits for for minitor purpose.
>
> Please, check it more carefully.  In v17 timeout is in integer milliseconds.

I guess one of us misunderstand the other:( and I do didn't check the
code very carefully. 

Acutally I meant the "return value" rather than function argument. IIUC
the current return value is void per below statement.

>> +void

If so, when users call pg_wal_replay_wait, they can get informed when
the wal is replaied to the target_lsn, but they can't know how long time
it waits unless they check it in application side, I think such
information will be useful for monitor purpose sometimes. 

select pg_wal_replay_wait(lsn, 1000);  may just take 1ms in fact, in
this case, I want this function return 1. 

-- 
Best Regards
Andy Fan

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-02 Thread Alexander Korotkov

Hi, Andy!

On Tue, Apr 2, 2024 at 6:29 AM Andy Fan  wrote:
> >  Did you forget to attach the new patch?
> >
> > Yes, here it is.
> >
> > [4. text/x-diff; 
> > v17-0001-Implement-pg_wal_replay_wait-stored-procedure.patch]...
>
> +
> +pg_wal_replay_wait (
> +  target_lsn pg_lsn,
> +  timeout bigint 
> DEFAULT 0)
> +void
> +   
>
> Should we return the millseconds of waiting time?  I think this
> information may be useful for customer if they want to know how long
> time it waits for for minitor purpose.

Please, check it more carefully.  In v17 timeout is in integer milliseconds.

--
Regards,
Alexander Korotkov

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-01 Thread Andy Fan

Alexander Korotkov  writes:

Hello,

>  
>  Did you forget to attach the new patch?
>
> Yes, here it is. 
>
> --
> Regards,
> Alexander Korotkov 
>
> [4. text/x-diff; 
> v17-0001-Implement-pg_wal_replay_wait-stored-procedure.patch]...

+
+pg_wal_replay_wait (
+  target_lsn pg_lsn,
+  timeout bigint 
DEFAULT 0)
+void
+   

Should we return the millseconds of waiting time?  I think this
information may be useful for customer if they want to know how long
time it waits for for minitor purpose. 

-- 
Best Regards
Andy Fan

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-01 Thread Alexander Korotkov

On Mon, Apr 1, 2024 at 5:25 AM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Mon, Apr 1, 2024 at 5:54 AM Alexander Korotkov 
> wrote:
> >
> > > 9. To me the following query blocks even though I didn't mention
> timeout.
> > > CALL pg_wal_replay_wait('0/fff');
> >
> > If your primary server is freshly initialized, you need to do quite
> > data modifications to reach this LSN.
>
> Right, but why pg_wal_replay_wait  blocks without a timeout? It must
> return an error saying it can't reach the target LSN, no?
>

How can replica know this?  It doesn't look feasible to distinguish this
situation from the situation when connection between primary and replica
became slow.  I'd keep this simple.  We have pg_sleep_for() which waits for
the specified time whatever it is.  And we have pg_wal_replay_wait() waits
till replay lsn grows to the target whatever it is.  It's up to the user to
specify the correct value.

> Did you forget to attach the new patch?
>

Yes, here it is.

--
Regards,
Alexander Korotkov

v17-0001-Implement-pg_wal_replay_wait-stored-procedure.patch
Description: Binary data

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-31 Thread Bharath Rupireddy

On Mon, Apr 1, 2024 at 5:54 AM Alexander Korotkov  wrote:
>
> > 9. To me the following query blocks even though I didn't mention timeout.
> > CALL pg_wal_replay_wait('0/fff');
>
> If your primary server is freshly initialized, you need to do quite
> data modifications to reach this LSN.

Right, but why pg_wal_replay_wait  blocks without a timeout? It must
return an error saying it can't reach the target LSN, no?

Did you forget to attach the new patch?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-31 Thread Alexander Korotkov

Hi Bharath,

Thank you for your feedback.

On Sun, Mar 31, 2024 at 8:44 AM Bharath Rupireddy
 wrote:
> On Sun, Mar 31, 2024 at 7:41 AM Alexander Korotkov  
> wrote:
> Thanks for the patch. I have a few comments on the v16 patch.
>
> 1. Is there any specific reason for pg_wal_replay_wait() being a
> procedure rather than a function? I haven't read the full thread, but
> I vaguely noticed the discussion on the new wait mechanism holding up
> a snapshot or some other resource. Is that the reason to use a stored
> procedure over a function? If yes, can we specify it somewhere in the
> commit message and just before the procedure definition in
> system_functions.sql?

Surely, there is a reason.  Function should be executed in a snapshot,
which can prevent WAL records from being replayed.  See [1] for a
particular test scenario.  In a procedure we may enforce non-atomic
context and release the snapshot.

I've mentioned that in the commit message and in the procedure code.
I don't think system_functions.sql is the place for this type of
comment.  We only use system_functions.sql to push the default values.

> 2. Is the pg_wal_replay_wait first procedure that postgres provides
> out of the box?

Yes, it appears first.  I see nothing wrong about that.

> 3. Defining a procedure for the first time in system_functions.sql
> which is supposed to be for functions seems a bit unusual to me.

>From the scope of DDL and system catalogue procedure is just another
kind of function (prokind == 'p').  So, I don't feel wrong about that.

> 4.
> +
> +endtime = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), timeout);
> +
>
> +if (timeout > 0)
> +{
> +delay_ms = (endtime - GetCurrentTimestamp()) / 1000;
> +latch_events |= WL_TIMEOUT;
> +if (delay_ms <= 0)
> +break;
> +}
>
> Why is endtime calculated even for timeout <= 0 only to just skip it
> later? Can't we just do a fastpath exit if timeout = 0 and targetLSN <

OK, fixed.

> 5.
>  Parameter
> +timeout is the time in milliseconds to wait
> +for the target_lsn
> +replay. When timeout value equals to zero no
> +timeout is applied.
> +replay. When timeout value equals to zero no
> +timeout is applied.
>
> It turns out to be "When timeout value equals to zero no timeout is
> applied." I guess, we can just say something like the following which
> I picked up from pg_terminate_backend timeout parameter description.
>
>
> If timeout is not specified or zero, this
> function returns if  the WAL upto
> target_lsn  is replayed.
> If the timeout is specified (in
> milliseconds) and greater than zero, the function waits until the
> server actually replays the WAL upto target_lsn  or
> until the given time has passed. On timeout, an error is emitted.
>

Applied as you suggested with some edits from me.

> 6.
> +ereport(ERROR,
> +(errcode(ERRCODE_QUERY_CANCELED),
> + errmsg("canceling waiting for LSN due to timeout")));
>
> We can be a bit more informative here and say targetLSN and currentLSN
> something like - "timed out while waiting for target LSN %X/%X to be
> replayed; current LSN %X/%X"?

Done this way.  Adjusted other ereport()'s as well.

> 7.
> +if (context->atomic)
> +ereport(ERROR,
> +(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("pg_wal_replay_wait() must be only called in
> non-atomic context")));
> +
>
> Can we say what a "non-atomic context '' is in a user understandable
> way like explicit txn or whatever that might  be? "non-atomic context'
> might not sound great to the end -user.

Added errdetail() to this ereport().

> 8.
> +the movie table and get the lsn 
> after
> +changes just made.  This example uses
> pg_current_wal_insert_lsn
> +to get the lsn given that
> synchronous_commit
> +could be set to off.
>
> Can we just mention that run pg_current_wal_insert_lsn on the primary?

The mention is added.

> 9. To me the following query blocks even though I didn't mention timeout.
> CALL pg_wal_replay_wait('0/fff');

If your primary server is freshly initialized, you need to do quite
data modifications to reach this LSN.

> 10. Can't we do some input validation on the timeout parameter and
> emit an error for negative values just like pg_terminate_backend?
> CALL pg_wal_replay_wait('0/ff', -100);

Reasonable, added.

> 11.
> +
> +if (timeout > 0)
> +{
> +delay_ms = (endtime - GetCurrentTimestamp()) / 1000;
> +latch_events |= WL_TIMEOUT;
> +if (delay_ms <= 0)
> +break;
> +}
> +
>
> Can we avoid calling GetCurrentTimestamp in a for loop which can be
> costly at times especially when pg_wal_replay_wait is called with
> larger timeouts on multiple backends? Can't we reuse
>

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-30 Thread Bharath Rupireddy

On Sun, Mar 31, 2024 at 7:41 AM Alexander Korotkov  wrote:
>
> Hi!

Thanks for the patch. I have a few comments on the v16 patch.

1. Is there any specific reason for pg_wal_replay_wait() being a
procedure rather than a function? I haven't read the full thread, but
I vaguely noticed the discussion on the new wait mechanism holding up
a snapshot or some other resource. Is that the reason to use a stored
procedure over a function? If yes, can we specify it somewhere in the
commit message and just before the procedure definition in
system_functions.sql?

2. Is the pg_wal_replay_wait first procedure that postgres provides
out of the box?

3. Defining a procedure for the first time in system_functions.sql
which is supposed to be for functions seems a bit unusual to me.

4.
+
+endtime = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), timeout);
+

+if (timeout > 0)
+{
+delay_ms = (endtime - GetCurrentTimestamp()) / 1000;
+latch_events |= WL_TIMEOUT;
+if (delay_ms <= 0)
+break;
+}

Why is endtime calculated even for timeout <= 0 only to just skip it
later? Can't we just do a fastpath exit if timeout = 0 and targetLSN <

5.
 Parameter
+timeout is the time in milliseconds to wait
+for the target_lsn
+replay. When timeout value equals to zero no
+timeout is applied.
+replay. When timeout value equals to zero no
+timeout is applied.

It turns out to be "When timeout value equals to zero no timeout is
applied." I guess, we can just say something like the following which
I picked up from pg_terminate_backend timeout parameter description.

If timeout is not specified or zero, this
function returns if  the WAL upto
target_lsn  is replayed.
If the timeout is specified (in
milliseconds) and greater than zero, the function waits until the
server actually replays the WAL upto target_lsn  or
until the given time has passed. On timeout, an error is emitted.

6.
+ereport(ERROR,
+(errcode(ERRCODE_QUERY_CANCELED),
+ errmsg("canceling waiting for LSN due to timeout")));

We can be a bit more informative here and say targetLSN and currentLSN
something like - "timed out while waiting for target LSN %X/%X to be
replayed; current LSN %X/%X"?

7.
+if (context->atomic)
+ereport(ERROR,
+(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("pg_wal_replay_wait() must be only called in
non-atomic context")));
+

Can we say what a "non-atomic context '' is in a user understandable
way like explicit txn or whatever that might  be? "non-atomic context'
might not sound great to the end -user.

8.
+the movie table and get the lsn after
+changes just made.  This example uses
pg_current_wal_insert_lsn
+to get the lsn given that
synchronous_commit
+could be set to off.

Can we just mention that run pg_current_wal_insert_lsn on the primary?

9. To me the following query blocks even though I didn't mention timeout.
CALL pg_wal_replay_wait('0/fff');

10. Can't we do some input validation on the timeout parameter and
emit an error for negative values just like pg_terminate_backend?
CALL pg_wal_replay_wait('0/ff', -100);

11.
+
+if (timeout > 0)
+{
+delay_ms = (endtime - GetCurrentTimestamp()) / 1000;
+latch_events |= WL_TIMEOUT;
+if (delay_ms <= 0)
+break;
+}
+

Can we avoid calling GetCurrentTimestamp in a for loop which can be
costly at times especially when pg_wal_replay_wait is called with
larger timeouts on multiple backends? Can't we reuse
pg_terminate_backend's timeout logic in
pg_wait_until_termination, perhaps reducing waittime to 1msec or so?

12. Why should we let every user run pg_wal_replay_wait procedure?
Can't we revoke execute from the public in system_functions.sql so
that one can decide who to run this function? Per comment #11, one can
easily cause a lot of activity by running this function on hundreds of
sessions.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-30 Thread Alexander Korotkov

Hi!

On Sat, Mar 30, 2024 at 6:14 PM Kartyshov Ivan
 wrote:
>
> Thank you Alexander for working on patch, may be we should change some
> names:
> 1) test 043_wait_lsn.pl -> to 043_waitlsn.pl like waitlsn.c and
> waitlsn.h

I renamed that to 043_wal_replay_wait.pl to match the name of SQL procedure.

> In waitlsn.c and waitlsn.h variables:
> 2) targret_lsn -> trgLSN like curLSN

I prefer this to match the SQL procedure parameter name.

> 3) lsn -> trgLSN like curLSN

Done.

Also I implemented termination of wal replay waits on standby
promotion (with test).

In the test I change recovery_min_apply_delay to 1s in order to make
the test pass faster.

--
Regards,
Alexander Korotkov

v16-0001-Implement-pg_wal_replay_wait-stored-procedure.patch
Description: Binary data

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-30 Thread Kartyshov Ivan

Thank you Alexander for working on patch, may be we should change some 
names:
1) test 043_wait_lsn.pl -> to 043_waitlsn.pl like waitlsn.c and 
waitlsn.h



In waitlsn.c and waitlsn.h variables:
2) targret_lsn -> trgLSN like curLSN

3) lsn -> trgLSN like curLSN

--
Ivan Kartyshov
Postgres Professional: www.postgrespro.com

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-30 Thread Alexander Korotkov

On Fri, Mar 29, 2024 at 6:50 PM Euler Taveira  wrote:
> On Fri, Mar 29, 2024, at 9:44 AM, Alexander Korotkov wrote:
>
> This generally makes sense, but I'm not sure about this.  The
> milliseconds timeout was used initially but received critics in [1].
>
>
> Alexander, I see why you changed the patch.
>
> Peter suggested to use an interval but you proposed another data type:
> float. The advantage of the interval data type is that you don't need to
> carefully think about the unit, however, if you use the integer data
> type you have to propose one. (If that's the case, milliseconds is a
> good granularity for this feature.) I don't have a strong preference
> between integer and interval data types but I don't like the float for
> this case. The 2 main reasons are (a) that we treat time units (hours,
> minutes, seconds, ...) as integers so it seems natural for a human being
> to use a unit time as integer and (b) depending on the number of digits
> after the decimal separator you still don't have an integer in the
> internal unit, hence, you have to round it to integer.
>
> We already have functions that use integer (such as pg_terminate_backend)
> and interval (such as pg_sleep_for) and if i searched correctly it will
> be the first timeout argument as float.

Thank you for the detailed explanation.  Float seconds are used in
pg_sleep() just similar to the interval in pg_sleep_for().  However,
that's a delay, not exactly a timeout.  Given the precedent of
milliseconds timeout in pg_terminate_backend(), your and Pavel's
points, I've switched back to integer milliseconds timeout.

Some fixes spotted off-list by Alexander Lakhin.
1) We don't need an explicit check for the postmaster being alive as
soon as we pass WL_EXIT_ON_PM_DEATH to WaitLatch().
2) When testing for unreachable LSN, we need to select LSN well in
advance so that autovacuum couldn't affect that.

I'm going to push this if no objections.

--
Regards,
Alexander Korotkov

v15-0001-Implement-pg_wal_replay_wait-stored-procedure.patch
Description: Binary data

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-29 Thread Euler Taveira

On Fri, Mar 29, 2024, at 9:44 AM, Alexander Korotkov wrote:
> This generally makes sense, but I'm not sure about this.  The
> milliseconds timeout was used initially but received critics in [1].

Alexander, I see why you changed the patch.

Peter suggested to use an interval but you proposed another data type:
float. The advantage of the interval data type is that you don't need to
carefully think about the unit, however, if you use the integer data
type you have to propose one. (If that's the case, milliseconds is a
good granularity for this feature.) I don't have a strong preference
between integer and interval data types but I don't like the float for
this case. The 2 main reasons are (a) that we treat time units (hours,
minutes, seconds, ...) as integers so it seems natural for a human being
to use a unit time as integer and (b) depending on the number of digits
after the decimal separator you still don't have an integer in the
internal unit, hence, you have to round it to integer.

We already have functions that use integer (such as pg_terminate_backend)
and interval (such as pg_sleep_for) and if i searched correctly it will
be the first timeout argument as float.

--
Euler Taveira
EDB   https://www.enterprisedb.com/

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-29 Thread Pavel Borisov

Hi, hackers!

On Fri, 29 Mar 2024 at 16:45, Alexander Korotkov 
wrote:

> Hi, Euler!
>
> On Fri, Mar 29, 2024 at 1:38 AM Euler Taveira  wrote:
> > On Thu, Mar 28, 2024, at 9:39 AM, Alexander Korotkov wrote:
> >
> > Fixed along with other issues spotted by Alexander Lakhin.
> >
> >
> > [I didn't read the whole thread. I'm sorry if I missed something ...]
> >
> > You renamed the function in a previous version but let me suggest
> another one:
> > pg_wal_replay_wait. It uses the same pattern as the other recovery
> control
> > functions [1]. I think "for" doesn't add much for the function name and
> "lsn" is
> > used in functions that return an LSN (that's not the case here).
> >
> > postgres=# \df pg_wal_replay*
> > List of functions
> > -[ RECORD 1 ]---+-
> > Schema  | pg_catalog
> > Name| pg_wal_replay_pause
> > Result data type| void
> > Argument data types |
> > Type| func
> > -[ RECORD 2 ]---+-
> > Schema  | pg_catalog
> > Name| pg_wal_replay_resume
> > Result data type| void
> > Argument data types |
> > Type| func
>
> Makes sense to me.  I tried to make a new procedure name consistent
> with functions acquiring various WAL positions.  But you're right,
> it's better to be consistent with other functions controlling wal
> replay.
>
> > Regarding the arguments, I think the timeout should be bigint. There is
> at least
> > another function that implements a timeout that uses bigint.
> >
> > postgres=# \df pg_terminate_backend
> > List of functions
> > -[ RECORD 1 ]---+--
> > Schema  | pg_catalog
> > Name| pg_terminate_backend
> > Result data type| boolean
> > Argument data types | pid integer, timeout bigint DEFAULT 0
> > Type| func
> >
> > I also suggests that the timeout unit should be milliseconds, hence,
> using
> > bigint is perfectly fine for the timeout argument.
> >
> > +   
> > +Throws an ERROR if the target lsn was not
> replayed
> > +on standby within given timeout.  Parameter
> timeout
> > +is the time in seconds to wait for the
> target_lsn
> > +replay. When timeout value equals to
> zero no
> > +timeout is applied.
> > +   
>
> This generally makes sense, but I'm not sure about this.  The
> milliseconds timeout was used initially but received critics in [1].
>
I see in Postgres we already have different units for timeouts:

e.g in guc's:
wal_receiver_status_interval in seconds
tcp_keepalives_idle in seconds

commit_delay in microseconds

deadlock_timeout in milliseconds
max_standby_archive_delay in milliseconds
vacuum_cost_delay in milliseconds
autovacuum_vacuum_cost_delay in milliseconds
etc..

I haven't counted precisely, but I feel that milliseconds are the most
often used in both guc's and functions. So I'd propose using milliseconds
for the patch as it was proposed originally.

Regards,
Pavel Borisov
Supabase.

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-29 Thread Alexander Korotkov

Hi, Euler!

On Fri, Mar 29, 2024 at 1:38 AM Euler Taveira  wrote:
> On Thu, Mar 28, 2024, at 9:39 AM, Alexander Korotkov wrote:
>
> Fixed along with other issues spotted by Alexander Lakhin.
>
>
> [I didn't read the whole thread. I'm sorry if I missed something ...]
>
> You renamed the function in a previous version but let me suggest another one:
> pg_wal_replay_wait. It uses the same pattern as the other recovery control
> functions [1]. I think "for" doesn't add much for the function name and "lsn" 
> is
> used in functions that return an LSN (that's not the case here).
>
> postgres=# \df pg_wal_replay*
> List of functions
> -[ RECORD 1 ]---+-
> Schema  | pg_catalog
> Name| pg_wal_replay_pause
> Result data type| void
> Argument data types |
> Type| func
> -[ RECORD 2 ]---+-
> Schema  | pg_catalog
> Name| pg_wal_replay_resume
> Result data type| void
> Argument data types |
> Type| func

Makes sense to me.  I tried to make a new procedure name consistent
with functions acquiring various WAL positions.  But you're right,
it's better to be consistent with other functions controlling wal
replay.

> Regarding the arguments, I think the timeout should be bigint. There is at 
> least
> another function that implements a timeout that uses bigint.
>
> postgres=# \df pg_terminate_backend
> List of functions
> -[ RECORD 1 ]---+--
> Schema  | pg_catalog
> Name| pg_terminate_backend
> Result data type| boolean
> Argument data types | pid integer, timeout bigint DEFAULT 0
> Type| func
>
> I also suggests that the timeout unit should be milliseconds, hence, using
> bigint is perfectly fine for the timeout argument.
>
> +   
> +Throws an ERROR if the target lsn was not replayed
> +on standby within given timeout.  Parameter 
> timeout
> +is the time in seconds to wait for the 
> target_lsn
> +replay. When timeout value equals to zero no
> +timeout is applied.
> +   

This generally makes sense, but I'm not sure about this.  The
milliseconds timeout was used initially but received critics in [1].

Links.
1. 
https://www.postgresql.org/message-id/b45ff979-9d12-4828-a22a-e4cb327e115c%40eisentraut.org

--
Regards,
Alexander Korotkov


v14-0001-Implement-pg_wal_replay_wait-stored-procedure.patch
Description: Binary data

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-28 Thread Euler Taveira

On Thu, Mar 28, 2024, at 9:39 AM, Alexander Korotkov wrote:
> Fixed along with other issues spotted by Alexander Lakhin.

[I didn't read the whole thread. I'm sorry if I missed something ...]

You renamed the function in a previous version but let me suggest another one:
pg_wal_replay_wait. It uses the same pattern as the other recovery control
functions [1]. I think "for" doesn't add much for the function name and "lsn" is
used in functions that return an LSN (that's not the case here).

postgres=# \df pg_wal_replay*
List of functions
-[ RECORD 1 ]---+-
Schema  | pg_catalog
Name| pg_wal_replay_pause
Result data type| void
Argument data types | 
Type| func
-[ RECORD 2 ]---+-
Schema  | pg_catalog
Name| pg_wal_replay_resume
Result data type| void
Argument data types | 
Type| func

Regarding the arguments, I think the timeout should be bigint. There is at least
another function that implements a timeout that uses bigint. 

postgres=# \df pg_terminate_backend
List of functions
-[ RECORD 1 ]---+--
Schema  | pg_catalog
Name| pg_terminate_backend
Result data type| boolean
Argument data types | pid integer, timeout bigint DEFAULT 0
Type| func

I also suggests that the timeout unit should be milliseconds, hence, using
bigint is perfectly fine for the timeout argument.

+   
+Throws an ERROR if the target lsn was not replayed
+on standby within given timeout.  Parameter 
timeout
+is the time in seconds to wait for the 
target_lsn
+replay. When timeout value equals to zero no
+timeout is applied.
+   


[1] 
https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-RECOVERY-CONTROL


--
Euler Taveira
EDB   https://www.enterprisedb.com/

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-28 Thread Alexander Korotkov

On Fri, Mar 22, 2024 at 3:45 PM Kartyshov Ivan
 wrote:
> On 2024-03-20 12:11, Alexander Korotkov wrote:
> > On Wed, Mar 20, 2024 at 12:34 AM Kartyshov Ivan
> >  wrote:
> >> > 4.2 With an unreasonably high future LSN, BEGIN command waits
> >> > unboundedly, shouldn't we check if the specified LSN is more than
> >> > pg_last_wal_receive_lsn() error out?
> >
> > I think limiting wait lsn by current received lsn would destroy the
> > whole value of this feature.  The value is to wait till given LSN is
> > replayed, whether it's already received or not.
>
> Ok sounds reasonable, I`ll rollback the changes.
>
> > But I don't see a problem here. On the replica, it's out of our
> > control to check which lsn is good and which is not.  We can't check
> > whether the lsn, which is in future for the replica, is already issued
> > by primary.
> >
> > For the case of wrong lsn, which could cause potentially infinite
> > wait, there is the timeout and the manual query cancel.
>
> Fully agree with this take.
>
> >> > 4.3 With an unreasonably high wait time, BEGIN command waits
> >> > unboundedly, shouldn't we restrict the wait time to some max
> > value,
> >> > say a day or so?
> >> > SELECT pg_last_wal_receive_lsn() + 1 AS future_receive_lsn \gset
> >> > BEGIN AFTER :'future_receive_lsn' WITHIN 10;
> >>
> >> Good idea, I put it 1 day. But this limit we should to discuss.
> >
> > Do you think that specifying timeout in milliseconds is suitable?  I
> > would prefer to switch to seconds (with ability to specify fraction of
> > second).  This was expressed before by Alexander Lakhin.
>
> It sounds like an interesting idea. Please review the result.
>
> >> > https://github.com/macdice/redo-bench or similar tools?
> >
> > Ivan, could you do this?
>
> Yes, test redo-bench/crash-recovery.sh
> This patch on master
> 91.327, 1.973
> 105.907, 3.338
> 98.412, 4.579
> 95.818, 4.19
>
> REL_13-STABLE
> 116.645, 3.005
> 113.212, 2.568
> 117.644, 3.183
> 111.411, 2.782
>
> master
> 124.712, 2.047
> 117.012, 1.736
> 116.328, 2.035
> 115.662, 1.797
>
> Strange behavior, patched version is faster then REL_13-STABLE and
> master.

I've run this test on my machine with v13 of the path.

patched
53.663, 0.466
53.884, 0.402
54.102, 0.441

master
55.216, 0.441
54.52, 0.464
51.479, 0.438

It seems that difference is less than variance.

--
Regards,
Alexander Korotkov

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-28 Thread Alexander Korotkov

On Thu, Mar 28, 2024 at 9:37 AM Thomas Munro  wrote:
>
> > v12
>
> Hi all,
>
> I didn't review the patch but one thing jumped out: I don't think it's
> OK to hold a spinlock while (1) looping over an array of backends and
> (2) making system calls (SetLatch()).

Good catch, thank you.

Fixed along with other issues spotted by Alexander Lakhin.

--
Regards,
Alexander Korotkov


v13-0001-Implement-pg_wait_for_wal_replay_lsn-stored-proc.patch
Description: Binary data

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-28 Thread Thomas Munro

> v12

Hi all,

I didn't review the patch but one thing jumped out: I don't think it's
OK to hold a spinlock while (1) looping over an array of backends and
(2) making system calls (SetLatch()).

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-28 Thread Alexander Korotkov

On Thu, Mar 28, 2024 at 2:24 AM Alexander Korotkov  wrote:
> On Tue, Mar 26, 2024 at 4:06 PM Kartyshov Ivan
>  wrote:
> > Thank you for your interest to the patch.
> > I understand you questions, but I fully support Alexander Korotkov idea
> > to commit the minimal required functionality. And then keep working on
> > other improvements.
>
> I did further improvements in the patch.
>
> Notably, I decided to rename the procedure to
> pg_wait_for_wal_replay_lsn().  This makes the name look consistent
> with other WAL-related functions.  Also it clearly states that we're
> waiting for lsn to be replayed (not received, written or flushed).
>
> Also, I did implements in the docs, commit message and some minor code fixes.
>
> I'm continuing to look at this patch.

Sorry, I forgot the attachment.

--
Regards,
Alexander Korotkov


v12-0001-Implement-pg_wait_for_wal_replay_lsn-stored-proc.patch
Description: Binary data

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-27 Thread Alexander Korotkov

On Tue, Mar 26, 2024 at 4:06 PM Kartyshov Ivan
 wrote:
> Thank you for your interest to the patch.
> I understand you questions, but I fully support Alexander Korotkov idea
> to commit the minimal required functionality. And then keep working on
> other improvements.

I did further improvements in the patch.

Notably, I decided to rename the procedure to
pg_wait_for_wal_replay_lsn().  This makes the name look consistent
with other WAL-related functions.  Also it clearly states that we're
waiting for lsn to be replayed (not received, written or flushed).

Also, I did implements in the docs, commit message and some minor code fixes.

I'm continuing to look at this patch.

--
Regards,
Alexander Korotkov

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-26 Thread Alexander Korotkov

On Sun, Mar 24, 2024 at 4:39 AM Bharath Rupireddy
 wrote:
> I share the same concern as yours and had proposed something upthread
> [1]. The idea is something like how each query takes a snapshot at the
> beginning of txn/query (depending on isolation level), the same way
> the standby can wait for the primary's current LSN as of the moment
> (at the time of taking snapshot). And, primary keeps sending its
> current LSN as part of regular WAL to standbys so that the standbys
> doesn't have to make connections to the primary to know its current
> LSN every time. Perhps, this may not even fully guarantee (considered
> to be achieving) the read-after-write consistency on standbys unless
> there's a way for the application to tell the wait LSN.

Oh, no.  Please, check [1].  The idea is to wait for a particular
transaction to become visible.  The one who made a change on primary
brings the lsn value from there to replica.  For instance, an
application made a change on primary and then willing to run some
report on replica.  And the report should be guaranteed to contain the
change just made.  So, the application query the LSN from primary
after making a write transaction, then calls pg_wait_lsn() on
replicate before running the report.

This is quite simple server functionality, which could be used at
application-level, ORM-level or pooler-level.  And it unlocks the way
forward for in-protocol implementation as proposed by Peter
Eisentraut.

Links.
1. 
https://www.postgresql.org/message-id/CAPpHfdtny81end69PzEdRsROKnsybsj%3DOs8DUM-6HeKGKnCuQQ%40mail.gmail.com

--
Regards,
Alexander Korotkov

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-26 Thread Alexander Korotkov

On Fri, Mar 22, 2024 at 12:58 AM Peter Eisentraut  wrote:
> On 17.03.24 15:09, Alexander Korotkov wrote:
> > My current attempt was to commit minimal implementation as less
> > invasive as possible.  A new clause for BEGIN doesn't require
> > additional keywords and doesn't introduce additional statements.  But
> > yes, this is still a new qual.  And, yes, Amit you're right that even
> > if I had committed that, there was still a high risk of further
> > debates and revert.
>
> I had written in [0] about my questions related to using this with
> connection poolers.  I don't think this was addressed at all.  I haven't
> seen any discussion about how to make this kind of facility usable in a
> full system.  You have to manually query and send LSNs; that seems
> pretty cumbersome.  Sure, this is part of something that could be
> useful, but how would an actual user with actual application code get to
> use this?

The current usage pattern of this functionality is the following.

1. Do the write transaction on primary
2. Query pg_current_wal_insert_lsn() on primary
3. Call pg_wait_lsn() with the value obtained on the previous step on replica
4. Do the read transaction of replica

This usage pattern could be implemented either on the application
level, or on the pooler level.  For application level, it would
require a somewhat advanced level of database-aware programming, but
this is still a valid usage.  Regarding poolers, if some poolers
manage to automatically distinguish reading and writing queries,
dealing with LSNs wouldn't be too complex for them.

Having this functionality on protocol level would be ideal, but let's
do this step-by-step.  The built-in procedure isn't very invasive, but
that could give us some adoption and open the way forward.

--
Regards,
Alexander Korotkov

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-26 Thread Alexander Korotkov

On Fri, Mar 22, 2024 at 12:50 AM Peter Eisentraut  wrote:
> On 19.03.24 18:38, Kartyshov Ivan wrote:
> >   CALL pg_wait_lsn('0/3002AE8', 1);
> >   BEGIN;
> >   SELECT * FROM tbl; // read fresh insertions
> >   COMMIT;
>
> I'm not endorsing this or any other approach, but I think the timeout
> parameter should be of type interval, not an integer with a unit that is
> hidden in the documentation.

I'm not sure a timeout needs to deal with complexity of our interval
datatype.  At the same time, the integer number of milliseconds looks
a bit weird.  Could the float8 number of seconds be an option?

--
Regards,
Alexander Korotkov

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-26 Thread Kartyshov Ivan


Thank you for your interest to the patch.
I understand you questions, but I fully support Alexander Korotkov idea
to commit the minimal required functionality. And then keep working on
other improvements.

On 2024-03-24 05:39, Bharath Rupireddy wrote:
On Fri, Mar 22, 2024 at 4:28 AM Peter Eisentraut  
wrote:


I had written in [0] about my questions related to using this with
connection poolers.  I don't think this was addressed at all.  I 
haven't
seen any discussion about how to make this kind of facility usable in 
a

full system.  You have to manually query and send LSNs; that seems
pretty cumbersome.  Sure, this is part of something that could be
useful, but how would an actual user with actual application code get 
to

use this?

[0]:
https://www.postgresql.org/message-id/8b5b172f-0ae7-d644-8358-e2851dded43b%40enterprisedb.com




But I wonder how a client is going to get the LSN.  How would all of
this be used by a client?  I can think of a scenarios where you have
an application that issues a bunch of SQL commands and you have some
kind of pooler in the middle that redirects those commands to
different hosts, and what you really want is to have it transparently
behave as if it's just a single host.  Do we want to inject a bunch
of "SELECT pg_get_lsn()", "SELECT pg_wait_lsn()" calls into that?


As I understand your question, application make dml on the primary
server, get LSN of changes and send bunch SQL read-only commands to 
pooler. Transparent behave we can get using #synchronous_commit, but

it is very slow.


I'm tempted to think this could be a protocol-layer facility.  Every
query automatically returns the current LSN, and every query can also
send along an LSN to wait for, and the client library would just keep
track of the LSN for (what it thinks of as) the connection.  So you
get some automatic serialization without having to modify your client 
code.


Thank you, it is a good question for future versions.
You say about a protocol-layer facility, what you meen. May be we can
use signals, like hot_standby_feedback.


I share the same concern as yours and had proposed something upthread
[1]. The idea is something like how each query takes a snapshot at the
beginning of txn/query (depending on isolation level), the same way
the standby can wait for the primary's current LSN as of the moment
(at the time of taking snapshot). And, primary keeps sending its
current LSN as part of regular WAL to standbys so that the standbys
doesn't have to make connections to the primary to know its current
LSN every time. Perhps, this may not even fully guarantee (considered
to be achieving) the read-after-write consistency on standbys unless
there's a way for the application to tell the wait LSN.

Thoughts?

[1] 
https://www.postgresql.org/message-id/CALj2ACUfS7LH1PaWmSZ5KwH4BpQxO9izeMw4qC3a1DAwi6nfbQ%40mail.gmail.com




+1 to have support for implicit txns. A strawman solution I can think
of is to let primary send its current insert LSN to the standby every
time it sends a bunch of WAL, and the standby waits for that LSN to be
replayed on it at the start of every implicit txn automatically.


And how standby will get lsn to wait for? All solutions I can think of
are very invasive and poorly scalable.

For example, every dml can send back LSN if dml is success. And 
application could use it to wait actual changes.



The new BEGIN syntax requires application code changes. This led me to
think how one can achieve read-after-write consistency today in a
primary - standby set up. All the logic of this patch, that is, waiting
for the standby to pass a given primary LSN needs to be done in the
application code (or in proxy or in load balancer?). I believe there
might be someone doing this already, it's good to hear from them.


You may use #synchronous_commit mode but it slow. So my implementation
don`t make primary to wait all standby to sent its feedbacks.

--
Ivan Kartyshov
Postgres Professional: www.postgrespro.comdiff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 8ecc02f2b9..8042201bab 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28086,10 +28086,55 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
 extension.

   
+
+  
+   
+
+ pg_wait_lsn
+
+pg_wait_lsn (trg_lsn pg_lsn, delay int8 DEFAULT 0)
+   
+   
+Returns ERROR if target LSN was not replayed on standby.
+Parameter delay sets seconds, the time to wait to the LSN.
+   
+  
  
 

 
+   
+pg_wait_lsn waits till wait_lsn
+to be replayed on standby to achieve read-your-writes-consistency, while
+using async replica for reads and primary for writes. In that case lsn of
+last modification is stored inside application.
+Note: pg_wait_lsn(trg_lsn pg_lsn, delay int8 DEFAULT 0)
+   
+
+   
+You can use pg_wait_lsn to wait the pg_lsn
+value. For example:
+

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-23 Thread Bharath Rupireddy

On Fri, Mar 22, 2024 at 4:28 AM Peter Eisentraut  wrote:
>
> I had written in [0] about my questions related to using this with
> connection poolers.  I don't think this was addressed at all.  I haven't
> seen any discussion about how to make this kind of facility usable in a
> full system.  You have to manually query and send LSNs; that seems
> pretty cumbersome.  Sure, this is part of something that could be
> useful, but how would an actual user with actual application code get to
> use this?
>
> [0]:
> https://www.postgresql.org/message-id/8b5b172f-0ae7-d644-8358-e2851dded43b%40enterprisedb.com

I share the same concern as yours and had proposed something upthread
[1]. The idea is something like how each query takes a snapshot at the
beginning of txn/query (depending on isolation level), the same way
the standby can wait for the primary's current LSN as of the moment
(at the time of taking snapshot). And, primary keeps sending its
current LSN as part of regular WAL to standbys so that the standbys
doesn't have to make connections to the primary to know its current
LSN every time. Perhps, this may not even fully guarantee (considered
to be achieving) the read-after-write consistency on standbys unless
there's a way for the application to tell the wait LSN.

Thoughts?

[1] 
https://www.postgresql.org/message-id/CALj2ACUfS7LH1PaWmSZ5KwH4BpQxO9izeMw4qC3a1DAwi6nfbQ%40mail.gmail.com

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-22 Thread Kartyshov Ivan

Thank you for your feedback.

On 2024-03-20 12:11, Alexander Korotkov wrote:

On Wed, Mar 20, 2024 at 12:34 AM Kartyshov Ivan
 wrote:

> 4.2 With an unreasonably high future LSN, BEGIN command waits
> unboundedly, shouldn't we check if the specified LSN is more than
> pg_last_wal_receive_lsn() error out?

I think limiting wait lsn by current received lsn would destroy the
whole value of this feature.  The value is to wait till given LSN is
replayed, whether it's already received or not.

Ok sounds reasonable, I`ll rollback the changes.

But I don't see a problem here. On the replica, it's out of our
control to check which lsn is good and which is not.  We can't check
whether the lsn, which is in future for the replica, is already issued
by primary.

For the case of wrong lsn, which could cause potentially infinite
wait, there is the timeout and the manual query cancel.

Fully agree with this take.

> 4.3 With an unreasonably high wait time, BEGIN command waits
> unboundedly, shouldn't we restrict the wait time to some max

value,

> say a day or so?
> SELECT pg_last_wal_receive_lsn() + 1 AS future_receive_lsn \gset
> BEGIN AFTER :'future_receive_lsn' WITHIN 10;

Good idea, I put it 1 day. But this limit we should to discuss.

Do you think that specifying timeout in milliseconds is suitable?  I
would prefer to switch to seconds (with ability to specify fraction of
second).  This was expressed before by Alexander Lakhin.

It sounds like an interesting idea. Please review the result.

> https://github.com/macdice/redo-bench or similar tools?

Ivan, could you do this?

Yes, test redo-bench/crash-recovery.sh
This patch on master
91.327, 1.973
105.907, 3.338
98.412, 4.579
95.818, 4.19

REL_13-STABLE
116.645, 3.005
113.212, 2.568
117.644, 3.183
111.411, 2.782

master
124.712, 2.047
117.012, 1.736
116.328, 2.035
115.662, 1.797

Strange behavior, patched version is faster then REL_13-STABLE and 
master.

I don't see this change in the patch.  Normally if a process gets a
signal, that causes WaitLatch() to exit immediately.  It also exists
immediately on query cancel.  IIRC, this 1 minute timeout is needed to
handle some extreme cases when an interrupt is missing.  Other places
have it equal to 1 minute. I don't see why we should have it
different.

Ok, I`ll rollback my changes.

4) added and expanded sections in the documentation

I don't see this in the patch.  I see only a short description in
func.sgml, which is definitely not sufficient.  We need at least
everything we have in the docs before to be adjusted with the current
approach of procedure.

I didn't find another section where to add the description of 
pg_wait_lsn().

So I extend description on the bottom of the table.

5) add default variant of timeout
pg_wait_lsn(trg_lsn pg_lsn, delay int8 DEFAULT 0)
example: pg_wait_lsn('0/31B1B60') equal pg_wait_lsn('0/31B1B60', 0)

Does zero here mean no timeout?  I think this should be documented.
Also, I would prefer to see the timeout by default.  Probably one
minute would be good for default.

Lets discuss this point. Loop in function WaitForLSN is made that way,
if we choose delay=0, only then we can wait infinitely to wait LSN
without timeout. So default must be 0.

Please take one more look on the patch.

PS sorry, the strange BUG throw my mails out of thread.
https://www.postgresql.org/message-id/flat/f2ff071aa9141405bb8efee67558a058%40postgrespro.ru

--
Ivan Kartyshov
Postgres Professional: www.postgrespro.comdiff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5b225ccf4f..6e1f69962d 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -27861,10 +27861,54 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
 extension.

+
+  
+   
+
+ pg_wait_lsn
+
+pg_wait_lsn (trg_lsn pg_lsn, delay int8 DEFAULT 0)
+   
+   
+Returns ERROR if target LSN was not replayed on standby.
+Parameter delay sets seconds, the time to wait to the LSN.
+   
+  

+   
+pg_wait_lsn waits till wait_lsn
+to be replayed on standby to achieve read-your-writes-consistency, while
+using async replica for reads and primary for writes. In that case lsn of
+last modification is stored inside application.
+   
+
+   
+You can use pg_wait_lsn to wait the pg_lsn
+value. For example:
+   
+   Replica:
+   postgresql.conf
+ recovery_min_apply_delay = 10s;
+
+   Primary: Application update table and get lsn of changes
+   postgres=# UPDATE films SET kind = 'Dramatic' WHERE kind = 'Drama';
+   postgres=# SELECT pg_current_wal_lsn();
+   pg_current_wal_lsn
+   
+   0/306EE20
+   (1 row)
+
+   Replica: Wait and read updated data
+   postgres=# CALL pg_wait_lsn('0/306EE20', 100); // Timeout 100ms is insufficent
+   ERROR:  canceling waiting for LSN due to timeout
+   postgres=# CALL pg_wait_lsn('0/306EE20');
+

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-22 Thread Kartyshov Ivan

Thank you for your feedback.

On 2024-03-20 12:11, Alexander Korotkov wrote:

On Wed, Mar 20, 2024 at 12:34 AM Kartyshov Ivan
 wrote:

> 4.2 With an unreasonably high future LSN, BEGIN command waits
> unboundedly, shouldn't we check if the specified LSN is more than
> pg_last_wal_receive_lsn() error out?

I think limiting wait lsn by current received lsn would destroy the
whole value of this feature.  The value is to wait till given LSN is
replayed, whether it's already received or not.

Ok sounds reasonable, I`ll rollback the changes.

But I don't see a problem here. On the replica, it's out of our
control to check which lsn is good and which is not.  We can't check
whether the lsn, which is in future for the replica, is already issued
by primary.

For the case of wrong lsn, which could cause potentially infinite
wait, there is the timeout and the manual query cancel.

Fully agree with this take.

> 4.3 With an unreasonably high wait time, BEGIN command waits
> unboundedly, shouldn't we restrict the wait time to some max

value,

> say a day or so?
> SELECT pg_last_wal_receive_lsn() + 1 AS future_receive_lsn \gset
> BEGIN AFTER :'future_receive_lsn' WITHIN 10;

Good idea, I put it 1 day. But this limit we should to discuss.

Do you think that specifying timeout in milliseconds is suitable?  I
would prefer to switch to seconds (with ability to specify fraction of
second).  This was expressed before by Alexander Lakhin.

It sounds like an interesting idea. Please review the result.

> https://github.com/macdice/redo-bench or similar tools?

Ivan, could you do this?

Yes, test redo-bench/crash-recovery.sh
This patch on master
91.327, 1.973
105.907, 3.338
98.412, 4.579
95.818, 4.19

REL_13-STABLE
116.645, 3.005
113.212, 2.568
117.644, 3.183
111.411, 2.782

master
124.712, 2.047
117.012, 1.736
116.328, 2.035
115.662, 1.797

Strange behavior, patched version is faster then REL_13-STABLE and 
master.

I don't see this change in the patch.  Normally if a process gets a
signal, that causes WaitLatch() to exit immediately.  It also exists
immediately on query cancel.  IIRC, this 1 minute timeout is needed to
handle some extreme cases when an interrupt is missing.  Other places
have it equal to 1 minute. I don't see why we should have it
different.

Ok, I`ll rollback my changes.

4) added and expanded sections in the documentation

I don't see this in the patch.  I see only a short description in
func.sgml, which is definitely not sufficient.  We need at least
everything we have in the docs before to be adjusted with the current
approach of procedure.

I didn't find another section where to add the description of 
pg_wait_lsn().

So I extend description on the bottom of the table.

5) add default variant of timeout
pg_wait_lsn(trg_lsn pg_lsn, delay int8 DEFAULT 0)
example: pg_wait_lsn('0/31B1B60') equal pg_wait_lsn('0/31B1B60', 0)

Does zero here mean no timeout?  I think this should be documented.
Also, I would prefer to see the timeout by default.  Probably one
minute would be good for default.

Lets discuss this point. Loop in function WaitForLSN is made that way,
if we choose delay=0, only then we can wait infinitely to wait LSN
without timeout. So default must be 0.

Please take one more look on the patch.

--
Ivan Kartyshov
Postgres Professional: www.postgrespro.comdiff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5b225ccf4f..6e1f69962d 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -27861,10 +27861,54 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
 extension.

+
+  
+   
+
+ pg_wait_lsn
+
+pg_wait_lsn (trg_lsn pg_lsn, delay int8 DEFAULT 0)
+   
+   
+Returns ERROR if target LSN was not replayed on standby.
+Parameter delay sets seconds, the time to wait to the LSN.
+   
+  

+   
+pg_wait_lsn waits till wait_lsn
+to be replayed on standby to achieve read-your-writes-consistency, while
+using async replica for reads and primary for writes. In that case lsn of
+last modification is stored inside application.
+   
+
+   
+You can use pg_wait_lsn to wait the pg_lsn
+value. For example:
+   
+   Replica:
+   postgresql.conf
+ recovery_min_apply_delay = 10s;
+
+   Primary: Application update table and get lsn of changes
+   postgres=# UPDATE films SET kind = 'Dramatic' WHERE kind = 'Drama';
+   postgres=# SELECT pg_current_wal_lsn();
+   pg_current_wal_lsn
+   
+   0/306EE20
+   (1 row)
+
+   Replica: Wait and read updated data
+   postgres=# CALL pg_wait_lsn('0/306EE20', 100); // Timeout 100ms is insufficent
+   ERROR:  canceling waiting for LSN due to timeout
+   postgres=# CALL pg_wait_lsn('0/306EE20');
+   postgres=# SELECT * FROM films WHERE kind = 'Drama';
+   
+   
+

 The functions shown in  control the progress of recovery.
diff --git

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-22 Thread Kartyshov Ivan

On 2024-03-20 12:11, Alexander Korotkov wrote:

On Wed, Mar 20, 2024 at 12:34 AM Kartyshov Ivan
 wrote:

> 4.2 With an unreasonably high future LSN, BEGIN command waits
> unboundedly, shouldn't we check if the specified LSN is more than
> pg_last_wal_receive_lsn() error out?

I think limiting wait lsn by current received lsn would destroy the
whole value of this feature.  The value is to wait till given LSN is
replayed, whether it's already received or not.

Ok sounds reasonable, I`ll rollback the changes.

But I don't see a problem here. On the replica, it's out of our
control to check which lsn is good and which is not.  We can't check
whether the lsn, which is in future for the replica, is already issued
by primary.

For the case of wrong lsn, which could cause potentially infinite
wait, there is the timeout and the manual query cancel.

Fully agree with this take.

> 4.3 With an unreasonably high wait time, BEGIN command waits
> unboundedly, shouldn't we restrict the wait time to some max

value,

> say a day or so?
> SELECT pg_last_wal_receive_lsn() + 1 AS future_receive_lsn \gset
> BEGIN AFTER :'future_receive_lsn' WITHIN 10;

Good idea, I put it 1 day. But this limit we should to discuss.

Do you think that specifying timeout in milliseconds is suitable?  I
would prefer to switch to seconds (with ability to specify fraction of
second).  This was expressed before by Alexander Lakhin.

It sounds like an interesting idea. Please review the result.

> https://github.com/macdice/redo-bench or similar tools?

Ivan, could you do this?

Yes, test redo-bench/crash-recovery.sh
This patch on master
91.327, 1.973
105.907, 3.338
98.412, 4.579
95.818, 4.19

REL_13-STABLE
116.645, 3.005
113.212, 2.568
117.644, 3.183
111.411, 2.782

master
124.712, 2.047
117.012, 1.736
116.328, 2.035
115.662, 1.797

Strange behavior, patched version is faster then REL_13-STABLE and 
master.

I don't see this change in the patch.  Normally if a process gets a
signal, that causes WaitLatch() to exit immediately.  It also exists
immediately on query cancel.  IIRC, this 1 minute timeout is needed to
handle some extreme cases when an interrupt is missing.  Other places
have it equal to 1 minute. I don't see why we should have it
different.

Ok, I`ll rollback my changes.

4) added and expanded sections in the documentation

I don't see this in the patch.  I see only a short description in
func.sgml, which is definitely not sufficient.  We need at least
everything we have in the docs before to be adjusted with the current
approach of procedure.

I didn't find another section where to add the description of 
pg_wait_lsn().

So I extend description on the bottom of the table.

5) add default variant of timeout
pg_wait_lsn(trg_lsn pg_lsn, delay int8 DEFAULT 0)
example: pg_wait_lsn('0/31B1B60') equal pg_wait_lsn('0/31B1B60', 0)

Does zero here mean no timeout?  I think this should be documented.
Also, I would prefer to see the timeout by default.  Probably one
minute would be good for default.

Lets discuss this point. Loop in function WaitForLSN is made that way,
if we choose delay=0, only then we can wait infinitely to wait LSN
without timeout. So default must be 0.

Please take one more look on the patch.

--
Ivan Kartyshov
Postgres Professional: www.postgrespro.comdiff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5b225ccf4f..6e1f69962d 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -27861,10 +27861,54 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
 extension.

+
+  
+   
+
+ pg_wait_lsn
+
+pg_wait_lsn (trg_lsn pg_lsn, delay int8 DEFAULT 0)
+   
+   
+Returns ERROR if target LSN was not replayed on standby.
+Parameter delay sets seconds, the time to wait to the LSN.
+   
+  

+   
+pg_wait_lsn waits till wait_lsn
+to be replayed on standby to achieve read-your-writes-consistency, while
+using async replica for reads and primary for writes. In that case lsn of
+last modification is stored inside application.
+   
+
+   
+You can use pg_wait_lsn to wait the pg_lsn
+value. For example:
+   
+   Replica:
+   postgresql.conf
+ recovery_min_apply_delay = 10s;
+
+   Primary: Application update table and get lsn of changes
+   postgres=# UPDATE films SET kind = 'Dramatic' WHERE kind = 'Drama';
+   postgres=# SELECT pg_current_wal_lsn();
+   pg_current_wal_lsn
+   
+   0/306EE20
+   (1 row)
+
+   Replica: Wait and read updated data
+   postgres=# CALL pg_wait_lsn('0/306EE20', 100); // Timeout 100ms is insufficent
+   ERROR:  canceling waiting for LSN due to timeout
+   postgres=# CALL pg_wait_lsn('0/306EE20');
+   postgres=# SELECT * FROM films WHERE kind = 'Drama';
+   
+   
+

 The functions shown in  control the progress of recovery.
diff --git

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-21 Thread Peter Eisentraut


On 17.03.24 15:09, Alexander Korotkov wrote:

My current attempt was to commit minimal implementation as less
invasive as possible.  A new clause for BEGIN doesn't require
additional keywords and doesn't introduce additional statements.  But
yes, this is still a new qual.  And, yes, Amit you're right that even
if I had committed that, there was still a high risk of further
debates and revert.


I had written in [0] about my questions related to using this with 
connection poolers.  I don't think this was addressed at all.  I haven't 
seen any discussion about how to make this kind of facility usable in a 
full system.  You have to manually query and send LSNs; that seems 
pretty cumbersome.  Sure, this is part of something that could be 
useful, but how would an actual user with actual application code get to 
use this?


[0]: 
https://www.postgresql.org/message-id/8b5b172f-0ae7-d644-8358-e2851dded43b%40enterprisedb.com

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-21 Thread Peter Eisentraut


On 19.03.24 18:38, Kartyshov Ivan wrote:

  CALL pg_wait_lsn('0/3002AE8', 1);
  BEGIN;
  SELECT * FROM tbl; // read fresh insertions
  COMMIT;


I'm not endorsing this or any other approach, but I think the timeout 
parameter should be of type interval, not an integer with a unit that is 
hidden in the documentation.

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-20 Thread Alexander Korotkov

On Wed, Mar 20, 2024 at 12:34 AM Kartyshov Ivan 
wrote:
> > 4.2 With an unreasonably high future LSN, BEGIN command waits
> > unboundedly, shouldn't we check if the specified LSN is more than
> > pg_last_wal_receive_lsn() error out?

I think limiting wait lsn by current received lsn would destroy the whole
value of this feature.  The value is to wait till given LSN is replayed,
whether it's already received or not.

> > BEGIN AFTER '0/';
> > SELECT pg_last_wal_receive_lsn() + 1 AS future_receive_lsn \gset
> > BEGIN AFTER :'future_receive_lsn';
>
> This case will give ERROR cause '0/' + 1 is invalid pg_lsn

FWIW,

# SELECT '0/'::pg_lsn + 1;
 ?column?
--
 1/0
(1 row)

But I don't see a problem here. On the replica, it's out of our control to
check which lsn is good and which is not.  We can't check whether the lsn,
which is in future for the replica, is already issued by primary.

For the case of wrong lsn, which could cause potentially infinite wait,
there is the timeout and the manual query cancel.

> > 4.3 With an unreasonably high wait time, BEGIN command waits
> > unboundedly, shouldn't we restrict the wait time to some max value,
> > say a day or so?
> > SELECT pg_last_wal_receive_lsn() + 1 AS future_receive_lsn \gset
> > BEGIN AFTER :'future_receive_lsn' WITHIN 10;
>
> Good idea, I put it 1 day. But this limit we should to discuss.

Do you think that specifying timeout in milliseconds is suitable?  I would
prefer to switch to seconds (with ability to specify fraction of second).
This was expressed before by Alexander Lakhin.

> > 6.
> > +/* Set all latches in shared memory to signal that new LSN has been
> > replayed */
> > +void
> > +WaitLSNSetLatches(XLogRecPtr curLSN)
> > +{
> >
> > I see this patch is waking up all the waiters in the recovery path
> > after applying every WAL record, which IMO is a hot path. Is the
> > impact of this change on recovery measured, perhaps using
> > https://github.com/macdice/redo-bench or similar tools?

Ivan, could you do this?

> > 7. In continuation to comment #6, why not use Conditional Variables
> > instead of proc latches to sleep and wait for all the waiters in
> > WaitLSNSetLatches?
>
> Waiters are stored in the array sorted by LSN. This help us to wake
> only PIDs with replayed LSN. This saves us from scanning of whole
> array. So it`s not so hot path.

+1
This saves us from ConditionVariableBroadcast() every time we replay the
WAL record.

> Add some fixes
>
> 1) make waiting timeont more simple (as pg_terminate_backend())
> 2) removed the 1 minute wait because INTERRUPTS don’t arrive for a
> long time, changed it to 0.5 seconds

I don't see this change in the patch.  Normally if a process gets a signal,
that causes WaitLatch() to exit immediately.  It also exists immediately on
query cancel.  IIRC, this 1 minute timeout is needed to handle some extreme
cases when an interrupt is missing.  Other places have it equal to 1
minute. I don't see why we should have it different.

> 3) add more tests
> 4) added and expanded sections in the documentation

I don't see this in the patch.  I see only a short description
in func.sgml, which is definitely not sufficient.  We need at least
everything we have in the docs before to be adjusted with the current
approach of procedure.

> 5) add default variant of timeout
> pg_wait_lsn(trg_lsn pg_lsn, delay int8 DEFAULT 0)
> example: pg_wait_lsn('0/31B1B60') equal pg_wait_lsn('0/31B1B60', 0)

Does zero here mean no timeout?  I think this should be documented.  Also,
I would prefer to see the timeout by default.  Probably one minute would be
good for default.

> 6) now big timeout will be restricted to 1 day (8640ms)
> CALL pg_wait_lsn('0/34FB5A1',100);
> WARNING:  Timeout for pg_wait_lsn() restricted to 1 day

I don't think we need to mention individuals, who made proposals, in the
source code comments.  Otherwise, our source code would be a crazy mess of
names.  Also, if this is the restriction, it has to be an error.  And it
should be a proper full ereport().

--
Regards,
Alexander Korotkov

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-19 Thread Kartyshov Ivan


Bharath Rupireddy, thank you for you review.
But here is some points.

On 2024-03-16 10:02, Bharath Rupireddy wrote:

4.1 With invalid LSN succeeds, shouldn't it error out? Or at least,
add a fast path/quick exit to WaitForLSN()?
BEGIN AFTER '0/0';


In postgresql '0/0' is Valid pg_lsn, but it is always reached.


4.2 With an unreasonably high future LSN, BEGIN command waits
unboundedly, shouldn't we check if the specified LSN is more than
pg_last_wal_receive_lsn() error out?
BEGIN AFTER '0/';
SELECT pg_last_wal_receive_lsn() + 1 AS future_receive_lsn \gset
BEGIN AFTER :'future_receive_lsn';


This case will give ERROR cause '0/' + 1 is invalid pg_lsn


4.3 With an unreasonably high wait time, BEGIN command waits
unboundedly, shouldn't we restrict the wait time to some max value,
say a day or so?
SELECT pg_last_wal_receive_lsn() + 1 AS future_receive_lsn \gset
BEGIN AFTER :'future_receive_lsn' WITHIN 10;


Good idea, I put it 1 day. But this limit we should to discuss.


6.
+/* Set all latches in shared memory to signal that new LSN has been 
replayed */

+void
+WaitLSNSetLatches(XLogRecPtr curLSN)
+{

I see this patch is waking up all the waiters in the recovery path
after applying every WAL record, which IMO is a hot path. Is the
impact of this change on recovery measured, perhaps using
https://github.com/macdice/redo-bench or similar tools?

7. In continuation to comment #6, why not use Conditional Variables
instead of proc latches to sleep and wait for all the waiters in
WaitLSNSetLatches?


Waiters are stored in the array sorted by LSN. This help us to wake
only PIDs with replayed LSN. This saves us from scanning of whole
array. So it`s not so hot path.

Add some fixes

1) make waiting timeont more simple (as pg_terminate_backend())
2) removed the 1 minute wait because INTERRUPTS don’t arrive for a
long time, changed it to 0.5 seconds
3) add more tests
4) added and expanded sections in the documentation
5) add default variant of timeout
pg_wait_lsn(trg_lsn pg_lsn, delay int8 DEFAULT 0)
example: pg_wait_lsn('0/31B1B60') equal pg_wait_lsn('0/31B1B60', 0)
6) now big timeout will be restricted to 1 day (8640ms)
CALL pg_wait_lsn('0/34FB5A1',100);
WARNING:  Timeout for pg_wait_lsn() restricted to 1 day

--
Ivan Kartyshov
Postgres Professional: www.postgrespro.comdiff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5b225ccf4f..106bca9b73 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -27861,6 +27861,19 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
 extension.

   
+
+  
+   
+
+ pg_wait_lsn
+
+pg_wait_lsn (trg_lsn pg_lsn, delay int8 DEFAULT 0)
+   
+   
+If timeout <= 0 then timeout is off.
+Returns ERROR if target wait_lsn was not replayed.
+   
+  
  
 

diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 29c5bec084..0b783dc733 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -43,6 +43,7 @@
 #include "backup/basebackup.h"
 #include "catalog/pg_control.h"
 #include "commands/tablespace.h"
+#include "commands/waitlsn.h"
 #include "common/file_utils.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -1828,6 +1829,14 @@ PerformWalRecovery(void)
 break;
 			}
 
+			/*
+			 * If we replayed an LSN that someone was waiting for, set latches
+			 * in shared memory array to notify the waiter.
+			 */
+			if (waitLSN &&
+(XLogRecoveryCtl->lastReplayedEndRecPtr >= pg_atomic_read_u64(>minLSN)))
+WaitLSNSetLatches(XLogRecoveryCtl->lastReplayedEndRecPtr);
+
 			/* Else, try to fetch the next WAL record */
 			record = ReadRecord(xlogprefetcher, LOG, false, replayTLI);
 		} while (record != NULL);
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index fe2bb50f46..ff82dffac0 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -414,6 +414,9 @@ CREATE OR REPLACE FUNCTION
   json_populate_recordset(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
   RETURNS SETOF anyelement LANGUAGE internal STABLE ROWS 100  AS 'json_populate_recordset' PARALLEL SAFE;
 
+CREATE OR REPLACE PROCEDURE pg_wait_lsn(trg_lsn pg_lsn, delay int8 DEFAULT 0)
+  LANGUAGE internal AS 'pg_wait_lsn';
+
 CREATE OR REPLACE FUNCTION pg_logical_slot_get_changes(
 IN slot_name name, IN upto_lsn pg_lsn, IN upto_nchanges int, VARIADIC options text[] DEFAULT '{}',
 OUT lsn pg_lsn, OUT xid xid, OUT data text)
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index 48f7348f91..cede90c3b9 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -61,6 +61,7 @@ OBJS = \
 	vacuum.o \
 	vacuumparallel.o \
 	variable.o \
-	view.o
+	view.o \
+	waitlsn.o
 
 include

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-19 Thread Kartyshov Ivan


Intro
==
The main purpose of the feature is to achieve
read-your-writes-consistency, while using async replica for reads and
primary for writes. In that case lsn of last modification is stored
inside application. We cannot store this lsn inside database, since
reads are distributed across all replicas and primary.


Procedure style implementation
==
https://www.postgresql.org/message-id/27171.1586439221%40sss.pgh.pa.us
https://www.postgresql.org/message-id/20210121.173009.235021120161403875.horikyota.ntt%40gmail.com

CALL pg_wait_lsn(‘LSN’, timeout);

Examples
==

primary standby
--- 
 postgresql.conf
 recovery_min_apply_delay = 10s


CREATE TABLE tbl AS SELECT generate_series(1,10) AS a;
INSERT INTO tbl VALUES (generate_series(11, 20));
SELECT pg_current_wal_lsn();


 CALL pg_wait_lsn('0/3002AE8', 1);
 BEGIN;
 SELECT * FROM tbl; // read fresh insertions
 COMMIT;

Fixed and ready to review.

--
Ivan Kartyshov
Postgres Professional: www.postgrespro.comdiff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5b225ccf4f..cb1c175c62 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -27861,6 +27861,19 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
 extension.

   
+
+  
+   
+
+ pg_wait_lsn
+
+pg_wait_lsn (wait_lsn pg_lsn, timeout bigint)
+   
+   
+If timeout <= 0 then timeout is off.
+Returns ERROR if target wait_lsn was not replayed.
+   
+  
  
 

diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 29c5bec084..0b783dc733 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -43,6 +43,7 @@
 #include "backup/basebackup.h"
 #include "catalog/pg_control.h"
 #include "commands/tablespace.h"
+#include "commands/waitlsn.h"
 #include "common/file_utils.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -1828,6 +1829,14 @@ PerformWalRecovery(void)
 break;
 			}
 
+			/*
+			 * If we replayed an LSN that someone was waiting for, set latches
+			 * in shared memory array to notify the waiter.
+			 */
+			if (waitLSN &&
+(XLogRecoveryCtl->lastReplayedEndRecPtr >= pg_atomic_read_u64(>minLSN)))
+WaitLSNSetLatches(XLogRecoveryCtl->lastReplayedEndRecPtr);
+
 			/* Else, try to fetch the next WAL record */
 			record = ReadRecord(xlogprefetcher, LOG, false, replayTLI);
 		} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index 48f7348f91..cede90c3b9 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -61,6 +61,7 @@ OBJS = \
 	vacuum.o \
 	vacuumparallel.o \
 	variable.o \
-	view.o
+	view.o \
+	waitlsn.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/meson.build b/src/backend/commands/meson.build
index 6dd00a4abd..7549be5dc3 100644
--- a/src/backend/commands/meson.build
+++ b/src/backend/commands/meson.build
@@ -50,4 +50,5 @@ backend_sources += files(
   'vacuumparallel.c',
   'variable.c',
   'view.c',
+  'waitlsn.c',
 )
diff --git a/src/backend/commands/waitlsn.c b/src/backend/commands/waitlsn.c
new file mode 100644
index 00..f9b701c946
--- /dev/null
+++ b/src/backend/commands/waitlsn.c
@@ -0,0 +1,309 @@
+/*-
+ *
+ * waitlsn.c
+ *	  Implements waiting for the given LSN, which is used in
+ *	  CALL pg_wait_lsn(wait_lsn pg_lsn, timeout int).
+ *
+ * Copyright (c) 2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/waitlsn.c
+ *
+ *-
+ */
+
+#include "postgres.h"
+
+#include 
+#include 
+
+#include "pgstat.h"
+#include "fmgr.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xlogdefs.h"
+#include "access/xlogrecovery.h"
+#include "catalog/pg_type.h"
+#include "commands/waitlsn.h"
+#include "executor/spi.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/sinvaladt.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+#include "utils/snapmgr.h"
+#include "utils/timestamp.h"
+#include "utils/fmgrprotos.h"
+
+/* Add to / delete from shared memory array */
+static void addLSNWaiter(XLogRecPtr lsn);
+static void deleteLSNWaiter(void);
+
+struct WaitLSNState *waitLSN = NULL;
+static volatile sig_atomic_t haveShmemItem = false;
+
+/*
+ * Report the amount of shared memory space needed for WaitLSNState
+ */
+Size
+WaitLSNShmemSize(void)
+{
+	Size		size;
+
+	size = offsetof(WaitLSNState, procInfos);
+	size =

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-19 Thread Alexander Korotkov

On Tue, Mar 19, 2024 at 1:51 PM Amit Kapila  wrote:
> On Mon, Mar 18, 2024 at 3:24 PM Alexander Korotkov  
> wrote:
> >
> > On Mon, Mar 18, 2024 at 5:17 AM Amit Kapila  wrote:
> > > > 1. First, check that it was called with non-atomic context (that is,
> > > > it's not called within a transaction). Trigger error if called with
> > > > atomic context.
> > > > 2. Release a snapshot to be able to wait without risk of WAL replay
> > > > stuck.  Procedure is still called within the snapshot.  It's a bit of
> > > > a hack to release a snapshot, but Vacuum statements already do so.
> > > >
> > >
> > > Can you please provide a bit more details with some example what is
> > > the existing problem with functions and how using procedures will
> > > resolve it? How will this this address the implicit transaction case
> > > or do we have any other workaround for those cases?
> >
> > Please check [1] and [2] for the explanation of the problem with functions.
> >
> > Also, please find a draft patch implementing the procedure.  The issue with 
> > the snapshot is addressed with the following lines.
> >
> > We first ensure we're in a non-atomic context, then pop an active snapshot 
> > (tricky, but ExecuteVacuum() does the same).  Then we should have no active 
> > snapshot and it's safe to wait for lsn replay.
> >
> > if (context->atomic)
> > ereport(ERROR,
> > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> >  errmsg("pg_wait_lsn() must be only called in non-atomic 
> > context")));
> >
> > if (ActiveSnapshotSet())
> > PopActiveSnapshot();
> > Assert(!ActiveSnapshotSet());
> >
> > The function call could be added either before the BEGIN statement or 
> > before the implicit transaction.
> >
> > CALL pg_wait_lsn('my_lsn', my_timeout); BEGIN;
> > CALL pg_wait_lsn('my_lsn', my_timeout); SELECT ...;
> >
>
> I haven't thought in detail about whether there are any other problems
> with this idea but sounds like it should solve the problems you shared
> with a function call approach. BTW, if the application has to anyway
> know the LSN till where replica needs to wait, why can't they simply
> monitor the pg_last_wal_replay_lsn() value?

Amit, thank you for your feedback.

Yes, the application can monitor pg_last_wal_replay_lsn() value,
that's our state of the art solution.  But that's rather inconvenient
and takes extra latency and network traffic. And it can't be wrapped
into a server-side function in procedural language for the reasons we
can't implement it as a built-in function.

--
Regards,
Alexander Korotkov

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-19 Thread Amit Kapila

On Mon, Mar 18, 2024 at 3:24 PM Alexander Korotkov  wrote:
>
> On Mon, Mar 18, 2024 at 5:17 AM Amit Kapila  wrote:
> > > 1. First, check that it was called with non-atomic context (that is,
> > > it's not called within a transaction). Trigger error if called with
> > > atomic context.
> > > 2. Release a snapshot to be able to wait without risk of WAL replay
> > > stuck.  Procedure is still called within the snapshot.  It's a bit of
> > > a hack to release a snapshot, but Vacuum statements already do so.
> > >
> >
> > Can you please provide a bit more details with some example what is
> > the existing problem with functions and how using procedures will
> > resolve it? How will this this address the implicit transaction case
> > or do we have any other workaround for those cases?
>
> Please check [1] and [2] for the explanation of the problem with functions.
>
> Also, please find a draft patch implementing the procedure.  The issue with 
> the snapshot is addressed with the following lines.
>
> We first ensure we're in a non-atomic context, then pop an active snapshot 
> (tricky, but ExecuteVacuum() does the same).  Then we should have no active 
> snapshot and it's safe to wait for lsn replay.
>
> if (context->atomic)
> ereport(ERROR,
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>  errmsg("pg_wait_lsn() must be only called in non-atomic 
> context")));
>
> if (ActiveSnapshotSet())
> PopActiveSnapshot();
> Assert(!ActiveSnapshotSet());
>
> The function call could be added either before the BEGIN statement or before 
> the implicit transaction.
>
> CALL pg_wait_lsn('my_lsn', my_timeout); BEGIN;
> CALL pg_wait_lsn('my_lsn', my_timeout); SELECT ...;
>

I haven't thought in detail about whether there are any other problems
with this idea but sounds like it should solve the problems you shared
with a function call approach. BTW, if the application has to anyway
know the LSN till where replica needs to wait, why can't they simply
monitor the pg_last_wal_replay_lsn() value?

-- 
With Regards,
Amit Kapila.

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-18 Thread Alexander Korotkov

On Mon, Mar 18, 2024 at 5:17 AM Amit Kapila  wrote:
> On Sun, Mar 17, 2024 at 7:40 PM Alexander Korotkov 
wrote:
> > On Sat, Mar 16, 2024 at 5:05 PM Bharath Rupireddy
> >  wrote:
> > > On Sat, Mar 16, 2024 at 4:26 PM Amit Kapila 
wrote:
> > > > In general, it seems this patch has been stuck for a long time on
the
> > > > decision to choose an appropriate UI (syntax), and we thought of
> > > > moving it further so that the other parts of the patch can be
> > > > reviewed/discussed. So, I feel before pushing this we should see
> > > > comments from a few (at least two) other senior members who earlier
> > > > shared their opinion on the syntax. I know we don't have much time
> > > > left but OTOH pushing such a change (where we didn't have a
consensus
> > > > on syntax) without much discussion at this point of time could lead
to
> > > > discussions after commit.
> > >
> > > +1 to gain consensus first on the syntax changes. With this, we might
> > > be violating the SQL standard for explicit txn commands (I stand for
> > > correction about the SQL standard though).
> >
> > Thank you for your feedback.  Generally, I agree it's correct to get
> > consensus on syntax first.  And yes, this patch has been here since
> > 2016.  We didn't get consensus for syntax for 8 years.  Frankly
> > speaking, I don't see a reason why this shouldn't take another 8
> > years.  At the same time the ability to wait on standby given LSN is
> > replayed seems like pretty basic and simple functionality.  Thus, it's
> > quite frustrating it already took that long and still unclear when/how
> > this could be finished.
> >
> > My current attempt was to commit minimal implementation as less
> > invasive as possible.  A new clause for BEGIN doesn't require
> > additional keywords and doesn't introduce additional statements.  But
> > yes, this is still a new qual.  And, yes, Amit you're right that even
> > if I had committed that, there was still a high risk of further
> > debates and revert.
> >
> > Given my specsis about agreement over syntax, I'd like to check
> > another time if we could go without new syntax at all.  There was an
> > attempt to implement waiting for lsn as a function.  But function
> > holds a snapshot, which could prevent WAL records from being replayed.
> > Releasing a snapshot could break the parent query.  But now we have
> > procedures, which need a dedicated statement for the call and can even
> > control transactions.  Could we implement a waitlsn in a procedure
> > that:
> >
> > 1. First, check that it was called with non-atomic context (that is,
> > it's not called within a transaction). Trigger error if called with
> > atomic context.
> > 2. Release a snapshot to be able to wait without risk of WAL replay
> > stuck.  Procedure is still called within the snapshot.  It's a bit of
> > a hack to release a snapshot, but Vacuum statements already do so.
> >
>
> Can you please provide a bit more details with some example what is
> the existing problem with functions and how using procedures will
> resolve it? How will this this address the implicit transaction case
> or do we have any other workaround for those cases?

Please check [1] and [2] for the explanation of the problem with functions.

Also, please find a draft patch implementing the procedure.  The issue with
the snapshot is addressed with the following lines.

We first ensure we're in a non-atomic context, then pop an active snapshot
(tricky, but ExecuteVacuum() does the same).  Then we should have no active
snapshot and it's safe to wait for lsn replay.

if (context->atomic)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 errmsg("pg_wait_lsn() must be only called in non-atomic
context")));

if (ActiveSnapshotSet())
PopActiveSnapshot();
Assert(!ActiveSnapshotSet());

The function call could be added either before the BEGIN statement or
before the implicit transaction.

CALL pg_wait_lsn('my_lsn', my_timeout); BEGIN;
CALL pg_wait_lsn('my_lsn', my_timeout); SELECT ...;

Links
1.
https://www.postgresql.org/message-id/CAPpHfduBSN8j5j5Ynn5x%3DThD%3D8ypNd53D608VXGweBsPzxPvqA%40mail.gmail.com
2.
https://www.postgresql.org/message-id/CAPpHfdtiGgn0iS1KbW2HTam-1%2BoK%2BvhXZDAcnX9hKaA7Oe%3DF-A%40mail.gmail.com

--
Regards,
Alexander Korotkov


v11-0001-Implement-AFTER-clause-for-BEGIN-command.patch
Description: Binary data

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-17 Thread Amit Kapila

On Sun, Mar 17, 2024 at 7:40 PM Alexander Korotkov  wrote:
>
> On Sat, Mar 16, 2024 at 5:05 PM Bharath Rupireddy
>  wrote:
> > On Sat, Mar 16, 2024 at 4:26 PM Amit Kapila  wrote:
> > > In general, it seems this patch has been stuck for a long time on the
> > > decision to choose an appropriate UI (syntax), and we thought of
> > > moving it further so that the other parts of the patch can be
> > > reviewed/discussed. So, I feel before pushing this we should see
> > > comments from a few (at least two) other senior members who earlier
> > > shared their opinion on the syntax. I know we don't have much time
> > > left but OTOH pushing such a change (where we didn't have a consensus
> > > on syntax) without much discussion at this point of time could lead to
> > > discussions after commit.
> >
> > +1 to gain consensus first on the syntax changes. With this, we might
> > be violating the SQL standard for explicit txn commands (I stand for
> > correction about the SQL standard though).
>
> Thank you for your feedback.  Generally, I agree it's correct to get
> consensus on syntax first.  And yes, this patch has been here since
> 2016.  We didn't get consensus for syntax for 8 years.  Frankly
> speaking, I don't see a reason why this shouldn't take another 8
> years.  At the same time the ability to wait on standby given LSN is
> replayed seems like pretty basic and simple functionality.  Thus, it's
> quite frustrating it already took that long and still unclear when/how
> this could be finished.
>
> My current attempt was to commit minimal implementation as less
> invasive as possible.  A new clause for BEGIN doesn't require
> additional keywords and doesn't introduce additional statements.  But
> yes, this is still a new qual.  And, yes, Amit you're right that even
> if I had committed that, there was still a high risk of further
> debates and revert.
>
> Given my specsis about agreement over syntax, I'd like to check
> another time if we could go without new syntax at all.  There was an
> attempt to implement waiting for lsn as a function.  But function
> holds a snapshot, which could prevent WAL records from being replayed.
> Releasing a snapshot could break the parent query.  But now we have
> procedures, which need a dedicated statement for the call and can even
> control transactions.  Could we implement a waitlsn in a procedure
> that:
>
> 1. First, check that it was called with non-atomic context (that is,
> it's not called within a transaction). Trigger error if called with
> atomic context.
> 2. Release a snapshot to be able to wait without risk of WAL replay
> stuck.  Procedure is still called within the snapshot.  It's a bit of
> a hack to release a snapshot, but Vacuum statements already do so.
>

Can you please provide a bit more details with some example what is
the existing problem with functions and how using procedures will
resolve it? How will this this address the implicit transaction case
or do we have any other workaround for those cases?

-- 
With Regards,
Amit Kapila.

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-17 Thread Alexander Korotkov

Hi Amit,
Hi Bharath,

On Sat, Mar 16, 2024 at 5:05 PM Bharath Rupireddy
 wrote:
> On Sat, Mar 16, 2024 at 4:26 PM Amit Kapila  wrote:
> > In general, it seems this patch has been stuck for a long time on the
> > decision to choose an appropriate UI (syntax), and we thought of
> > moving it further so that the other parts of the patch can be
> > reviewed/discussed. So, I feel before pushing this we should see
> > comments from a few (at least two) other senior members who earlier
> > shared their opinion on the syntax. I know we don't have much time
> > left but OTOH pushing such a change (where we didn't have a consensus
> > on syntax) without much discussion at this point of time could lead to
> > discussions after commit.
>
> +1 to gain consensus first on the syntax changes. With this, we might
> be violating the SQL standard for explicit txn commands (I stand for
> correction about the SQL standard though).

Thank you for your feedback.  Generally, I agree it's correct to get
consensus on syntax first.  And yes, this patch has been here since
2016.  We didn't get consensus for syntax for 8 years.  Frankly
speaking, I don't see a reason why this shouldn't take another 8
years.  At the same time the ability to wait on standby given LSN is
replayed seems like pretty basic and simple functionality.  Thus, it's
quite frustrating it already took that long and still unclear when/how
this could be finished.

My current attempt was to commit minimal implementation as less
invasive as possible.  A new clause for BEGIN doesn't require
additional keywords and doesn't introduce additional statements.  But
yes, this is still a new qual.  And, yes, Amit you're right that even
if I had committed that, there was still a high risk of further
debates and revert.

Given my specsis about agreement over syntax, I'd like to check
another time if we could go without new syntax at all.  There was an
attempt to implement waiting for lsn as a function.  But function
holds a snapshot, which could prevent WAL records from being replayed.
Releasing a snapshot could break the parent query.  But now we have
procedures, which need a dedicated statement for the call and can even
control transactions.  Could we implement a waitlsn in a procedure
that:

1. First, check that it was called with non-atomic context (that is,
it's not called within a transaction). Trigger error if called with
atomic context.
2. Release a snapshot to be able to wait without risk of WAL replay
stuck.  Procedure is still called within the snapshot.  It's a bit of
a hack to release a snapshot, but Vacuum statements already do so.

Amit, Bharath, what do you think about this approach?  Is this a way to go?

--
Regards,
Alexander Korotkov

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-16 Thread Bharath Rupireddy

On Sat, Mar 16, 2024 at 4:26 PM Amit Kapila  wrote:
>
> On Fri, Mar 15, 2024 at 7:50 PM Alexander Korotkov  
> wrote:
> >
> > I went through this patch another time, and made some minor
> > adjustments.  Now it looks good, I'm going to push it if no
> > objections.
> >
>
> I have a question related to usability, if the regular reads (say a
> Select statement or reads via function/procedure) need a similar
> guarantee to see the changes on standby then do they also always need
> to first do something like "BEGIN AFTER '0/3F0FF791' WITHIN 1000;"? Or
> in other words, shouldn't we think of something for implicit
> transactions?

+1 to have support for implicit txns. A strawman solution I can think
of is to let primary send its current insert LSN to the standby every
time it sends a bunch of WAL, and the standby waits for that LSN to be
replayed on it at the start of every implicit txn automatically.

The new BEGIN syntax requires application code changes. This led me to
think how one can achieve read-after-write consistency today in a
primary - standby set up. All the logic of this patch, that is,
waiting for the standby to pass a given primary LSN needs to be done
in the application code (or in proxy or in load balancer?). I believe
there might be someone doing this already, it's good to hear from
them.

> In general, it seems this patch has been stuck for a long time on the
> decision to choose an appropriate UI (syntax), and we thought of
> moving it further so that the other parts of the patch can be
> reviewed/discussed. So, I feel before pushing this we should see
> comments from a few (at least two) other senior members who earlier
> shared their opinion on the syntax. I know we don't have much time
> left but OTOH pushing such a change (where we didn't have a consensus
> on syntax) without much discussion at this point of time could lead to
> discussions after commit.

+1 to gain consensus first on the syntax changes. With this, we might
be violating the SQL standard for explicit txn commands (I stand for
correction about the SQL standard though).

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-16 Thread Amit Kapila

On Fri, Mar 15, 2024 at 7:50 PM Alexander Korotkov  wrote:
>
> I went through this patch another time, and made some minor
> adjustments.  Now it looks good, I'm going to push it if no
> objections.
>

I have a question related to usability, if the regular reads (say a
Select statement or reads via function/procedure) need a similar
guarantee to see the changes on standby then do they also always need
to first do something like "BEGIN AFTER '0/3F0FF791' WITHIN 1000;"? Or
in other words, shouldn't we think of something for implicit
transactions?

In general, it seems this patch has been stuck for a long time on the
decision to choose an appropriate UI (syntax), and we thought of
moving it further so that the other parts of the patch can be
reviewed/discussed. So, I feel before pushing this we should see
comments from a few (at least two) other senior members who earlier
shared their opinion on the syntax. I know we don't have much time
left but OTOH pushing such a change (where we didn't have a consensus
on syntax) without much discussion at this point of time could lead to
discussions after commit.

-- 
With Regards,
Amit Kapila.

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-16 Thread Bharath Rupireddy

On Sat, Mar 16, 2024 at 2:04 AM Alexander Korotkov wrote:
>
> > Rebase and update patch.

Thanks for working on this. I took a quick look at v11 patch. Here are
some comments:

1.
+#include "utils/timestamp.h"
+#include "executor/spi.h"
+#include "utils/fmgrprotos.h"

Please place executor/spi.h in the alphabetical order. Also, look at
all other header files and place them in the order.

2. It seems like pgindent is not happy with
src/backend/access/transam/xlogrecovery.c and
src/backend/commands/waitlsn.c. Please run it to keep BF member koel
happy post commit.

3. This patch changes, SQL explicit transaction statement syntax, is
it (this deviation) okay from SQL standard perspective?

4. I think some more checks are needed for better input validations.

4.1 With invalid LSN succeeds, shouldn't it error out? Or at least,
add a fast path/quick exit to WaitForLSN()?
BEGIN AFTER '0/0';

4.2 With an unreasonably high future LSN, BEGIN command waits
unboundedly, shouldn't we check if the specified LSN is more than
pg_last_wal_receive_lsn() error out?
BEGIN AFTER '0/';
SELECT pg_last_wal_receive_lsn() + 1 AS future_receive_lsn \gset
BEGIN AFTER :'future_receive_lsn';

4.3 With an unreasonably high wait time, BEGIN command waits
unboundedly, shouldn't we restrict the wait time to some max value,
say a day or so?
SELECT pg_last_wal_receive_lsn() + 1 AS future_receive_lsn \gset
BEGIN AFTER :'future_receive_lsn' WITHIN 10;

5.
+#include
+#include
+#include "postgres.h"
+#include "pgstat.h"

postgres.h must be included at the first, and then the system header
files, and then all postgres header files, just like below. See a very
recent commit
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=97d85be365443eb4bf84373a7468624762382059.

+#include "postgres.h"

+#include
+#include

+#include "access/transam.h"
+#include "access/xact.h"
+#include "access/xlog.h"

6.
+/* Set all latches in shared memory to signal that new LSN has been replayed */
+void
+WaitLSNSetLatches(XLogRecPtr curLSN)
+{

I see this patch is waking up all the waiters in the recovery path
after applying every WAL record, which IMO is a hot path. Is the
impact of this change on recovery measured, perhaps using
https://github.com/macdice/redo-bench or similar tools?

7. In continuation to comment #6, why not use Conditional Variables
instead of proc latches to sleep and wait for all the waiters in
WaitLSNSetLatches?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-15 Thread Alexander Korotkov

On Fri, Mar 15, 2024 at 10:32 PM Kartyshov Ivan
 wrote:
>
> On 2024-03-15 22:59, Kartyshov Ivan wrote:
> > On 2024-03-11 13:44, Alexander Korotkov wrote:
> >> I picked the second option and left only the AFTER clause for the
> >> BEGIN statement.  I think this should be enough for the beginning.
> >
> > Thank you for your rework on your patch, here I made some fixes:
> > 0) autocomplete
> > 1) less jumps
> > 2) more description and add cases in doc

Thank you!

> > I think, it will be useful to have stand-alone statement.
> > Why you would like to see only AFTER clause for the BEGIN statement?

Yes, stand-alone statements might be also useful.  But I think that
the best way for this feature to get into the core would be to commit
the minimal version first.  The BEGIN clause has minimal invasiveness
for the syntax and I believe covers most typical use-cases.  Once we
figure out it's OK and have positive feedback from users, we can do
more enchantments incrementally.

> Rebase and update patch.

Cool, I was just about to ask you to do this.

 --
Regards,
Alexander Korotkov

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-15 Thread Kartyshov Ivan


On 2024-03-15 22:59, Kartyshov Ivan wrote:

On 2024-03-11 13:44, Alexander Korotkov wrote:

I picked the second option and left only the AFTER clause for the
BEGIN statement.  I think this should be enough for the beginning.


Thank you for your rework on your patch, here I made some fixes:
0) autocomplete
1) less jumps
2) more description and add cases in doc

I think, it will be useful to have stand-alone statement.
Why you would like to see only AFTER clause for the BEGIN statement?


Rebase and update patch.

--
Ivan Kartyshov
Postgres Professional: www.postgrespro.comdiff --git a/doc/src/sgml/ref/begin.sgml b/doc/src/sgml/ref/begin.sgml
index 016b021487..759e46ec24 100644
--- a/doc/src/sgml/ref/begin.sgml
+++ b/doc/src/sgml/ref/begin.sgml
@@ -21,13 +21,16 @@ PostgreSQL documentation
 
  
 
-BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
+BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ] wait_event
 
 where transaction_mode is one of:
 
 ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
 READ WRITE | READ ONLY
 [ NOT ] DEFERRABLE
+
+where wait_event is:
+AFTER lsn_value [ WITHIN number_of_milliseconds ]
 
  
 
@@ -78,6 +81,50 @@ BEGIN [ WORK | TRANSACTION ] [ transaction_mode
 

+
+   
+AFTER lsn_value
+
+ 
+   AFTER clause is used on standby in
+   physical streaming replication
+   and specifies waiting for the specific WAL location (LSN)
+   to be replayed before starting the transaction.
+ 
+ 
+   This option is useful when the user makes some data changes on primary
+   and needs a guarantee to see these changes on standby.  The LSN to wait
+   could be obtained on the primary using
+   pg_current_wal_insert_lsn
+   function after committing the relevant changes.
+ 
+
+   
+
+   
+WITHIN number_of_milliseconds
+
+ 
+   Provides the timeout for the AFTER clause.
+   Especially helpful to prevent freezing on streaming replication
+   connection failures.
+ 
+ 
+   On practice we  have three variants of waiting current LSN:
+  waiting forever - (finish on PM DEATH or SIGINT)
+ BEGIN AFTER lsn;
+wait for changes to be replayed on standby and then start transaction
+  waiting on timeout -
+ BEGIN AFTER lsn WITHIN 6;
+wait changes for 60 seconds and then cancel to start transaction
+to perevent freezing on streaming replication connection failures.
+  no wait, just check -
+ BEGIN AFTER lsn WITHIN 1;
+some time it is useful just check if changes was replayed
+   where number_of_milliseconds can:
+ 
+
+   
   
 
   
@@ -123,6 +170,33 @@ BEGIN [ WORK | TRANSACTION ] [ transaction_mode
 BEGIN;
 
+
+  
+   To begin a transaction block after replaying the given LSN.
+   The command will be canceled if the given LSN is not
+   reached within the timeout of one second.
+
+BEGIN AFTER '0/3F05F791' WITHIN 1000;
+BEGIN
+
+
+  
+   This way we just check (without waiting) if LSN was replayed,
+   and if it was then start transaction.
+
+BEGIN AFTER '0/3F0FF791' WITHIN 1;
+ERROR:  canceling waiting for LSN due to timeout
+
+
+  
+   To begin a transaction block after replaying the given LSN.
+   The command will be canceled only on user request or postmaster death.
+
+BEGIN AFTER '0/3F0FF791';
+^CCancel request sent
+ERROR:  canceling statement due to user request
+
+
  
 
  
diff --git a/doc/src/sgml/ref/start_transaction.sgml b/doc/src/sgml/ref/start_transaction.sgml
index 74ccd7e345..46a3bcf1a8 100644
--- a/doc/src/sgml/ref/start_transaction.sgml
+++ b/doc/src/sgml/ref/start_transaction.sgml
@@ -21,13 +21,16 @@ PostgreSQL documentation
 
  
 
-START TRANSACTION [ transaction_mode [, ...] ]
+START TRANSACTION [ transaction_mode [, ...] ] wait_event
 
 where transaction_mode is one of:
 
 ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
 READ WRITE | READ ONLY
 [ NOT ] DEFERRABLE
+
+where wait_event is:
+AFTER lsn_value [ WITHIN number_of_milliseconds ]
 
  
 
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 29c5bec084..89b997ad47 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -43,6 +43,7 @@
 #include "backup/basebackup.h"
 #include "catalog/pg_control.h"
 #include "commands/tablespace.h"
+#include "commands/waitlsn.h"
 #include "common/file_utils.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -1828,6 +1829,14 @@ PerformWalRecovery(void)
 break;
 			}
 
+			/*
+			 * If we replayed an LSN that someone was waiting for, set latches
+			 * in shared memory array to notify the waiter.
+			 */
+			if (waitLSN &&
+	(XLogRecoveryCtl->lastReplayedEndRecPtr >= pg_atomic_read_u64(>minLSN)))
+

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-15 Thread Kartyshov Ivan


On 2024-03-11 13:44, Alexander Korotkov wrote:

I picked the second option and left only the AFTER clause for the
BEGIN statement.  I think this should be enough for the beginning.


Thank you for your rework on your patch, here I made some fixes:
0) autocomplete
1) less jumps
2) more description and add cases in doc

I think, it will be useful to have stand-alone statement.
Why you would like to see only AFTER clause for the BEGIN statement?

--
Ivan Kartyshov
Postgres Professional: www.postgrespro.comdiff --git a/doc/src/sgml/ref/begin.sgml b/doc/src/sgml/ref/begin.sgml
index 016b021487..759e46ec24 100644
--- a/doc/src/sgml/ref/begin.sgml
+++ b/doc/src/sgml/ref/begin.sgml
@@ -21,13 +21,16 @@ PostgreSQL documentation
 
  
 
-BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
+BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ] wait_event
 
 where transaction_mode is one of:
 
 ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
 READ WRITE | READ ONLY
 [ NOT ] DEFERRABLE
+
+where wait_event is:
+AFTER lsn_value [ WITHIN number_of_milliseconds ]
 
  
 
@@ -78,6 +81,50 @@ BEGIN [ WORK | TRANSACTION ] [ transaction_mode
 

+
+   
+AFTER lsn_value
+
+ 
+   AFTER clause is used on standby in
+   physical streaming replication
+   and specifies waiting for the specific WAL location (LSN)
+   to be replayed before starting the transaction.
+ 
+ 
+   This option is useful when the user makes some data changes on primary
+   and needs a guarantee to see these changes on standby.  The LSN to wait
+   could be obtained on the primary using
+   pg_current_wal_insert_lsn
+   function after committing the relevant changes.
+ 
+
+   
+
+   
+WITHIN number_of_milliseconds
+
+ 
+   Provides the timeout for the AFTER clause.
+   Especially helpful to prevent freezing on streaming replication
+   connection failures.
+ 
+ 
+   On practice we  have three variants of waiting current LSN:
+  waiting forever - (finish on PM DEATH or SIGINT)
+ BEGIN AFTER lsn;
+wait for changes to be replayed on standby and then start transaction
+  waiting on timeout -
+ BEGIN AFTER lsn WITHIN 6;
+wait changes for 60 seconds and then cancel to start transaction
+to perevent freezing on streaming replication connection failures.
+  no wait, just check -
+ BEGIN AFTER lsn WITHIN 1;
+some time it is useful just check if changes was replayed
+   where number_of_milliseconds can:
+ 
+
+   
   
 
   
@@ -123,6 +170,33 @@ BEGIN [ WORK | TRANSACTION ] [ transaction_mode
 BEGIN;
 
+
+  
+   To begin a transaction block after replaying the given LSN.
+   The command will be canceled if the given LSN is not
+   reached within the timeout of one second.
+
+BEGIN AFTER '0/3F05F791' WITHIN 1000;
+BEGIN
+
+
+  
+   This way we just check (without waiting) if LSN was replayed,
+   and if it was then start transaction.
+
+BEGIN AFTER '0/3F0FF791' WITHIN 1;
+ERROR:  canceling waiting for LSN due to timeout
+
+
+  
+   To begin a transaction block after replaying the given LSN.
+   The command will be canceled only on user request or postmaster death.
+
+BEGIN AFTER '0/3F0FF791';
+^CCancel request sent
+ERROR:  canceling statement due to user request
+
+
  
 
  
diff --git a/doc/src/sgml/ref/start_transaction.sgml b/doc/src/sgml/ref/start_transaction.sgml
index 74ccd7e345..46a3bcf1a8 100644
--- a/doc/src/sgml/ref/start_transaction.sgml
+++ b/doc/src/sgml/ref/start_transaction.sgml
@@ -21,13 +21,16 @@ PostgreSQL documentation
 
  
 
-START TRANSACTION [ transaction_mode [, ...] ]
+START TRANSACTION [ transaction_mode [, ...] ] wait_event
 
 where transaction_mode is one of:
 
 ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
 READ WRITE | READ ONLY
 [ NOT ] DEFERRABLE
+
+where wait_event is:
+AFTER lsn_value [ WITHIN number_of_milliseconds ]
 
  
 
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 29c5bec084..89b997ad47 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -43,6 +43,7 @@
 #include "backup/basebackup.h"
 #include "catalog/pg_control.h"
 #include "commands/tablespace.h"
+#include "commands/waitlsn.h"
 #include "common/file_utils.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -1828,6 +1829,14 @@ PerformWalRecovery(void)
 break;
 			}
 
+			/*
+			 * If we replayed an LSN that someone was waiting for, set latches
+			 * in shared memory array to notify the waiter.
+			 */
+			if (waitLSN &&
+	(XLogRecoveryCtl->lastReplayedEndRecPtr >= pg_atomic_read_u64(>minLSN)))
+WaitLSNSetLatches(XLogRecoveryCtl->lastReplayedEndRecPtr);
+
 			/* Else, try to fetch the next WAL record

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-15 Thread Alexander Korotkov

On Fri, Mar 15, 2024 at 4:20 PM Alexander Korotkov 
wrote:

> On Mon, Mar 11, 2024 at 12:44 PM Alexander Korotkov
>  wrote:
> > I've decided to put my hands on this patch.
> >
> > On Thu, Mar 7, 2024 at 2:25 PM Amit Kapila 
> wrote:
> > > +1 for the second one not only because it avoids new words in grammar
> > > but also sounds to convey the meaning. I think you can explain in docs
> > > how this feature can be used basically how will one get the correct
> > > LSN value to specify.
> >
> > I picked the second option and left only the AFTER clause for the
> > BEGIN statement.  I think this should be enough for the beginning.
> >
> > > As suggested previously also pick one of the approaches (I would
> > > advocate the second one) and keep an option for the second one by
> > > mentioning it in the commit message. I hope to see more
> > > reviews/discussions or usage like how will users get the LSN value to
> > > be specified on the core logic of the feature at this stage. IF
> > > possible, state, how real-world applications could leverage this
> > > feature.
> >
> > I've added a paragraph to the docs about the usage.  After you made
> > some changes on primary, you run pg_current_wal_insert_lsn().  Then
> > connect to replica and run 'BEGIN AFTER lsn' with the just obtained
> > LSN.  Now you're guaranteed to see the changes made to the primary.
> >
> > Also, I've significantly reworked other aspects of the patch.  The
> > most significant changes are:
> > 1) Waiters are now stored in the array sorted by LSN.  This saves us
> > from scanning of wholeper-backend array.
> > 2) Waiters are removed from the array immediately once their LSNs are
> > replayed.  Otherwise, the WAL replayer will keep scanning the shared
> > memory array till waiters wake up.
> > 3) To clean up after errors, we now call WaitLSNCleanup() on backend
> > shmem exit.  I think this is preferable over the previous approach to
> > remove from the queue before ProcessInterrupts().
> > 4) There is now condition to recheck if LSN is replayed after adding
> > to the shared memory array.  This should save from the race
> > conditions.
> > 5) I've renamed too generic names for functions and files.
>
> I went through this patch another time, and made some minor
> adjustments.  Now it looks good, I'm going to push it if no
> objections.
>

The revised patch version with cosmetic fixes proposed by Alexander Lakhin.

--
Regards,
Alexander Korotkov


v10-0001-Implement-AFTER-clause-for-BEGIN-command.patch
Description: Binary data

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-15 Thread Alexander Korotkov

On Mon, Mar 11, 2024 at 12:44 PM Alexander Korotkov
 wrote:
> I've decided to put my hands on this patch.
>
> On Thu, Mar 7, 2024 at 2:25 PM Amit Kapila  wrote:
> > +1 for the second one not only because it avoids new words in grammar
> > but also sounds to convey the meaning. I think you can explain in docs
> > how this feature can be used basically how will one get the correct
> > LSN value to specify.
>
> I picked the second option and left only the AFTER clause for the
> BEGIN statement.  I think this should be enough for the beginning.
>
> > As suggested previously also pick one of the approaches (I would
> > advocate the second one) and keep an option for the second one by
> > mentioning it in the commit message. I hope to see more
> > reviews/discussions or usage like how will users get the LSN value to
> > be specified on the core logic of the feature at this stage. IF
> > possible, state, how real-world applications could leverage this
> > feature.
>
> I've added a paragraph to the docs about the usage.  After you made
> some changes on primary, you run pg_current_wal_insert_lsn().  Then
> connect to replica and run 'BEGIN AFTER lsn' with the just obtained
> LSN.  Now you're guaranteed to see the changes made to the primary.
>
> Also, I've significantly reworked other aspects of the patch.  The
> most significant changes are:
> 1) Waiters are now stored in the array sorted by LSN.  This saves us
> from scanning of wholeper-backend array.
> 2) Waiters are removed from the array immediately once their LSNs are
> replayed.  Otherwise, the WAL replayer will keep scanning the shared
> memory array till waiters wake up.
> 3) To clean up after errors, we now call WaitLSNCleanup() on backend
> shmem exit.  I think this is preferable over the previous approach to
> remove from the queue before ProcessInterrupts().
> 4) There is now condition to recheck if LSN is replayed after adding
> to the shared memory array.  This should save from the race
> conditions.
> 5) I've renamed too generic names for functions and files.

I went through this patch another time, and made some minor
adjustments.  Now it looks good, I'm going to push it if no
objections.

--
Regards,
Alexander Korotkov


v9-0001-Implement-AFTER-clause-for-BEGIN-command.patch
Description: Binary data

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-11 Thread Alexander Korotkov

Hi!

I've decided to put my hands on this patch.

On Thu, Mar 7, 2024 at 2:25 PM Amit Kapila  wrote:
> +1 for the second one not only because it avoids new words in grammar
> but also sounds to convey the meaning. I think you can explain in docs
> how this feature can be used basically how will one get the correct
> LSN value to specify.

I picked the second option and left only the AFTER clause for the
BEGIN statement.  I think this should be enough for the beginning.

> As suggested previously also pick one of the approaches (I would
> advocate the second one) and keep an option for the second one by
> mentioning it in the commit message. I hope to see more
> reviews/discussions or usage like how will users get the LSN value to
> be specified on the core logic of the feature at this stage. IF
> possible, state, how real-world applications could leverage this
> feature.

I've added a paragraph to the docs about the usage.  After you made
some changes on primary, you run pg_current_wal_insert_lsn().  Then
connect to replica and run 'BEGIN AFTER lsn' with the just obtained
LSN.  Now you're guaranteed to see the changes made to the primary.

Also, I've significantly reworked other aspects of the patch.  The
most significant changes are:
1) Waiters are now stored in the array sorted by LSN.  This saves us
from scanning of wholeper-backend array.
2) Waiters are removed from the array immediately once their LSNs are
replayed.  Otherwise, the WAL replayer will keep scanning the shared
memory array till waiters wake up.
3) To clean up after errors, we now call WaitLSNCleanup() on backend
shmem exit.  I think this is preferable over the previous approach to
remove from the queue before ProcessInterrupts().
4) There is now condition to recheck if LSN is replayed after adding
to the shared memory array.  This should save from the race
conditions.
5) I've renamed too generic names for functions and files.

--
Regards,
Alexander Korotkov

v8-0001-Implement-AFTER-clause-for-BEGIN-command.patch
Description: Binary data

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-07 Thread Amit Kapila

On Thu, Mar 7, 2024 at 5:14 PM Kartyshov Ivan
 wrote:
>
> Intro
> ==
> The main purpose of the feature is to achieve
> read-your-writes-consistency, while using async replica for reads and
> primary for writes. In that case lsn of last modification is stored
> inside application. We cannot store this lsn inside database, since
> reads are distributed across all replicas and primary.
>
>
> Two implementations of one feature
> ==
> We left two different solutions. Help me please to choose the best.
>
>
> 1) Classic (wait_classic_v7.patch)
> https://www.postgresql.org/message-id/3cc883048264c2e9af022033925ff8db%40postgrespro.ru
> Synopsis
> ==
> advantages: multiple wait events, separate WAIT FOR statement
> disadvantages: new words in grammar
>
>
>
> WAIT FOR  [ANY | ALL] event [, ...]
> BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
>  [ WAIT FOR [ANY | ALL] event [, ...]]
> event:
> LSN value
> TIMEOUT number_of_milliseconds
> timestamp
>
>
>
> 2) After style: Kyotaro and Freund (wait_after_within_v6.patch)
> https://www.postgresql.org/message-id/d3ff2e363af60b345f82396992595a03%40postgrespro.ru
> Synopsis
> ==
> advantages: no new words in grammar
> disadvantages: a little harder to understand, fewer events to wait
>
>
>
> AFTER lsn_event [ WITHIN delay_milliseconds ] [, ...]
> BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
>  [ AFTER lsn_event [ WITHIN delay_milliseconds ]]
> START [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
>  [ AFTER lsn_event [ WITHIN delay_milliseconds ]]
>

+1 for the second one not only because it avoids new words in grammar
but also sounds to convey the meaning. I think you can explain in docs
how this feature can be used basically how will one get the correct
LSN value to specify.

As suggested previously also pick one of the approaches (I would
advocate the second one) and keep an option for the second one by
mentioning it in the commit message. I hope to see more
reviews/discussions or usage like how will users get the LSN value to
be specified on the core logic of the feature at this stage. IF
possible, state, how real-world applications could leverage this
feature.

-- 
With Regards,
Amit Kapila.

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-03-07 Thread Kartyshov Ivan


Intro
==
The main purpose of the feature is to achieve
read-your-writes-consistency, while using async replica for reads and
primary for writes. In that case lsn of last modification is stored
inside application. We cannot store this lsn inside database, since
reads are distributed across all replicas and primary.


Two implementations of one feature
==
We left two different solutions. Help me please to choose the best.


1) Classic (wait_classic_v7.patch)
https://www.postgresql.org/message-id/3cc883048264c2e9af022033925ff8db%40postgrespro.ru
Synopsis
==
advantages: multiple wait events, separate WAIT FOR statement
disadvantages: new words in grammar



WAIT FOR  [ANY | ALL] event [, ...]
BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
[ WAIT FOR [ANY | ALL] event [, ...]]
event:
LSN value
TIMEOUT number_of_milliseconds
timestamp



2) After style: Kyotaro and Freund (wait_after_within_v6.patch)
https://www.postgresql.org/message-id/d3ff2e363af60b345f82396992595a03%40postgrespro.ru
Synopsis
==
advantages: no new words in grammar
disadvantages: a little harder to understand, fewer events to wait



AFTER lsn_event [ WITHIN delay_milliseconds ] [, ...]
BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
[ AFTER lsn_event [ WITHIN delay_milliseconds ]]
START [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
[ AFTER lsn_event [ WITHIN delay_milliseconds ]]


Examples
==

primary standby
--- 
postgresql.conf
recovery_min_apply_delay = 10s

CREATE TABLE tbl AS SELECT generate_series(1,10) AS a;
INSERT INTO tbl VALUES (generate_series(11, 20));
SELECT pg_current_wal_lsn();

BEGIN WAIT FOR LSN '0/3002AE8';
SELECT * FROM tbl; // read fresh insertions
COMMIT;

Rebased and ready for review.

--
Ivan Kartyshov
Postgres Professional: www.postgrespro.comdiff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 4a42999b18..657a217e27 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -188,6 +188,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/ref/begin.sgml b/doc/src/sgml/ref/begin.sgml
index 016b021487..b3af16c09f 100644
--- a/doc/src/sgml/ref/begin.sgml
+++ b/doc/src/sgml/ref/begin.sgml
@@ -21,13 +21,21 @@ PostgreSQL documentation
 
  
 
-BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
+BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ] wait_for_event
 
 where transaction_mode is one of:
 
 ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
 READ WRITE | READ ONLY
 [ NOT ] DEFERRABLE
+
+where wait_for_event is:
+WAIT FOR [ANY | ALL] event [, ...]
+
+and event is one of:
+LSN lsn_value
+TIMEOUT number_of_milliseconds
+timestamp
 
  
 
diff --git a/doc/src/sgml/ref/start_transaction.sgml b/doc/src/sgml/ref/start_transaction.sgml
index 74ccd7e345..1b54ed2084 100644
--- a/doc/src/sgml/ref/start_transaction.sgml
+++ b/doc/src/sgml/ref/start_transaction.sgml
@@ -21,13 +21,21 @@ PostgreSQL documentation
 
  
 
-START TRANSACTION [ transaction_mode [, ...] ]
+START TRANSACTION [ transaction_mode [, ...] ] wait_for_event
 
 where transaction_mode is one of:
 
 ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
 READ WRITE | READ ONLY
 [ NOT ] DEFERRABLE
+
+where wait_for_event is:
+WAIT FOR [ANY | ALL] event [, ...]
+
+and event is one of:
+LSN lsn_value
+TIMEOUT number_of_milliseconds
+timestamp
 
  
 
diff --git a/doc/src/sgml/ref/wait.sgml b/doc/src/sgml/ref/wait.sgml
new file mode 100644
index 00..26cae3ad85
--- /dev/null
+++ b/doc/src/sgml/ref/wait.sgml
@@ -0,0 +1,146 @@
+
+
+
+ 
+  WAIT FOR
+ 
+
+ 
+  WAIT FOR
+  7
+  SQL - Language Statements
+ 
+
+ 
+  WAIT FOR
+  wait for the target LSN to be replayed or for specified time to pass
+ 
+
+ 
+
+WAIT FOR [ANY | ALL] event [, ...]
+
+where event is one of:
+LSN value
+TIMEOUT number_of_milliseconds
+timestamp
+
+WAIT FOR LSN 'lsn_number'
+WAIT FOR LSN 'lsn_number' TIMEOUT wait_timeout
+WAIT FOR LSN 'lsn_number', TIMESTAMP wait_time
+WAIT FOR TIMESTAMP wait_time
+WAIT FOR ALL LSN 'lsn_number' TIMEOUT wait_timeout, TIMESTAMP wait_time
+WAIT FOR ANY LSN 'lsn_number', TIMESTAMP wait_time
+
+ 
+
+ 
+  Description
+
+  
+   WAIT FOR provides a simple interprocess communication
+   mechanism to wait for timestamp, timeout or the target log sequence number
+   (LSN) on standby in PostgreSQL
+   databases with master-standby asynchronous replication. When run with the
+   LSN option, the WAIT FOR
+   command waits for the specified LSN to be replayed.
+   If no timestamp or timeout was specified, wait time is unlimited.
+   Waiting can be interrupted using Ctrl+C, or
+   by shutting down the postgres server.
+  
+
+ 
+
+ 
+  Parameters
+
+  
+   
+

Fwd: Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-02-01 Thread Kartyshov Ivan


Updated, rebased, fixed Ci and added documentation.
We left two different solutions. Help me please to choose the best.

1) Classic (wait_classic_v6.patch)
https://www.postgresql.org/message-id/3cc883048264c2e9af022033925ff8db%40postgrespro.ru
==
advantages: multiple wait events, separate WAIT FOR statement
disadvantages: new words in grammar



WAIT FOR  [ANY | ALL] event [, ...]
BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
[ WAIT FOR [ANY | ALL] event [, ...]]
event:
LSN value
TIMEOUT number_of_milliseconds
timestamp



2) After style: Kyotaro and Freund (wait_after_within_v5.patch)
https://www.postgresql.org/message-id/d3ff2e363af60b345f82396992595a03%40postgrespro.ru
==
advantages: no new words in grammar
disadvantages: a little harder to understand, fewer events to wait



AFTER lsn_event [ WITHIN delay_milliseconds ] [, ...]
BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
[ AFTER lsn_event [ WITHIN delay_milliseconds ]]
START [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
[ AFTER lsn_event [ WITHIN delay_milliseconds ]]




Regards

--
Ivan Kartyshov
Postgres Professional: www.postgrespro.comdiff --git a/doc/src/sgml/ref/after.sgml b/doc/src/sgml/ref/after.sgml
new file mode 100644
index 00..def0ec6be3
--- /dev/null
+++ b/doc/src/sgml/ref/after.sgml
@@ -0,0 +1,118 @@
+
+
+
+ 
+  AFTER
+ 
+
+ 
+  AFTER
+  7
+  SQL - Language Statements
+ 
+
+ 
+  AFTER
+  AFTER the target LSN to be replayed and for specified time to timeout
+ 
+
+ 
+
+AFTER LSN [WITHIN timeout_in_milliseconds]
+
+AFTER LSN 'lsn_number'
+AFTER LSN 'lsn_number' WITHIN wait_timeout
+
+ 
+
+ 
+  Description
+
+  
+   AFTER provides a simple interprocess communication
+   mechanism to wait for the target log sequence number 
+   (LSN) on standby in PostgreSQL
+   databases with master-standby asynchronous replication. AFTER
+   command waits for the specified LSN to be replayed.
+   If no timeout was specified, wait time is unlimited.
+   Waiting can be interrupted using Ctrl+C, or
+   by shutting down the postgres server.
+  
+
+ 
+
+ 
+  Parameters
+
+  
+   
+LSN
+
+ 
+  Specify the target log sequence number to wait for.
+ 
+
+   
+
+   
+TIMEOUT wait_timeout
+
+ 
+  Limit the time interval to AFTER the LSN to be replayed.
+  The specified wait_timeout must be an integer
+  and is measured in milliseconds.
+ 
+
+   
+
+  
+ 
+
+ 
+  Examples
+
+  
+   Run AFTER from psql,
+   limiting wait time to 1 milliseconds:
+
+
+AFTER '0/3F07A6B1' WITHIN 1;
+NOTICE:  LSN is not reached. Try to increase wait time.
+LSN reached
+-
+ f
+(1 row)
+
+  
+
+  
+   Wait until the specified LSN is replayed:
+
+AFTER '0/3F07A611';
+LSN reached
+-
+ t
+(1 row)
+
+  
+
+  
+   Limit LSN wait time to 50 milliseconds,
+   and then cancel the command if LSN was not reached:
+
+AFTER LSN '0/3F0FF791' WITHIN 50;
+^CCancel request sent
+NOTICE:  LSN is not reached. Try to increase wait time.
+ERROR:  canceling statement due to user request
+ LSN reached
+-
+ f
+(1 row)
+
+
+
+
+
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 4a42999b18..ff2e6e0f3f 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -6,6 +6,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
@@ -188,6 +189,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/ref/begin.sgml b/doc/src/sgml/ref/begin.sgml
index 016b021487..a2794763b1 100644
--- a/doc/src/sgml/ref/begin.sgml
+++ b/doc/src/sgml/ref/begin.sgml
@@ -21,13 +21,16 @@ PostgreSQL documentation
 
  
 
-BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
+BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ] wait_event
 
 where transaction_mode is one of:
 
 ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
 READ WRITE | READ ONLY
 [ NOT ] DEFERRABLE
+
+where wait_event is:
+AFTER lsn_value [ WITHIN number_of_milliseconds ]
 
  
 
diff --git a/doc/src/sgml/ref/start_transaction.sgml b/doc/src/sgml/ref/start_transaction.sgml
index 74ccd7e345..46a3bcf1a8 100644
--- a/doc/src/sgml/ref/start_transaction.sgml
+++ b/doc/src/sgml/ref/start_transaction.sgml
@@ -21,13 +21,16 @@ PostgreSQL documentation
 
  
 
-START TRANSACTION [ transaction_mode [, ...] ]
+START TRANSACTION [ transaction_mode [, ...] ] wait_event
 
 where transaction_mode is one of:
 
 ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
 READ WRITE | READ ONLY
 [ NOT ] DEFERRABLE
+
+where wait_event is:
+AFTER lsn_value [ WITHIN number_of_milliseconds ]
 
  
 
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index aa94f6adf6..7c94cf9de1 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -34,6 +34,7 @@

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-01-21 Thread Dilip Kumar

On Wed, Jan 17, 2024 at 1:46 PM Kartyshov Ivan
 wrote:
>
> Add some fixes and rebase.
>
While quickly looking into the patch, I understood the idea of what we
are trying to achieve here and I feel that a useful feature.  But
while looking at both the patches I could not quickly differentiate
between these two approaches.  I believe, internally at the core both
are implementing similar wait logic but providing different syntaxes.
So if we want to keep both these approaches open for the sake of
discussion then better first to create a patch that implements the
core approach i.e. the waiting logic and the other common part and
then add top-up patches with 2 different approaches that would be easy
for review.  I also see in v4 that there is no documentation for the
syntax part so it makes it even harder to understand.

I think this thread is implementing a useful feature so my suggestion
is to add some documentation in v4 and also make it more readable
w.r.t. What are the clear differences between these two approaches,
maybe adding commit message will also help.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-01-21 Thread Peter Smith

2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
there was a CFbot test failure last time it was run [2]. Please have a
look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/4221/
[2] https://cirrus-ci.com/task/5618308515364864

Kind Regards,
Peter Smith.

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-01-17 Thread Kartyshov Ivan


Add some fixes and rebase.

--
Ivan Kartyshov
Postgres Professional: www.postgrespro.comdiff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 4a42999b18..657a217e27 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -188,6 +188,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/ref/begin.sgml b/doc/src/sgml/ref/begin.sgml
index 016b021487..a2794763b1 100644
--- a/doc/src/sgml/ref/begin.sgml
+++ b/doc/src/sgml/ref/begin.sgml
@@ -21,13 +21,16 @@ PostgreSQL documentation
 
  
 
-BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
+BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ] wait_event
 
 where transaction_mode is one of:
 
 ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
 READ WRITE | READ ONLY
 [ NOT ] DEFERRABLE
+
+where wait_event is:
+AFTER lsn_value [ WITHIN number_of_milliseconds ]
 
  
 
diff --git a/doc/src/sgml/ref/start_transaction.sgml b/doc/src/sgml/ref/start_transaction.sgml
index 74ccd7e345..46a3bcf1a8 100644
--- a/doc/src/sgml/ref/start_transaction.sgml
+++ b/doc/src/sgml/ref/start_transaction.sgml
@@ -21,13 +21,16 @@ PostgreSQL documentation
 
  
 
-START TRANSACTION [ transaction_mode [, ...] ]
+START TRANSACTION [ transaction_mode [, ...] ] wait_event
 
 where transaction_mode is one of:
 
 ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
 READ WRITE | READ ONLY
 [ NOT ] DEFERRABLE
+
+where wait_event is:
+AFTER lsn_value [ WITHIN number_of_milliseconds ]
 
  
 
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index aa94f6adf6..04e17620dd 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -216,6 +216,7 @@



+   
 
  
 
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 1b48d7171a..8ab86a83b5 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -43,6 +43,7 @@
 #include "backup/basebackup.h"
 #include "catalog/pg_control.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/file_utils.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -1784,6 +1785,15 @@ PerformWalRecovery(void)
 break;
 			}
 
+			/*
+			* If we replayed an LSN that someone was waiting for,
+			* set latches in shared memory array to notify the waiter.
+			*/
+			if (XLogRecoveryCtl->lastReplayedEndRecPtr >= GetMinWait())
+			{
+ WaitSetLatch(XLogRecoveryCtl->lastReplayedEndRecPtr);
+			}
+
 			/* Else, try to fetch the next WAL record */
 			record = ReadRecord(xlogprefetcher, LOG, false, replayTLI);
 		} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index 48f7348f91..d8f6965d8c 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -61,6 +61,7 @@ OBJS = \
 	vacuum.o \
 	vacuumparallel.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/meson.build b/src/backend/commands/meson.build
index 6dd00a4abd..3f06dc5341 100644
--- a/src/backend/commands/meson.build
+++ b/src/backend/commands/meson.build
@@ -50,4 +50,5 @@ backend_sources += files(
   'vacuumparallel.c',
   'variable.c',
   'view.c',
+  'wait.c',
 )
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
new file mode 100644
index 00..880f141cf5
--- /dev/null
+++ b/src/backend/commands/wait.c
@@ -0,0 +1,338 @@
+/*-
+ *
+ * wait.c
+ *	  Implements WAIT FOR, which allows waiting for events such as
+ *	  time passing or LSN having been replayed on replica.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/wait.c
+ *
+ *-
+ */
+
+#include 
+#include 
+#include "postgres.h"
+#include "pgstat.h"
+#include "fmgr.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xlogdefs.h"
+#include "access/xlogrecovery.h"
+#include "catalog/pg_type.h"
+#include "commands/wait.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/backendid.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+#include "storage/sinvaladt.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+#include "utils/timestamp.h"
+#include "executor/spi.h"
+#include "utils/fmgrprotos.h"
+
+/* Add to / delete from shared memory array */
+static void AddEvent(XLogRecPtr lsn_to_wait);
+static void DeleteEvent(void);
+
+/* Shared memory

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-01-11 Thread Kartyshov Ivan


Rebased and ready for review.
I left only versions (due to irreparable problems)

1) Classic (wait_classic_v4.patch)
https://www.postgresql.org/message-id/3cc883048264c2e9af022033925ff8db%40postgrespro.ru
==
advantages: multiple wait events, separate WAIT FOR statement
disadvantages: new words in grammar



WAIT FOR  [ANY | ALL] event [, ...]
BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
 [ WAIT FOR [ANY | ALL] event [, ...]]
event:
LSN value
TIMEOUT number_of_milliseconds
timestamp



2) After style: Kyotaro and Freund (wait_after_within_v3.patch)
https://www.postgresql.org/message-id/d3ff2e363af60b345f82396992595a03%40postgrespro.ru
==
advantages: no new words in grammar
disadvantages: a little harder to understand



AFTER lsn_event [ WITHIN delay_milliseconds ] [, ...]
BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
 [ AFTER lsn_event [ WITHIN delay_milliseconds ]]
START [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
 [ AFTER lsn_event [ WITHIN delay_milliseconds ]]


--
Ivan Kartyshov
Postgres Professional: www.postgrespro.comdiff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index fda4690eab..410018b79b 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -188,6 +188,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/ref/begin.sgml b/doc/src/sgml/ref/begin.sgml
index 016b021487..b3af16c09f 100644
--- a/doc/src/sgml/ref/begin.sgml
+++ b/doc/src/sgml/ref/begin.sgml
@@ -21,13 +21,21 @@ PostgreSQL documentation
 
  
 
-BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
+BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ] wait_for_event
 
 where transaction_mode is one of:
 
 ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
 READ WRITE | READ ONLY
 [ NOT ] DEFERRABLE
+
+where wait_for_event is:
+WAIT FOR [ANY | ALL] event [, ...]
+
+and event is one of:
+LSN lsn_value
+TIMEOUT number_of_milliseconds
+timestamp
 
  
 
diff --git a/doc/src/sgml/ref/start_transaction.sgml b/doc/src/sgml/ref/start_transaction.sgml
index 74ccd7e345..1b54ed2084 100644
--- a/doc/src/sgml/ref/start_transaction.sgml
+++ b/doc/src/sgml/ref/start_transaction.sgml
@@ -21,13 +21,21 @@ PostgreSQL documentation
 
  
 
-START TRANSACTION [ transaction_mode [, ...] ]
+START TRANSACTION [ transaction_mode [, ...] ] wait_for_event
 
 where transaction_mode is one of:
 
 ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
 READ WRITE | READ ONLY
 [ NOT ] DEFERRABLE
+
+where wait_for_event is:
+WAIT FOR [ANY | ALL] event [, ...]
+
+and event is one of:
+LSN lsn_value
+TIMEOUT number_of_milliseconds
+timestamp
 
  
 
diff --git a/doc/src/sgml/ref/wait.sgml b/doc/src/sgml/ref/wait.sgml
new file mode 100644
index 00..26cae3ad85
--- /dev/null
+++ b/doc/src/sgml/ref/wait.sgml
@@ -0,0 +1,146 @@
+
+
+
+ 
+  WAIT FOR
+ 
+
+ 
+  WAIT FOR
+  7
+  SQL - Language Statements
+ 
+
+ 
+  WAIT FOR
+  wait for the target LSN to be replayed or for specified time to pass
+ 
+
+ 
+
+WAIT FOR [ANY | ALL] event [, ...]
+
+where event is one of:
+LSN value
+TIMEOUT number_of_milliseconds
+timestamp
+
+WAIT FOR LSN 'lsn_number'
+WAIT FOR LSN 'lsn_number' TIMEOUT wait_timeout
+WAIT FOR LSN 'lsn_number', TIMESTAMP wait_time
+WAIT FOR TIMESTAMP wait_time
+WAIT FOR ALL LSN 'lsn_number' TIMEOUT wait_timeout, TIMESTAMP wait_time
+WAIT FOR ANY LSN 'lsn_number', TIMESTAMP wait_time
+
+ 
+
+ 
+  Description
+
+  
+   WAIT FOR provides a simple interprocess communication
+   mechanism to wait for timestamp, timeout or the target log sequence number
+   (LSN) on standby in PostgreSQL
+   databases with master-standby asynchronous replication. When run with the
+   LSN option, the WAIT FOR
+   command waits for the specified LSN to be replayed.
+   If no timestamp or timeout was specified, wait time is unlimited.
+   Waiting can be interrupted using Ctrl+C, or
+   by shutting down the postgres server.
+  
+
+ 
+
+ 
+  Parameters
+
+  
+   
+LSN
+
+ 
+  Specify the target log sequence number to wait for.
+ 
+
+   
+
+   
+TIMEOUT wait_timeout
+
+ 
+  Limit the time interval to wait for the LSN to be replayed.
+  The specified wait_timeout must be an integer
+  and is measured in milliseconds.
+ 
+
+   
+
+   
+UNTIL TIMESTAMP wait_time
+
+ 
+  Limit the time to wait for the LSN to be replayed.
+  The specified wait_time must be timestamp.
+ 
+
+   
+
+  
+ 
+
+ 
+  Examples
+
+  
+   Run WAIT FOR from psql,
+   limiting wait time to 1 milliseconds:
+
+
+WAIT FOR LSN '0/3F07A6B1' TIMEOUT 1;
+NOTICE:  LSN is not reached. Try to increase wait time.
+LSN reached
+-
+ f
+(1 row)
+
+  
+
+  
+   Wait until the specified LSN is replayed:
+
+WAIT FOR LSN

Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-01-09 Thread vignesh C

On Fri, 8 Dec 2023 at 15:17, Kartyshov Ivan  wrote:
>
> Should rise disscusion on separate utility statement or find
> case where procedure version is failed.
>
> 1) Classic (wait_classic_v3.patch)
> https://www.postgresql.org/message-id/3cc883048264c2e9af022033925ff8db%40postgrespro.ru
> ==
> advantages: multiple wait events, separate WAIT FOR statement
> disadvantages: new words in grammar
>
>
>
> WAIT FOR  [ANY | ALL] event [, ...]
> BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
>  [ WAIT FOR [ANY | ALL] event [, ...]]
> event:
> LSN value
> TIMEOUT number_of_milliseconds
> timestamp
>
>
>
> 2) After style: Kyotaro and Freund (wait_after_within_v2.patch)
> https://www.postgresql.org/message-id/d3ff2e363af60b345f82396992595a03%40postgrespro.ru
> ==
> advantages: no new words in grammar
> disadvantages: a little harder to understand
>
>
>
> AFTER lsn_event [ WITHIN delay_milliseconds ] [, ...]
> BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
>  [ AFTER lsn_event [ WITHIN delay_milliseconds ]]
> START [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
>  [ AFTER lsn_event [ WITHIN delay_milliseconds ]]
>
>
>
> 3) Procedure style: Tom Lane and Kyotaro (wait_proc_v7.patch)
> https://www.postgresql.org/message-id/27171.1586439221%40sss.pgh.pa.us
> https://www.postgresql.org/message-id/20210121.173009.235021120161403875.horikyota.ntt%40gmail.com
> ==
> advantages: no new words in grammar,like it made in
> pg_last_wal_replay_lsn
> disadvantages: use snapshot xmin trick
> SELECT pg_waitlsn(‘LSN’, timeout);
> SELECT pg_waitlsn_infinite(‘LSN’);
> SELECT pg_waitlsn_no_wait(‘LSN’);

Few of the tests have aborted at [1] in CFBot with:
058`9c7ff550 7ff6`5bdff1f4
postgres!pg_atomic_compare_exchange_u64_impl(
struct pg_atomic_uint64 * ptr = 0x`0008,
unsigned int64 * expected = 0x0058`9c7ff5a0,
unsigned int64 newval = 0)+0x34
[c:\cirrus\src\include\port\atomics\generic-msvc.h @ 83]
0058`9c7ff580 7ff6`5bdff256 postgres!pg_atomic_read_u64_impl(
struct pg_atomic_uint64 * ptr = 0x`0008)+0x24
[c:\cirrus\src\include\port\atomics\generic.h @ 323]
0058`9c7ff5c0 7ff6`5bdfef67 postgres!pg_atomic_read_u64(
struct pg_atomic_uint64 * ptr = 0x`0008)+0x46
[c:\cirrus\src\include\port\atomics.h @ 430]
0058`9c7ff5f0 7ff6`5bc98fc3
postgres!GetMinWaitedLSN(void)+0x17
[c:\cirrus\src\backend\commands\wait.c @ 176]
0058`9c7ff620 7ff6`5bc82fb9
postgres!PerformWalRecovery(void)+0x4c3
[c:\cirrus\src\backend\access\transam\xlogrecovery.c @ 1788]
0058`9c7ff6e0 7ff6`5bffc651
postgres!StartupXLOG(void)+0x989
[c:\cirrus\src\backend\access\transam\xlog.c @ 5562]
0058`9c7ff870 7ff6`5bfed38b
postgres!StartupProcessMain(void)+0xd1
[c:\cirrus\src\backend\postmaster\startup.c @ 288]
0058`9c7ff8a0 7ff6`5bff49fd postgres!AuxiliaryProcessMain(
AuxProcType auxtype = StartupProcess (0n0))+0x1fb
[c:\cirrus\src\backend\postmaster\auxprocess.c @ 139]
0058`9c7ff8e0 7ff6`5beb7674 postgres!SubPostmasterMain(

More details are available at [2].

[1] - https://cirrus-ci.com/task/5618308515364864
[2] - 
https://api.cirrus-ci.com/v1/artifact/task/5618308515364864/crashlog/crashlog-postgres.exe_0008_2023-12-08_07-48-37-722.txt

Regards,
Vignesh

Re: [HACKERS] make async slave to wait for lsn to be replayed

2023-12-08 Thread Alexander Korotkov

On Fri, Dec 8, 2023 at 11:46 AM Kartyshov Ivan
 wrote:
>
> Should rise disscusion on separate utility statement or find
> case where procedure version is failed.
>
> 1) Classic (wait_classic_v3.patch)
> https://www.postgresql.org/message-id/3cc883048264c2e9af022033925ff8db%40postgrespro.ru
> ==
> advantages: multiple wait events, separate WAIT FOR statement
> disadvantages: new words in grammar
>
>
>
> WAIT FOR  [ANY | ALL] event [, ...]
> BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
>  [ WAIT FOR [ANY | ALL] event [, ...]]
> event:
> LSN value
> TIMEOUT number_of_milliseconds
> timestamp

Nice, but as you stated requires new keywords.

> 2) After style: Kyotaro and Freund (wait_after_within_v2.patch)
> https://www.postgresql.org/message-id/d3ff2e363af60b345f82396992595a03%40postgrespro.ru
> ==
> advantages: no new words in grammar
> disadvantages: a little harder to understand
>
>
>
> AFTER lsn_event [ WITHIN delay_milliseconds ] [, ...]
> BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
>  [ AFTER lsn_event [ WITHIN delay_milliseconds ]]
> START [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
>  [ AFTER lsn_event [ WITHIN delay_milliseconds ]]

+1 from me

> 3) Procedure style: Tom Lane and Kyotaro (wait_proc_v7.patch)
> https://www.postgresql.org/message-id/27171.1586439221%40sss.pgh.pa.us
> https://www.postgresql.org/message-id/20210121.173009.235021120161403875.horikyota.ntt%40gmail.com
> ==
> advantages: no new words in grammar,like it made in
> pg_last_wal_replay_lsn
> disadvantages: use snapshot xmin trick
> SELECT pg_waitlsn(‘LSN’, timeout);
> SELECT pg_waitlsn_infinite(‘LSN’);
> SELECT pg_waitlsn_no_wait(‘LSN’);

Nice, because simplicity.  But only safe if called within the simple
query containing nothing else.  Validating this from the function
kills the simplicity.

--
Regards,
Alexander Korotkov

Re: [HACKERS] make async slave to wait for lsn to be replayed

2023-12-08 Thread Alexander Korotkov

On Fri, Dec 8, 2023 at 11:20 AM Kartyshov Ivan
 wrote:
>
> On 2023-11-27 03:08, Alexander Korotkov wrote:
> > I've retried my case with v6 and it doesn't fail anymore.  But I
> > wonder how safe it is to reset xmin within the user-visible function?
> > We have no guarantee that the function is not called inside the
> > complex query.  Then how will the rest of the query work with xmin
> > reset?  Separate utility statement still looks like more safe option
> > for me.
>
> As you mentioned, we can`t guarantee that the function is not called
> inside the complex query, but we can return the xmin after waiting.

Returning xmin back isn't safe.  Especially after potentially long
waiting.  The snapshot could be no longer valid, because the
corresponding tuples could be VACUUM'ed.

--
Regards,
Alexander Korotkov

Re: [HACKERS] make async slave to wait for lsn to be replayed

2023-12-08 Thread Kartyshov Ivan


Should rise disscusion on separate utility statement or find
case where procedure version is failed.

1) Classic (wait_classic_v3.patch)
https://www.postgresql.org/message-id/3cc883048264c2e9af022033925ff8db%40postgrespro.ru
==
advantages: multiple wait events, separate WAIT FOR statement
disadvantages: new words in grammar



WAIT FOR  [ANY | ALL] event [, ...]
BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
[ WAIT FOR [ANY | ALL] event [, ...]]
event:
LSN value
TIMEOUT number_of_milliseconds
timestamp



2) After style: Kyotaro and Freund (wait_after_within_v2.patch)
https://www.postgresql.org/message-id/d3ff2e363af60b345f82396992595a03%40postgrespro.ru
==
advantages: no new words in grammar
disadvantages: a little harder to understand



AFTER lsn_event [ WITHIN delay_milliseconds ] [, ...]
BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
[ AFTER lsn_event [ WITHIN delay_milliseconds ]]
START [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
[ AFTER lsn_event [ WITHIN delay_milliseconds ]]



3) Procedure style: Tom Lane and Kyotaro (wait_proc_v7.patch)
https://www.postgresql.org/message-id/27171.1586439221%40sss.pgh.pa.us
https://www.postgresql.org/message-id/20210121.173009.235021120161403875.horikyota.ntt%40gmail.com
==
advantages: no new words in grammar,like it made in
pg_last_wal_replay_lsn
disadvantages: use snapshot xmin trick
SELECT pg_waitlsn(‘LSN’, timeout);
SELECT pg_waitlsn_infinite(‘LSN’);
SELECT pg_waitlsn_no_wait(‘LSN’);


Regards
--
Ivan Kartyshov
Postgres Professional: www.postgrespro.comdiff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 54b5f22d6e..18695e013e 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -188,6 +188,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/ref/begin.sgml b/doc/src/sgml/ref/begin.sgml
index 016b021487..a2794763b1 100644
--- a/doc/src/sgml/ref/begin.sgml
+++ b/doc/src/sgml/ref/begin.sgml
@@ -21,13 +21,16 @@ PostgreSQL documentation
 
  
 
-BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
+BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ] wait_event
 
 where transaction_mode is one of:
 
 ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
 READ WRITE | READ ONLY
 [ NOT ] DEFERRABLE
+
+where wait_event is:
+AFTER lsn_value [ WITHIN number_of_milliseconds ]
 
  
 
diff --git a/doc/src/sgml/ref/start_transaction.sgml b/doc/src/sgml/ref/start_transaction.sgml
index 74ccd7e345..46a3bcf1a8 100644
--- a/doc/src/sgml/ref/start_transaction.sgml
+++ b/doc/src/sgml/ref/start_transaction.sgml
@@ -21,13 +21,16 @@ PostgreSQL documentation
 
  
 
-START TRANSACTION [ transaction_mode [, ...] ]
+START TRANSACTION [ transaction_mode [, ...] ] wait_event
 
 where transaction_mode is one of:
 
 ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
 READ WRITE | READ ONLY
 [ NOT ] DEFERRABLE
+
+where wait_event is:
+AFTER lsn_value [ WITHIN number_of_milliseconds ]
 
  
 
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index e11b4b6130..a83ff4551e 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -216,6 +216,7 @@



+   
 
  
 
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index c6156a..08399452ce 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -43,6 +43,7 @@
 #include "backup/basebackup.h"
 #include "catalog/pg_control.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/file_utils.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -1780,6 +1781,15 @@ PerformWalRecovery(void)
 break;
 			}
 
+			/*
+			* If we replayed an LSN that someone was waiting for,
+			* set latches in shared memory array to notify the waiter.
+			*/
+			if (XLogRecoveryCtl->lastReplayedEndRecPtr >= GetMinWait())
+			{
+ WaitSetLatch(XLogRecoveryCtl->lastReplayedEndRecPtr);
+			}
+
 			/* Else, try to fetch the next WAL record */
 			record = ReadRecord(xlogprefetcher, LOG, false, replayTLI);
 		} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index 48f7348f91..d8f6965d8c 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -61,6 +61,7 @@ OBJS = \
 	vacuum.o \
 	vacuumparallel.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/meson.build b/src/backend/commands/meson.build
index 42cced9ebe..ec6ab7722a 100644
--- a/src/backend/commands/meson.build
+++ b/src/backend/commands/meson.build
@@ -50,4 +50,5 @@ backend_sources += files(
   'vacuumparallel.c',
   'variable.c',
   'view.c',
+  'wait.c',
 )
diff --git a/src/backend/commands/wait.c

Re: [HACKERS] make async slave to wait for lsn to be replayed

2023-12-08 Thread Kartyshov Ivan


On 2023-11-27 03:08, Alexander Korotkov wrote:

I've retried my case with v6 and it doesn't fail anymore.  But I
wonder how safe it is to reset xmin within the user-visible function?
We have no guarantee that the function is not called inside the
complex query.  Then how will the rest of the query work with xmin
reset?  Separate utility statement still looks like more safe option
for me.


As you mentioned, we can`t guarantee that the function is not called
inside the complex query, but we can return the xmin after waiting.
But you are right and separate utility statement still looks more safe.
So I want to bring up the discussion on separate utility statement 
again.


--
Ivan Kartyshov
Postgres Professional: www.postgrespro.com

Re: [HACKERS] make async slave to wait for lsn to be replayed

2023-11-26 Thread Alexander Korotkov

On Mon, Nov 20, 2023 at 1:10 PM Картышов Иван
 wrote:
> Alexander, thank you for your review and pointing this issues. According to
> them I made some fixes and rebase all patch.
>
> But I can`t repeat your ERROR. Not with hot_standby_feedback = on nor
> hot_standby_feedback = off.
>
> master: create table test as (select i from generate_series(1,1) i);
> slave conn1: select pg_wal_replay_pause();
> master: delete from test;
> master: vacuum test;
> master: select pg_current_wal_lsn();
> slave conn2: select pg_wait_lsn('the value from previous query'::pg_lsn, 0);
> slave conn1: select pg_wal_replay_resume();
> slave conn2: ERROR: canceling statement due to conflict with recovery
> DETAIL: User query might have needed to see row versions that must be removed.
>
> Also I use little hack to work out of snapshot similar to SnapshotResetXmin.
>
> Patch rebased and ready for review.

I've retried my case with v6 and it doesn't fail anymore.  But I
wonder how safe it is to reset xmin within the user-visible function?
We have no guarantee that the function is not called inside the
complex query.  Then how will the rest of the query work with xmin
reset?  Separate utility statement still looks like more safe option
for me.

--
Regards,
Alexander Korotkov

Re: [HACKERS] make async slave to wait for lsn to be replayed

2023-11-26 Thread Alexander Korotkov

On Thu, Nov 23, 2023 at 5:52 AM Bowen Shi  wrote:
> I used the latest code and found some conflicts while applying. Which PG 
> version did you rebase?

I've successfully applied the patch on bc3c8db8ae.  But I've used
"patch -p1 < wait_proc_v6.patch", git am doesn't work.

--
Regards,
Alexander Korotkov

Re: [HACKERS] make async slave to wait for lsn to be replayed

2023-11-22 Thread Bowen Shi

Hi,

I used the latest code and found some conflicts while applying. Which PG
version did you rebase?

Regards
Bowen Shi

Re: [HACKERS] make async slave to wait for lsn to be replayed

2023-11-20 Thread Картышов Иван


Alexander, thank you for your review and pointing this issues. According to
them I made some fixes and rebase all patch.

But I can`t repeat your ERROR. Not with hot_standby_feedback = on nor 
hot_standby_feedback = off.master: create table test as (select i from 
generate_series(1,1) i);
slave conn1: select pg_wal_replay_pause();
master: delete from test;
master: vacuum test;
master: select pg_current_wal_lsn();
slave conn2: select pg_wait_lsn('the value from previous query'::pg_lsn, 0);
slave conn1: select pg_wal_replay_resume();
slave conn2: ERROR: canceling statement due to conflict with recovery
DETAIL: User query might have needed to see row versions that must be 
removed.Also I use little hack to work out of snapshot similar to 
SnapshotResetXmin.

Patch rebased and ready for review.


 
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index c6156a..edbbe208cb 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -43,6 +43,7 @@
 #include "backup/basebackup.h"
 #include "catalog/pg_control.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/file_utils.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -1780,6 +1781,15 @@ PerformWalRecovery(void)
 break;
 			}
 
+			/*
+			 * If we replayed an LSN that someone was waiting for,
+			 * set latches in shared memory array to notify the waiter.
+			 */
+			if (XLogRecoveryCtl->lastReplayedEndRecPtr >= GetMinWaitedLSN())
+			{
+WaitSetLatch(XLogRecoveryCtl->lastReplayedEndRecPtr);
+			}
+
 			/* Else, try to fetch the next WAL record */
 			record = ReadRecord(xlogprefetcher, LOG, false, replayTLI);
 		} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index 48f7348f91..d8f6965d8c 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -61,6 +61,7 @@ OBJS = \
 	vacuum.o \
 	vacuumparallel.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/meson.build b/src/backend/commands/meson.build
index 42cced9ebe..ec6ab7722a 100644
--- a/src/backend/commands/meson.build
+++ b/src/backend/commands/meson.build
@@ -50,4 +50,5 @@ backend_sources += files(
   'vacuumparallel.c',
   'variable.c',
   'view.c',
+  'wait.c',
 )
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
new file mode 100644
index 00..7db208c726
--- /dev/null
+++ b/src/backend/commands/wait.c
@@ -0,0 +1,284 @@
+/*-
+ *
+ * wait.c
+ *	  Implements wait lsn, which allows waiting for events such as
+ *	  LSN having been replayed on replica.
+ *
+ * Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2023, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/wait.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include 
+
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xlogrecovery.h"
+#include "access/xlogdefs.h"
+#include "commands/wait.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/backendid.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/sinvaladt.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+#include "utils/timestamp.h"
+
+/* Add to shared memory array */
+static void AddWaitedLSN(XLogRecPtr lsn_to_wait);
+
+/* Shared memory structure */
+typedef struct
+{
+	int			backend_maxid;
+	pg_atomic_uint64 min_lsn; /* XLogRecPtr of minimal waited for LSN */
+	slock_t		mutex;
+	/* LSNs that different backends are waiting */
+	XLogRecPtr	lsn[FLEXIBLE_ARRAY_MEMBER];
+} WaitState;
+
+static WaitState *state;
+
+/*
+ * Add the wait event of the current backend to shared memory array
+ */
+static void
+AddWaitedLSN(XLogRecPtr lsn_to_wait)
+{
+	SpinLockAcquire(>mutex);
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->lsn[MyBackendId] = lsn_to_wait;
+
+	if (lsn_to_wait < state->min_lsn.value)
+		state->min_lsn.value = lsn_to_wait;
+	SpinLockRelease(>mutex);
+}
+
+/*
+ * Delete wait event of the current backend from the shared memory array.
+ */
+void
+DeleteWaitedLSN(void)
+{
+	int			i;
+	XLogRecPtr	lsn_to_delete;
+
+	SpinLockAcquire(>mutex);
+
+	lsn_to_delete = state->lsn[MyBackendId];
+	state->lsn[MyBackendId] = InvalidXLogRecPtr;
+
+	/* If we are deleting the minimal LSN, then choose the next min_lsn */
+	if (lsn_to_delete != InvalidXLogRecPtr &&
+		lsn_to_delete == state->min_lsn.value)
+	{
+		state->min_lsn.value = PG_UINT64_MAX;
+		for (i = 2; i <= state->backend_maxid; i++)
+			if (state->lsn[i] != InvalidXLogRecPtr &&
+state->lsn[i] < state->min_lsn.value)
+

Re: [HACKERS] make async slave to wait for lsn to be replayed

2023-10-14 Thread Alexander Korotkov

Hi!

On Wed, Oct 4, 2023 at 1:22 PM Alexander Korotkov  wrote:
> I see you're concentrating on the procedural version of this feature.  But 
> when you're calling a procedure within a normal SQL statement, the executor 
> gets a snapshot and holds it until the procedure finishes. In the case the 
> WAL record conflicts with this snapshot, the query will be canceled.  
> Alternatively, when hot_standby_feedback = on, the query and WAL replayer 
> will be in a deadlock (WAL replayer will wait for the query to finish, and 
> the query will wait for WAL replayed).  Do you see this issue?  Or do you 
> think I'm missing something?

I'm sorry, I actually meant hot_standby_feedback = off
(hot_standby_feedback = on actually avoids query conflicts).  I
managed to reproduce this problem.

master: create table test as (select i from generate_series(1,1) i);
slave conn1: select pg_wal_replay_pause();
master: delete from test;
master: vacuum test;
master: select pg_current_wal_lsn();
slave conn2: select pg_wait_lsn('the value from previous query'::pg_lsn, 0);
slave conn1: select pg_wal_replay_resume();
slave conn2: ERROR:  canceling statement due to conflict with recovery
DETAIL:  User query might have needed to see row versions that must be removed.

Needless to say, this is very undesirable behavior.  This happens
because pg_wait_lsn() has to run within a snapshot as any other
function.  This is why I think this functionality should be
implemented as a separate statement.

Another issue I found is that pg_wait_lsn() hangs on the primary.   I
think an error should be reported instead.

--
Regards,
Alexander Korotkov

Re: [HACKERS] make async slave to wait for lsn to be replayed

2023-10-04 Thread Alexander Korotkov

Hi, Ivan!

On Fri, Jun 30, 2023 at 11:32 AM Картышов Иван 
wrote:

> All rebased and tested
>

Thank you for continuing to work on this patch.

I see you're concentrating on the procedural version of this feature.  But
when you're calling a procedure within a normal SQL statement, the executor
gets a snapshot and holds it until the procedure finishes. In the case the
WAL record conflicts with this snapshot, the query will be canceled.
Alternatively, when hot_standby_feedback = on, the query and WAL replayer
will be in a deadlock (WAL replayer will wait for the query to finish, and
the query will wait for WAL replayed).  Do you see this issue?  Or do you
think I'm missing something?

XLogRecPtr
GetMinWaitedLSN(void)
{
return state->min_lsn.value;
}

You definitely shouldn't access directly the fields
inside pg_atomic_uint64.  In this particular case, you should
use pg_atomic_read_u64().

Also, I think there is a race condition.

/* Check if we already reached the needed LSN */
if (cur_lsn >= target_lsn)
return true;

AddWaitedLSN(target_lsn);

Imagine, PerformWalRecovery() will replay a record after the check, but
before AddWaitedLSN().  This code will start the waiting cycle even if the
LSN is already achieved.  Surely this cycle will end soon because it
rechecks LSN value each 100 ms.  But anyway, I think there should be
another check after AddWaitedLSN() for the sake of consistency.

--
Regards,
Alexander Korotkov

Re: [HACKERS] make async slave to wait for lsn to be replayed

2023-06-30 Thread Картышов Иван

All rebased and tested

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres comp...@postgrespro.ru>
 
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index becc2bda62..c7460bd9b8 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -43,6 +43,7 @@
 #include "backup/basebackup.h"
 #include "catalog/pg_control.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/file_utils.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -1752,6 +1753,15 @@ PerformWalRecovery(void)
 break;
 			}
 
+			/*
+			 * If we replayed an LSN that someone was waiting for,
+			 * set latches in shared memory array to notify the waiter.
+			 */
+			if (XLogRecoveryCtl->lastReplayedEndRecPtr >= GetMinWaitedLSN())
+			{
+WaitSetLatch(XLogRecoveryCtl->lastReplayedEndRecPtr);
+			}
+
 			/* Else, try to fetch the next WAL record */
 			record = ReadRecord(xlogprefetcher, LOG, false, replayTLI);
 		} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index 48f7348f91..d8f6965d8c 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -61,6 +61,7 @@ OBJS = \
 	vacuum.o \
 	vacuumparallel.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/meson.build b/src/backend/commands/meson.build
index 42cced9ebe..ec6ab7722a 100644
--- a/src/backend/commands/meson.build
+++ b/src/backend/commands/meson.build
@@ -50,4 +50,5 @@ backend_sources += files(
   'vacuumparallel.c',
   'variable.c',
   'view.c',
+  'wait.c',
 )
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
new file mode 100644
index 00..2f9be4f9ad
--- /dev/null
+++ b/src/backend/commands/wait.c
@@ -0,0 +1,275 @@
+/*-
+ *
+ * wait.c
+ *	  Implements wait lsn, which allows waiting for events such as
+ *	  LSN having been replayed on replica.
+ *
+ * Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2023, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/wait.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include 
+
+#include "access/xact.h"
+#include "access/xlogrecovery.h"
+#include "access/xlogdefs.h"
+#include "commands/wait.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/backendid.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/sinvaladt.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+#include "utils/timestamp.h"
+
+/* Add to shared memory array */
+static void AddWaitedLSN(XLogRecPtr lsn_to_wait);
+
+/* Shared memory structure */
+typedef struct
+{
+	int			backend_maxid;
+	pg_atomic_uint64 min_lsn; /* XLogRecPtr of minimal waited for LSN */
+	slock_t		mutex;
+	/* LSNs that different backends are waiting */
+	XLogRecPtr	lsn[FLEXIBLE_ARRAY_MEMBER];
+} WaitState;
+
+static WaitState *state;
+
+/*
+ * Add the wait event of the current backend to shared memory array
+ */
+static void
+AddWaitedLSN(XLogRecPtr lsn_to_wait)
+{
+	SpinLockAcquire(>mutex);
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->lsn[MyBackendId] = lsn_to_wait;
+
+	if (lsn_to_wait < state->min_lsn.value)
+		state->min_lsn.value = lsn_to_wait;
+	SpinLockRelease(>mutex);
+}
+
+/*
+ * Delete wait event of the current backend from the shared memory array.
+ */
+void
+DeleteWaitedLSN(void)
+{
+	int			i;
+	XLogRecPtr	lsn_to_delete;
+
+	SpinLockAcquire(>mutex);
+
+	lsn_to_delete = state->lsn[MyBackendId];
+	state->lsn[MyBackendId] = InvalidXLogRecPtr;
+
+	/* If we are deleting the minimal LSN, then choose the next min_lsn */
+	if (lsn_to_delete != InvalidXLogRecPtr &&
+		lsn_to_delete == state->min_lsn.value)
+	{
+		state->min_lsn.value = PG_UINT64_MAX;
+		for (i = 2; i <= state->backend_maxid; i++)
+			if (state->lsn[i] != InvalidXLogRecPtr &&
+state->lsn[i] < state->min_lsn.value)
+state->min_lsn.value = state->lsn[i];
+	}
+
+	/* If deleting from the end of the array, shorten the array's used part */
+	if (state->backend_maxid == MyBackendId)
+		for (i = (MyBackendId); i >= 2; i--)
+			if (state->lsn[i] != InvalidXLogRecPtr)
+			{
+state->backend_maxid = i;
+break;
+			}
+
+	SpinLockRelease(>mutex);
+}
+
+/*
+ * Report amount of shared memory space needed for WaitState
+ */
+Size
+WaitShmemSize(void)
+{
+	Size		size;
+
+	size = offsetof(WaitState, lsn);
+	size = add_size(size, mul_size(MaxBackends + 1, sizeof(XLogRecPtr)));
+	return size;
+}
+
+/*
+ * Initialize an array of events to wait for in shared memory
+ */
+void
+WaitShmemInit(void)
+{
+	bool		found;
+	uint32		i;
+
+

Re: [HACKERS] make async slave to wait for lsn to be replayed

2023-03-06 Thread Kartyshov Ivan


Fix build.meson troubles

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Companydiff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index dbe9394762..422bb1ed82 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -43,6 +43,7 @@
 #include "backup/basebackup.h"
 #include "catalog/pg_control.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/file_utils.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -1752,6 +1753,15 @@ PerformWalRecovery(void)
 break;
 			}
 
+			/*
+			 * If we replayed an LSN that someone was waiting for,
+			 * set latches in shared memory array to notify the waiter.
+			 */
+			if (XLogRecoveryCtl->lastReplayedEndRecPtr >= GetMinWaitedLSN())
+			{
+WaitSetLatch(XLogRecoveryCtl->lastReplayedEndRecPtr);
+			}
+
 			/* Else, try to fetch the next WAL record */
 			record = ReadRecord(xlogprefetcher, LOG, false, replayTLI);
 		} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index 48f7348f91..d8f6965d8c 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -61,6 +61,7 @@ OBJS = \
 	vacuum.o \
 	vacuumparallel.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/meson.build b/src/backend/commands/meson.build
index 42cced9ebe..ec6ab7722a 100644
--- a/src/backend/commands/meson.build
+++ b/src/backend/commands/meson.build
@@ -50,4 +50,5 @@ backend_sources += files(
   'vacuumparallel.c',
   'variable.c',
   'view.c',
+  'wait.c',
 )
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
new file mode 100644
index 00..3465673f0a
--- /dev/null
+++ b/src/backend/commands/wait.c
@@ -0,0 +1,275 @@
+/*-
+ *
+ * wait.c
+ *	  Implements wait lsn, which allows waiting for events such as
+ *	  LSN having been replayed on replica.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/wait.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include 
+
+#include "access/xact.h"
+#include "access/xlogrecovery.h"
+#include "access/xlogdefs.h"
+#include "commands/wait.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/backendid.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/sinvaladt.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+#include "utils/timestamp.h"
+
+/* Add to shared memory array */
+static void AddWaitedLSN(XLogRecPtr lsn_to_wait);
+
+/* Shared memory structure */
+typedef struct
+{
+	int			backend_maxid;
+	pg_atomic_uint64 min_lsn; /* XLogRecPtr of minimal waited for LSN */
+	slock_t		mutex;
+	/* LSNs that different backends are waiting */
+	XLogRecPtr	lsn[FLEXIBLE_ARRAY_MEMBER];
+} WaitState;
+
+static WaitState *state;
+
+/*
+ * Add the wait event of the current backend to shared memory array
+ */
+static void
+AddWaitedLSN(XLogRecPtr lsn_to_wait)
+{
+	SpinLockAcquire(>mutex);
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->lsn[MyBackendId] = lsn_to_wait;
+
+	if (lsn_to_wait < state->min_lsn.value)
+		state->min_lsn.value = lsn_to_wait;
+	SpinLockRelease(>mutex);
+}
+
+/*
+ * Delete wait event of the current backend from the shared memory array.
+ */
+void
+DeleteWaitedLSN(void)
+{
+	int			i;
+	XLogRecPtr	lsn_to_delete;
+
+	SpinLockAcquire(>mutex);
+
+	lsn_to_delete = state->lsn[MyBackendId];
+	state->lsn[MyBackendId] = InvalidXLogRecPtr;
+
+	/* If we are deleting the minimal LSN, then choose the next min_lsn */
+	if (lsn_to_delete != InvalidXLogRecPtr &&
+		lsn_to_delete == state->min_lsn.value)
+	{
+		state->min_lsn.value = PG_UINT64_MAX;
+		for (i = 2; i <= state->backend_maxid; i++)
+			if (state->lsn[i] != InvalidXLogRecPtr &&
+state->lsn[i] < state->min_lsn.value)
+state->min_lsn.value = state->lsn[i];
+	}
+
+	/* If deleting from the end of the array, shorten the array's used part */
+	if (state->backend_maxid == MyBackendId)
+		for (i = (MyBackendId); i >= 2; i--)
+			if (state->lsn[i] != InvalidXLogRecPtr)
+			{
+state->backend_maxid = i;
+break;
+			}
+
+	SpinLockRelease(>mutex);
+}
+
+/*
+ * Report amount of shared memory space needed for WaitState
+ */
+Size
+WaitShmemSize(void)
+{
+	Size		size;
+
+	size = offsetof(WaitState, lsn);
+	size = add_size(size, mul_size(MaxBackends + 1, sizeof(XLogRecPtr)));
+	return size;
+}
+
+/*
+ * Initialize an array of events to wait for in shared memory
+ */
+void
+WaitShmemInit(void)
+{
+	bool		found;
+	uint32		i;
+
+	state =

Re: [HACKERS] make async slave to wait for lsn to be replayed

2023-03-06 Thread Kartyshov Ivan


Update patch to fix conflict with master
--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Companydiff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index dbe9394762..422bb1ed82 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -43,6 +43,7 @@
 #include "backup/basebackup.h"
 #include "catalog/pg_control.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/file_utils.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -1752,6 +1753,15 @@ PerformWalRecovery(void)
 break;
 			}
 
+			/*
+			 * If we replayed an LSN that someone was waiting for,
+			 * set latches in shared memory array to notify the waiter.
+			 */
+			if (XLogRecoveryCtl->lastReplayedEndRecPtr >= GetMinWaitedLSN())
+			{
+WaitSetLatch(XLogRecoveryCtl->lastReplayedEndRecPtr);
+			}
+
 			/* Else, try to fetch the next WAL record */
 			record = ReadRecord(xlogprefetcher, LOG, false, replayTLI);
 		} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index 48f7348f91..d8f6965d8c 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -61,6 +61,7 @@ OBJS = \
 	vacuum.o \
 	vacuumparallel.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
new file mode 100644
index 00..3465673f0a
--- /dev/null
+++ b/src/backend/commands/wait.c
@@ -0,0 +1,275 @@
+/*-
+ *
+ * wait.c
+ *	  Implements wait lsn, which allows waiting for events such as
+ *	  LSN having been replayed on replica.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/wait.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include 
+
+#include "access/xact.h"
+#include "access/xlogrecovery.h"
+#include "access/xlogdefs.h"
+#include "commands/wait.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/backendid.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/sinvaladt.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+#include "utils/timestamp.h"
+
+/* Add to shared memory array */
+static void AddWaitedLSN(XLogRecPtr lsn_to_wait);
+
+/* Shared memory structure */
+typedef struct
+{
+	int			backend_maxid;
+	pg_atomic_uint64 min_lsn; /* XLogRecPtr of minimal waited for LSN */
+	slock_t		mutex;
+	/* LSNs that different backends are waiting */
+	XLogRecPtr	lsn[FLEXIBLE_ARRAY_MEMBER];
+} WaitState;
+
+static WaitState *state;
+
+/*
+ * Add the wait event of the current backend to shared memory array
+ */
+static void
+AddWaitedLSN(XLogRecPtr lsn_to_wait)
+{
+	SpinLockAcquire(>mutex);
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->lsn[MyBackendId] = lsn_to_wait;
+
+	if (lsn_to_wait < state->min_lsn.value)
+		state->min_lsn.value = lsn_to_wait;
+	SpinLockRelease(>mutex);
+}
+
+/*
+ * Delete wait event of the current backend from the shared memory array.
+ */
+void
+DeleteWaitedLSN(void)
+{
+	int			i;
+	XLogRecPtr	lsn_to_delete;
+
+	SpinLockAcquire(>mutex);
+
+	lsn_to_delete = state->lsn[MyBackendId];
+	state->lsn[MyBackendId] = InvalidXLogRecPtr;
+
+	/* If we are deleting the minimal LSN, then choose the next min_lsn */
+	if (lsn_to_delete != InvalidXLogRecPtr &&
+		lsn_to_delete == state->min_lsn.value)
+	{
+		state->min_lsn.value = PG_UINT64_MAX;
+		for (i = 2; i <= state->backend_maxid; i++)
+			if (state->lsn[i] != InvalidXLogRecPtr &&
+state->lsn[i] < state->min_lsn.value)
+state->min_lsn.value = state->lsn[i];
+	}
+
+	/* If deleting from the end of the array, shorten the array's used part */
+	if (state->backend_maxid == MyBackendId)
+		for (i = (MyBackendId); i >= 2; i--)
+			if (state->lsn[i] != InvalidXLogRecPtr)
+			{
+state->backend_maxid = i;
+break;
+			}
+
+	SpinLockRelease(>mutex);
+}
+
+/*
+ * Report amount of shared memory space needed for WaitState
+ */
+Size
+WaitShmemSize(void)
+{
+	Size		size;
+
+	size = offsetof(WaitState, lsn);
+	size = add_size(size, mul_size(MaxBackends + 1, sizeof(XLogRecPtr)));
+	return size;
+}
+
+/*
+ * Initialize an array of events to wait for in shared memory
+ */
+void
+WaitShmemInit(void)
+{
+	bool		found;
+	uint32		i;
+
+	state = (WaitState *) ShmemInitStruct("pg_wait_lsn",
+		  WaitShmemSize(),
+		  );
+	if (!found)
+	{
+		SpinLockInit(>mutex);
+
+		for (i = 0; i < (MaxBackends + 1); i++)
+			state->lsn[i] = InvalidXLogRecPtr;
+
+		state->backend_maxid = 0;
+		state->min_lsn.value = PG_UINT64_MAX;
+	}
+}
+

Re: [HACKERS] make async slave to wait for lsn to be replayed

2023-03-04 Thread Kartyshov Ivan


Here I made new patch of feature, discussed above.

WAIT FOR procedure - waits for certain lsn on pause
==
Synopsis
==
SELECT pg_wait_lsn(‘LSN’, timeout) returns boolean

Where timeout = 0, will wait infinite without timeout
And if timeout = 1, then just check if lsn was replayed

How to use it
==

Greg Stark  wrote:

That said, I'm not a fan of the specific function names. Remember that
we have polymorphic functions so you could probably just have an
option argument:


If you have any example, I will be glade to see them. Ьy searches have
not been fruitful.

Michael Paquier wrote:

While looking at all the patches proposed, I have noticed that all the
approaches proposed force a wakeup of the waiters in the redo loop of
the startup process for each record, before reading the next record.
It strikes me that there is some interaction with custom resource
managers here, where it is possible to poke at the waiters not for
each record, but after reading some specific records.  Something
out-of-core would not be as responsive as the per-record approach,
still responsive enough that the waiters wait on input for an
acceptable amount of time, depending on the frequency of the records
generated by a primary to wake them up.  Just something that popped
into my mind while looking a bit at the threads.


I`ll work on this idea to have less impact on the redo system.

On 2023-03-02 13:33, Peter Eisentraut wrote:

But I wonder how a client is going to get the LSN.  How would all of
this be used by a client?

As I wrote earlier main purpose of the feature is to achieve
read-your-writes-consistency, while using async replica for reads and
primary for writes. In that case lsn of last modification is stored
inside application.


I'm tempted to think this could be a protocol-layer facility.  Every
query automatically returns the current LSN, and every query can also
send along an LSN to wait for, and the client library would just keep
track of the LSN for (what it thinks of as) the connection.  So you
get some automatic serialization without having to modify your client
code.

Yes it sounds very tempted. But I think community will be against it.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Companydiff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index dbe9394762..422bb1ed82 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -43,6 +43,7 @@
 #include "backup/basebackup.h"
 #include "catalog/pg_control.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/file_utils.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -1752,6 +1753,15 @@ PerformWalRecovery(void)
 break;
 			}
 
+			/*
+			 * If we replayed an LSN that someone was waiting for,
+			 * set latches in shared memory array to notify the waiter.
+			 */
+			if (XLogRecoveryCtl->lastReplayedEndRecPtr >= GetMinWaitedLSN())
+			{
+WaitSetLatch(XLogRecoveryCtl->lastReplayedEndRecPtr);
+			}
+
 			/* Else, try to fetch the next WAL record */
 			record = ReadRecord(xlogprefetcher, LOG, false, replayTLI);
 		} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index 48f7348f91..d8f6965d8c 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -61,6 +61,7 @@ OBJS = \
 	vacuum.o \
 	vacuumparallel.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
new file mode 100644
index 00..3465673f0a
--- /dev/null
+++ b/src/backend/commands/wait.c
@@ -0,0 +1,275 @@
+/*-
+ *
+ * wait.c
+ *	  Implements wait lsn, which allows waiting for events such as
+ *	  LSN having been replayed on replica.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/wait.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include 
+
+#include "access/xact.h"
+#include "access/xlogrecovery.h"
+#include "access/xlogdefs.h"
+#include "commands/wait.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/backendid.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/sinvaladt.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+#include "utils/timestamp.h"
+
+/* Add to shared memory array */
+static void AddWaitedLSN(XLogRecPtr lsn_to_wait);
+
+/* Shared memory structure */
+typedef struct
+{
+	int			backend_maxid;
+	pg_atomic_uint64 min_lsn; /* XLogRecPtr of minimal waited for LSN */
+	slock_t		mutex;
+	/* LSNs that

Re: [HACKERS] make async slave to wait for lsn to be replayed

2023-03-02 Thread Peter Eisentraut


On 28.02.23 11:10, Kartyshov Ivan wrote:

3) Procedure style: Tom Lane and Kyotaro (wait_proc_v1.patch)
https://www.postgresql.org/message-id/27171.1586439221%40sss.pgh.pa.us
https://www.postgresql.org/message-id/20210121.173009.235021120161403875.horikyota.ntt%40gmail.com
==
advantages: no new words in grammar,like it made in 
pg_last_wal_replay_lsn, no snapshots need

disadvantages: a little harder to remember names
SELECT pg_waitlsn(‘LSN’, timeout);
SELECT pg_waitlsn_infinite(‘LSN’);
SELECT pg_waitlsn_no_wait(‘LSN’);


Of the presented options, I prefer this one.  (Maybe with a "_" between 
"wait" and "lsn".)


But I wonder how a client is going to get the LSN.  How would all of 
this be used by a client?  I can think of a scenarios where you have an 
application that issues a bunch of SQL commands and you have some kind 
of pooler in the middle that redirects those commands to different 
hosts, and what you really want is to have it transparently behave as if 
it's just a single host.  Do we want to inject a bunch of "SELECT 
pg_get_lsn()", "SELECT pg_wait_lsn()" calls into that?


I'm tempted to think this could be a protocol-layer facility.  Every 
query automatically returns the current LSN, and every query can also 
send along an LSN to wait for, and the client library would just keep 
track of the LSN for (what it thinks of as) the connection.  So you get 
some automatic serialization without having to modify your client code.


That said, exposing this functionality using functions could be a valid 
step in that direction, so that you can at least build out the actual 
internals of the functionality and test it out.

Re: [HACKERS] make async slave to wait for lsn to be replayed

2023-03-01 Thread Michael Paquier

On Wed, Mar 01, 2023 at 03:31:06PM -0500, Greg Stark wrote:
> Fwiw I prefer the functions approach. I do like me some nice syntax
> but I don't see any particular advantage of the special syntax in this
> case. They don't seem to provide any additional expressiveness.

So do I, eventhough I saw a point that sticking to a function or a
procedure approach makes the wait stick with more MVCC rules, like the
fact that the wait may be holding a snapshot for longer than
necessary.  The grammar can be more extensible without more keywords
with DefElems, still I'd like to think that we should not introduce
more restrictions in the parser if we have ways to work around it.
Using a procedure or function approach is more extensible in its own
ways, and it also depends on the data waiting for (there could be more
than one field as well for a single wait pattern?).

> I'll set the patch to "Waiting on Author" for now. If you feel you're
> still looking for more opinions from others maybe set it back to Needs
> Review but honestly there are a lot of patches so you probably won't
> see much this commitfest unless you have a patch that shows in
> cfbot.cputube.org as applying and which looks ready to commit.

While looking at all the patches proposed, I have noticed that all the
approaches proposed force a wakeup of the waiters in the redo loop of
the startup process for each record, before reading the next record.
It strikes me that there is some interaction with custom resource
managers here, where it is possible to poke at the waiters not for
each record, but after reading some specific records.  Something
out-of-core would not be as responsive as the per-record approach,
still responsive enough that the waiters wait on input for an
acceptable amount of time, depending on the frequency of the records
generated by a primary to wake them up.  Just something that popped
into my mind while looking a bit at the threads.
--
Michael


signature.asc
Description: PGP signature

Re: [HACKERS] make async slave to wait for lsn to be replayed

2023-03-01 Thread Greg Stark

On Tue, 28 Feb 2023 at 05:13, Kartyshov Ivan  wrote:
>
> Below I provide the implementation of patches for the first three types.
> I propose to discuss this feature again/

Oof, that doesn't really work with the cfbot. It tries to apply all
three patches and of course the second and third fail to apply.

In any case this seems like a lot of effort to me. I would suggest you
just pick one avenue and provide that patch for discussion and just
ask whether people would prefer any of the alternative syntaxes.

Fwiw I prefer the functions approach. I do like me some nice syntax
but I don't see any particular advantage of the special syntax in this
case. They don't seem to provide any additional expressiveness.

That said, I'm not a fan of the specific function names. Remember that
we have polymorphic functions so you could probably just have an
option argument:

pg_lsn_wait('LSN', [timeout]) returns boolean

(just call it with a timeout of 0 to do a no-wait)

I'll set the patch to "Waiting on Author" for now. If you feel you're
still looking for more opinions from others maybe set it back to Needs
Review but honestly there are a lot of patches so you probably won't
see much this commitfest unless you have a patch that shows in
cfbot.cputube.org as applying and which looks ready to commit.

-- 
greg

Re: [HACKERS] make async slave to wait for lsn to be replayed

2023-02-28 Thread Kartyshov Ivan


Intro==
The main purpose of the feature is to achieve
read-your-writes-consistency, while using async replica for reads and
primary for writes. In that case lsn of last modification is stored 
inside

application. We cannot store this lsn inside database, since reads are
distributed across all replicas and primary.

https://www.postgresql.org/message-id/195e2d07ead315b1620f1a053313f490%40postgrespro.ru

Suggestions
==
Lots of proposals were made how this feature may look like.
I aggregate them into the following four types.

1) Classic (wait_classic_v1.patch)
https://www.postgresql.org/message-id/3cc883048264c2e9af022033925ff8db%40postgrespro.ru
==
advantages: multiple events, standalone WAIT
disadvantages: new words in grammar

WAIT FOR  [ANY | ALL] event [, ...]
BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
    [ WAIT FOR [ANY | ALL] event [, ...]]
where event is one of:
LSN value
TIMEOUT number_of_milliseconds
timestamp

2) After style: Kyotaro and Freund (wait_after_within_v1.patch)
https://www.postgresql.org/message-id/d3ff2e363af60b345f82396992595a03%40postgrespro.ru
==
advantages: no new words in grammar, standalone AFTER
disadvantages: a little harder to understand

AFTER lsn_event [ WITHIN delay_milliseconds ] [, ...]
BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
    [ AFTER lsn_event [ WITHIN delay_milliseconds ]]
START [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
    [ AFTER lsn_event [ WITHIN delay_milliseconds ]]

3) Procedure style: Tom Lane and Kyotaro (wait_proc_v1.patch)
https://www.postgresql.org/message-id/27171.1586439221%40sss.pgh.pa.us
https://www.postgresql.org/message-id/20210121.173009.235021120161403875.horikyota.ntt%40gmail.com
==
advantages: no new words in grammar,like it made in 
pg_last_wal_replay_lsn, no snapshots need

disadvantages: a little harder to remember names
SELECT pg_waitlsn(‘LSN’, timeout);
SELECT pg_waitlsn_infinite(‘LSN’);
SELECT pg_waitlsn_no_wait(‘LSN’);

4) Brackets style: Kondratov
https://www.postgresql.org/message-id/a8bff0350a27e0a87a6eaf0905d6737f%40postgrespro.ru
 
==
advantages: only one new word in grammar,like it made in VACUUM and 
REINDEX, ability to extend parameters without grammar fixes

disadvantages: 
WAIT (LSN '16/B374D848', TIMEOUT 100);
BEGIN WAIT (LSN '16/B374D848' [, etc_options]);
...
COMMIT;

Consequence
==
Below I provide the implementation of patches for the first three types.
I propose to discuss this feature again/

Regards

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Companydiff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 54b5f22d6e..18695e013e 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -188,6 +188,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/ref/begin.sgml b/doc/src/sgml/ref/begin.sgml
index 016b021487..a2794763b1 100644
--- a/doc/src/sgml/ref/begin.sgml
+++ b/doc/src/sgml/ref/begin.sgml
@@ -21,13 +21,16 @@ PostgreSQL documentation
 
  
 
-BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
+BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ] wait_event
 
 where transaction_mode is one of:
 
 ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
 READ WRITE | READ ONLY
 [ NOT ] DEFERRABLE
+
+where wait_event is:
+AFTER lsn_value [ WITHIN number_of_milliseconds ]
 
  
 
diff --git a/doc/src/sgml/ref/start_transaction.sgml b/doc/src/sgml/ref/start_transaction.sgml
index 74ccd7e345..46a3bcf1a8 100644
--- a/doc/src/sgml/ref/start_transaction.sgml
+++ b/doc/src/sgml/ref/start_transaction.sgml
@@ -21,13 +21,16 @@ PostgreSQL documentation
 
  
 
-START TRANSACTION [ transaction_mode [, ...] ]
+START TRANSACTION [ transaction_mode [, ...] ] wait_event
 
 where transaction_mode is one of:
 
 ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
 READ WRITE | READ ONLY
 [ NOT ] DEFERRABLE
+
+where wait_event is:
+AFTER lsn_value [ WITHIN number_of_milliseconds ]
 
  
 
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index e11b4b6130..a83ff4551e 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -216,6 +216,7 @@



+   
 
  
 
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index dbe9394762..e4d2d8d0a1 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -43,6 +43,7 @@
 #include "backup/basebackup.h"
 #include "catalog/pg_control.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/file_utils.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -1752,6 +1753,15 @@ PerformWalRecovery(void)
 break;
 			}
 
+			/*
+			* If we replayed an LSN that someone was waiting for,
+			*

Re: [HACKERS] make async slave to wait for lsn to be replayed

2021-03-21 Thread Kyotaro Horiguchi

At Thu, 18 Mar 2021 18:57:15 +0500, Ibrar Ahmed  wrote 
in 
> On Thu, Jan 21, 2021 at 1:30 PM Kyotaro Horiguchi 
> wrote:
> 
> > Hello.
> >
> > At Wed, 18 Nov 2020 15:05:00 +0300, a.pervush...@postgrespro.ru wrote in
> > > I've changed the BEGIN WAIT FOR LSN statement to core functions
> > > pg_waitlsn, pg_waitlsn_infinite and pg_waitlsn_no_wait.
> > > Currently the functions work inside repeatable read transactions, but
> > > waitlsn creates a snapshot if called first in a transaction block,
> > > which can possibly lead the transaction to working incorrectly, so the
> > > function gives a warning.
> >
> > According to the discuttion here, implementing as functions is not
> > optimal. As a Poc, I made it as a procedure. However I'm not sure it
> > is the correct implement as a native procedure but it seems working as
> > expected.
> >
> > > Usage examples
> > > ==
> > > select pg_waitlsn(‘LSN’, timeout);
> > > select pg_waitlsn_infinite(‘LSN’);
> > > select pg_waitlsn_no_wait(‘LSN’);
> >
> > The first and second usage is coverd by a single procedure. The last
> > function is equivalent to pg_last_wal_replay_lsn(). As the result, the
> > following procedure is provided in the attached.
> >
> > pg_waitlsn(wait_lsn pg_lsn, timeout integer DEFAULT -1)
> >
> > Any opinions mainly compared to implementation as a command?
> >
> > regards.
> >
> > --
> > Kyotaro Horiguchi
> > NTT Open Source Software Center
> >
> 
> The patch (pg_waitlsn_v10_2_kh.patch) does not compile successfully and has
> compilation errors. Can you please take a look?
> 
> https://cirrus-ci.com/task/6241565996744704
> 
> xlog.c:45:10: fatal error: commands/wait.h: No such file or directory
> #include "commands/wait.h"
> ^
> compilation terminated.
> make[4]: *** [: xlog.o] Error 1
> make[4]: *** Waiting for unfinished jobs
> make[3]: *** [../../../src/backend/common.mk:39: transam-recursive] Error 2
> make[2]: *** [common.mk:39: access-recursive] Error 2
> make[1]: *** [Makefile:42: all-backend-recurse] Error 2
> make: *** [GNUmakefile:11: all-src-recurse] Error 2
> 
> I am changing the status to  "Waiting on Author"

Anna is the autor.  The "patch" was just to show how we can implement
the feature as a procedure. (Sorry for the bad mistake I made.)

The patch still applies to the master. So I resend just rebased
version as v10_2, and attached the "PoC" as *.txt which applies on top
of the patch.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6f8810e149..3c580083dd 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -42,6 +42,7 @@
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/controldata_utils.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -7535,6 +7536,15 @@ StartupXLOG(void)
 	break;
 }
 
+/*
+ * If we replayed an LSN that someone was waiting for,
+ * set latches in shared memory array to notify the waiter.
+ */
+if (XLogCtl->lastReplayedEndRecPtr >= GetMinWaitedLSN())
+{
+	WaitSetLatch(XLogCtl->lastReplayedEndRecPtr);
+}
+
 /* Else, try to fetch the next WAL record */
 record = ReadRecord(xlogreader, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index e8504f0ae4..2c0bd41336 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -60,6 +60,7 @@ OBJS = \
 	user.o \
 	vacuum.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
new file mode 100644
index 00..1f2483672b
--- /dev/null
+++ b/src/backend/commands/wait.c
@@ -0,0 +1,297 @@
+/*-
+ *
+ * wait.c
+ *	  Implements waitlsn, which allows waiting for events such as
+ *	  LSN having been replayed on replica.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/wait.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include 
+
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xlogdefs.h"
+#include "commands/wait.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/backendid.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/sinvaladt.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+#include "utils/timestamp.h"
+
+/* Add to shared memory array */
+static void AddWaitedLSN(XLogRecPtr lsn_to_wait);
+
+/* Shared memory structure */

Re: [HACKERS] make async slave to wait for lsn to be replayed

2021-03-18 Thread Ibrar Ahmed

On Thu, Jan 21, 2021 at 1:30 PM Kyotaro Horiguchi 
wrote:

> Hello.
>
> At Wed, 18 Nov 2020 15:05:00 +0300, a.pervush...@postgrespro.ru wrote in
> > I've changed the BEGIN WAIT FOR LSN statement to core functions
> > pg_waitlsn, pg_waitlsn_infinite and pg_waitlsn_no_wait.
> > Currently the functions work inside repeatable read transactions, but
> > waitlsn creates a snapshot if called first in a transaction block,
> > which can possibly lead the transaction to working incorrectly, so the
> > function gives a warning.
>
> According to the discuttion here, implementing as functions is not
> optimal. As a Poc, I made it as a procedure. However I'm not sure it
> is the correct implement as a native procedure but it seems working as
> expected.
>
> > Usage examples
> > ==
> > select pg_waitlsn(‘LSN’, timeout);
> > select pg_waitlsn_infinite(‘LSN’);
> > select pg_waitlsn_no_wait(‘LSN’);
>
> The first and second usage is coverd by a single procedure. The last
> function is equivalent to pg_last_wal_replay_lsn(). As the result, the
> following procedure is provided in the attached.
>
> pg_waitlsn(wait_lsn pg_lsn, timeout integer DEFAULT -1)
>
> Any opinions mainly compared to implementation as a command?
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>

The patch (pg_waitlsn_v10_2_kh.patch) does not compile successfully and has
compilation errors. Can you please take a look?

https://cirrus-ci.com/task/6241565996744704

xlog.c:45:10: fatal error: commands/wait.h: No such file or directory
#include "commands/wait.h"
^
compilation terminated.
make[4]: *** [: xlog.o] Error 1
make[4]: *** Waiting for unfinished jobs
make[3]: *** [../../../src/backend/common.mk:39: transam-recursive] Error 2
make[2]: *** [common.mk:39: access-recursive] Error 2
make[1]: *** [Makefile:42: all-backend-recurse] Error 2
make: *** [GNUmakefile:11: all-src-recurse] Error 2

I am changing the status to  "Waiting on Author"




-- 
Ibrar Ahmed

Re: [HACKERS] make async slave to wait for lsn to be replayed

2021-01-21 Thread Kyotaro Horiguchi

Hello.

At Wed, 18 Nov 2020 15:05:00 +0300, a.pervush...@postgrespro.ru wrote in 
> I've changed the BEGIN WAIT FOR LSN statement to core functions
> pg_waitlsn, pg_waitlsn_infinite and pg_waitlsn_no_wait.
> Currently the functions work inside repeatable read transactions, but
> waitlsn creates a snapshot if called first in a transaction block,
> which can possibly lead the transaction to working incorrectly, so the
> function gives a warning.

According to the discuttion here, implementing as functions is not
optimal. As a Poc, I made it as a procedure. However I'm not sure it
is the correct implement as a native procedure but it seems working as
expected.

> Usage examples
> ==
> select pg_waitlsn(‘LSN’, timeout);
> select pg_waitlsn_infinite(‘LSN’);
> select pg_waitlsn_no_wait(‘LSN’);

The first and second usage is coverd by a single procedure. The last
function is equivalent to pg_last_wal_replay_lsn(). As the result, the
following procedure is provided in the attached.

pg_waitlsn(wait_lsn pg_lsn, timeout integer DEFAULT -1)

Any opinions mainly compared to implementation as a command?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 470e113b33..4283b98eb4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -42,6 +42,7 @@
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/controldata_utils.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -7463,6 +7464,15 @@ StartupXLOG(void)
 	break;
 }
 
+/*
+ * If we replayed an LSN that someone was waiting for,
+ * set latches in shared memory array to notify the waiter.
+ */
+if (XLogCtl->lastReplayedEndRecPtr >= GetMinWaitedLSN())
+{
+	WaitSetLatch(XLogCtl->lastReplayedEndRecPtr);
+}
+
 /* Else, try to fetch the next WAL record */
 record = ReadRecord(xlogreader, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fa58afd9d7..c19d49e7a4 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1460,6 +1460,10 @@ LANGUAGE internal
 STRICT IMMUTABLE PARALLEL SAFE
 AS 'unicode_is_normalized';
 
+CREATE OR REPLACE PROCEDURE
+  pg_waitlsn(wait_lsn pg_lsn, timeout integer DEFAULT -1)
+  LANGUAGE internal AS 'pg_waitlsn';
+
 --
 -- The default permissions for functions mean that anyone can execute them.
 -- A number of functions shouldn't be executable by just anyone, but rather
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index e8504f0ae4..2c0bd41336 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -60,6 +60,7 @@ OBJS = \
 	user.o \
 	vacuum.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index f9bbe97b50..959e96b7e0 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -23,6 +23,7 @@
 #include "access/syncscan.h"
 #include "access/twophase.h"
 #include "commands/async.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -149,6 +150,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, WaitShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -268,6 +270,11 @@ CreateSharedMemoryAndSemaphores(void)
 	SyncScanShmemInit();
 	AsyncShmemInit();
 
+	/*
+	 * Init array of events for the wait clause in shared memory
+	 */
+	WaitShmemInit();
+
 #ifdef EXEC_BACKEND
 
 	/*
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index c87ffc6549..2b4d73ba2f 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -38,6 +38,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "commands/wait.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
@@ -713,6 +714,9 @@ LockErrorCleanup(void)
 
 	AbortStrongLockAcquire();
 
+	/* If waitlsn was interrupted, then stop waiting for that LSN */
+	DeleteWaitedLSN();
+
 	/* Nothing to do if we weren't waiting for a lock */
 	if (lockAwaited == NULL)
 	{
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index 4096faff9a..90876da120 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -373,8 +373,6 @@ pg_sleep(PG_FUNCTION_ARGS)
 	 * less than the specified time when WaitLatch is terminated early by a
 	 * non-query-canceling signal such as SIGHUP.
 	 */
-#define GetNowFloat()

Re: [HACKERS] make async slave to wait for lsn to be replayed

2020-11-18 Thread a . pervushina


Hello,

I've changed the BEGIN WAIT FOR LSN statement to core functions 
pg_waitlsn, pg_waitlsn_infinite and pg_waitlsn_no_wait.
Currently the functions work inside repeatable read transactions, but 
waitlsn creates a snapshot if called first in a transaction block, which 
can possibly lead the transaction to working incorrectly, so the 
function gives a warning.


Usage examples
==
select pg_waitlsn(‘LSN’, timeout);
select pg_waitlsn_infinite(‘LSN’);
select pg_waitlsn_no_wait(‘LSN’);

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9d3f1c12fc5..27e964c8111 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -42,6 +42,7 @@
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/controldata_utils.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -7387,6 +7388,15 @@ StartupXLOG(void)
 	break;
 }
 
+/*
+ * If we replayed an LSN that someone was waiting for,
+ * set latches in shared memory array to notify the waiter.
+ */
+if (XLogCtl->lastReplayedEndRecPtr >= GetMinWaitedLSN())
+{
+	WaitSetLatch(XLogCtl->lastReplayedEndRecPtr);
+}
+
 /* Else, try to fetch the next WAL record */
 record = ReadRecord(xlogreader, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index d4815d3ce65..9b310926c12 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -57,6 +57,7 @@ OBJS = \
 	user.o \
 	vacuum.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
new file mode 100644
index 000..ac18040c362
--- /dev/null
+++ b/src/backend/commands/wait.c
@@ -0,0 +1,297 @@
+/*-
+ *
+ * wait.c
+ *	  Implements waitlsn, which allows waiting for events such as
+ *	  LSN having been replayed on replica.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/wait.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include 
+
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xlogdefs.h"
+#include "commands/wait.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/backendid.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/sinvaladt.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+#include "utils/timestamp.h"
+
+/* Add to shared memory array */
+static void AddWaitedLSN(XLogRecPtr lsn_to_wait);
+
+/* Shared memory structure */
+typedef struct
+{
+	int			backend_maxid;
+	pg_atomic_uint64 min_lsn; /* XLogRecPtr of minimal waited for LSN */
+	slock_t		mutex;
+	/* LSNs that different backends are waiting */
+	XLogRecPtr	lsn[FLEXIBLE_ARRAY_MEMBER];
+} WaitState;
+
+static WaitState *state;
+
+/*
+ * Add the wait event of the current backend to shared memory array
+ */
+static void
+AddWaitedLSN(XLogRecPtr lsn_to_wait)
+{
+	SpinLockAcquire(>mutex);
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->lsn[MyBackendId] = lsn_to_wait;
+
+	if (lsn_to_wait < state->min_lsn.value)
+		state->min_lsn.value = lsn_to_wait;
+	SpinLockRelease(>mutex);
+}
+
+/*
+ * Delete wait event of the current backend from the shared memory array.
+ */
+void
+DeleteWaitedLSN(void)
+{
+	int			i;
+	XLogRecPtr	lsn_to_delete;
+
+	SpinLockAcquire(>mutex);
+
+	lsn_to_delete = state->lsn[MyBackendId];
+	state->lsn[MyBackendId] = InvalidXLogRecPtr;
+
+	/* If we are deleting the minimal LSN, then choose the next min_lsn */
+	if (lsn_to_delete != InvalidXLogRecPtr &&
+		lsn_to_delete == state->min_lsn.value)
+	{
+		state->min_lsn.value = PG_UINT64_MAX;
+		for (i = 2; i <= state->backend_maxid; i++)
+			if (state->lsn[i] != InvalidXLogRecPtr &&
+state->lsn[i] < state->min_lsn.value)
+state->min_lsn.value = state->lsn[i];
+	}
+
+	/* If deleting from the end of the array, shorten the array's used part */
+	if (state->backend_maxid == MyBackendId)
+		for (i = (MyBackendId); i >= 2; i--)
+			if (state->lsn[i] != InvalidXLogRecPtr)
+			{
+state->backend_maxid = i;
+break;
+			}
+
+	SpinLockRelease(>mutex);
+}
+
+/*
+ * Report amount of shared memory space needed for WaitState
+ */
+Size
+WaitShmemSize(void)
+{
+	Size		size;
+
+	size = offsetof(WaitState, lsn);
+	size = add_size(size, mul_size(MaxBackends + 1, sizeof(XLogRecPtr)));
+	return size;
+}
+
+/*
+ * Initialize an array of events to wait for in shared memory
+ */
+void
+WaitShmemInit(void)
+{
+	bool		found;
+	uint32		i;

Re: [HACKERS] make async slave to wait for lsn to be replayed

2020-11-16 Thread Alexander Korotkov

Hi!

On Mon, Nov 16, 2020 at 1:09 PM  wrote:
> I've changed the BEGIN WAIT FOR LSN statement to core functions
> pg_waitlsn, pg_waitlsn_infinite and pg_waitlsn_no_wait.
> Currently the functions work inside repeatable read transactions, but
> waitlsn creates a snapshot if called first in a transaction block, which
> can possibly lead the transaction to working incorrectly, so the
> function gives a warning.
>
> Usage examples
> ==
> select pg_waitlsn(‘LSN’, timeout);
> select pg_waitlsn_infinite(‘LSN’);
> select pg_waitlsn_no_wait(‘LSN’);

The name pg_waitlsn_no_wait() looks confusing.  I've tried to see how
it's documented, but the patch doesn't contain documentation...

--
Regards,
Alexander Korotkov

Re: [HACKERS] make async slave to wait for lsn to be replayed

2020-11-16 Thread a . pervushina


Hello,

I've changed the BEGIN WAIT FOR LSN statement to core functions 
pg_waitlsn, pg_waitlsn_infinite and pg_waitlsn_no_wait.
Currently the functions work inside repeatable read transactions, but 
waitlsn creates a snapshot if called first in a transaction block, which 
can possibly lead the transaction to working incorrectly, so the 
function gives a warning.


Usage examples
==
select pg_waitlsn(‘LSN’, timeout);
select pg_waitlsn_infinite(‘LSN’);
select pg_waitlsn_no_wait(‘LSN’);diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9d3f1c12fc5..27e964c8111 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -42,6 +42,7 @@
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/controldata_utils.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -7387,6 +7388,15 @@ StartupXLOG(void)
 	break;
 }
 
+/*
+ * If we replayed an LSN that someone was waiting for,
+ * set latches in shared memory array to notify the waiter.
+ */
+if (XLogCtl->lastReplayedEndRecPtr >= GetMinWaitedLSN())
+{
+	WaitSetLatch(XLogCtl->lastReplayedEndRecPtr);
+}
+
 /* Else, try to fetch the next WAL record */
 record = ReadRecord(xlogreader, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index d4815d3ce65..9b310926c12 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -57,6 +57,7 @@ OBJS = \
 	user.o \
 	vacuum.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
new file mode 100644
index 000..ac18040c362
--- /dev/null
+++ b/src/backend/commands/wait.c
@@ -0,0 +1,297 @@
+/*-
+ *
+ * wait.c
+ *	  Implements waitlsn, which allows waiting for events such as
+ *	  LSN having been replayed on replica.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/wait.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include 
+
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xlogdefs.h"
+#include "commands/wait.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/backendid.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/sinvaladt.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+#include "utils/timestamp.h"
+
+/* Add to shared memory array */
+static void AddWaitedLSN(XLogRecPtr lsn_to_wait);
+
+/* Shared memory structure */
+typedef struct
+{
+	int			backend_maxid;
+	pg_atomic_uint64 min_lsn; /* XLogRecPtr of minimal waited for LSN */
+	slock_t		mutex;
+	/* LSNs that different backends are waiting */
+	XLogRecPtr	lsn[FLEXIBLE_ARRAY_MEMBER];
+} WaitState;
+
+static WaitState *state;
+
+/*
+ * Add the wait event of the current backend to shared memory array
+ */
+static void
+AddWaitedLSN(XLogRecPtr lsn_to_wait)
+{
+	SpinLockAcquire(>mutex);
+	if (state->backend_maxid < MyBackendId)
+		state->backend_maxid = MyBackendId;
+
+	state->lsn[MyBackendId] = lsn_to_wait;
+
+	if (lsn_to_wait < state->min_lsn.value)
+		state->min_lsn.value = lsn_to_wait;
+	SpinLockRelease(>mutex);
+}
+
+/*
+ * Delete wait event of the current backend from the shared memory array.
+ */
+void
+DeleteWaitedLSN(void)
+{
+	int			i;
+	XLogRecPtr	lsn_to_delete;
+
+	SpinLockAcquire(>mutex);
+
+	lsn_to_delete = state->lsn[MyBackendId];
+	state->lsn[MyBackendId] = InvalidXLogRecPtr;
+
+	/* If we are deleting the minimal LSN, then choose the next min_lsn */
+	if (lsn_to_delete != InvalidXLogRecPtr &&
+		lsn_to_delete == state->min_lsn.value)
+	{
+		state->min_lsn.value = PG_UINT64_MAX;
+		for (i = 2; i <= state->backend_maxid; i++)
+			if (state->lsn[i] != InvalidXLogRecPtr &&
+state->lsn[i] < state->min_lsn.value)
+state->min_lsn.value = state->lsn[i];
+	}
+
+	/* If deleting from the end of the array, shorten the array's used part */
+	if (state->backend_maxid == MyBackendId)
+		for (i = (MyBackendId); i >= 2; i--)
+			if (state->lsn[i] != InvalidXLogRecPtr)
+			{
+state->backend_maxid = i;
+break;
+			}
+
+	SpinLockRelease(>mutex);
+}
+
+/*
+ * Report amount of shared memory space needed for WaitState
+ */
+Size
+WaitShmemSize(void)
+{
+	Size		size;
+
+	size = offsetof(WaitState, lsn);
+	size = add_size(size, mul_size(MaxBackends + 1, sizeof(XLogRecPtr)));
+	return size;
+}
+
+/*
+ * Initialize an array of events to wait for in shared memory
+ */
+void
+WaitShmemInit(void)
+{
+	bool		found;
+	uint32		i;
+

Re: [HACKERS] make async slave to wait for lsn to be replayed

2020-10-02 Thread a . pervushina


Anna Akenteva писал 2020-04-08 22:36:

On 2020-04-08 04:09, Kyotaro Horiguchi wrote:

I like your suggested keywords! I think that "AFTER" + "WITHIN" sound
the most natural. We could completely give up the LSN keyword for now.
The final command could look something like:

BEGIN AFTER ‘0/303EC60’ WITHIN '5 seconds';
or
BEGIN AFTER ‘0/303EC60’ WITHIN 5000;



Hello,

I've changed the syntax of the command from BEGIN [ WAIT FOR LSN value [ 
TIMEOUT delay ]] to BEGIN [ AFTER value [ WITHIN delay ]] and removed 
all the unnecessary keywords.


Best regards,
Alexandra Pervushina.diff --git a/doc/src/sgml/ref/begin.sgml b/doc/src/sgml/ref/begin.sgml
index 53eac922645..c5bf1a99039 100644
--- a/doc/src/sgml/ref/begin.sgml
+++ b/doc/src/sgml/ref/begin.sgml
@@ -21,7 +21,7 @@ doc/src/sgml/ref/begin.sgml
 
  
 
-BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
+BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ] [AFTER lsn_value [WITHIN number_of_milliseconds ] ]
 
 where transaction_mode is one of:
 
@@ -63,6 +63,17 @@ BEGIN [ WORK | TRANSACTION ] [ transaction_mode
was executed.
   
+
+  
+   The BEGIN AFTER clause allows to wait for the target log
+   sequence number (LSN) to be replayed on standby before
+   starting the transaction in PostgreSQL databases
+   with primary-standby asynchronous replication. Wait time can be limited by
+   specifying a timeout, which is measured in milliseconds and must be a positive
+   integer. If LSN was not reached before timeout, transaction
+   doesn't begin. Waiting can be interrupted using Ctrl+C, or
+   by shutting down the postgres server.
+  
  
 
  
@@ -146,6 +157,10 @@ BEGIN;
different purpose in embedded SQL. You are advised to be careful
about the transaction semantics when porting database applications.
   
+
+  
+   There is no AFTER clause in the SQL standard.
+  
  
 
  
diff --git a/doc/src/sgml/ref/start_transaction.sgml b/doc/src/sgml/ref/start_transaction.sgml
index d5dfa413aeb..ea9b0b43fd0 100644
--- a/doc/src/sgml/ref/start_transaction.sgml
+++ b/doc/src/sgml/ref/start_transaction.sgml
@@ -21,7 +21,7 @@ doc/src/sgml/ref/start_transaction.sgml
 
  
 
-START TRANSACTION [ transaction_mode [, ...] ]
+START TRANSACTION [ transaction_mode [, ...] ] [ AFTER lsn_value [WITHIN number_of_milliseconds ] ]
 
 where transaction_mode is one of:
 
@@ -40,6 +40,17 @@ START TRANSACTION [ transaction_mode was executed. This is the same
as the  command.
   
+
+  
+   The WAIT FOR clause allows to wait for the target log
+   sequence number (LSN) to be replayed on standby before
+   starting the transaction in PostgreSQL databases
+   with primary-standby asynchronous replication. Wait time can be limited by
+   specifying a timeout, which is measured in milliseconds and must be a positive
+   integer. If LSN was not reached before timeout, transaction
+   doesn't begin. Waiting can be interrupted using Ctrl+C, or
+   by shutting down the postgres server.
+  
  
 
  
@@ -78,6 +89,10 @@ START TRANSACTION [ transaction_mode
 
+  
+   There is no WAIT FOR clause in the SQL standard.
+  
+
   
See also the compatibility section of .
   
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9d3f1c12fc5..27e964c8111 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -42,6 +42,7 @@
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/controldata_utils.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -7387,6 +7388,15 @@ StartupXLOG(void)
 	break;
 }
 
+/*
+ * If we replayed an LSN that someone was waiting for,
+ * set latches in shared memory array to notify the waiter.
+ */
+if (XLogCtl->lastReplayedEndRecPtr >= GetMinWaitedLSN())
+{
+	WaitSetLatch(XLogCtl->lastReplayedEndRecPtr);
+}
+
 /* Else, try to fetch the next WAL record */
 record = ReadRecord(xlogreader, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index d4815d3ce65..9b310926c12 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -57,6 +57,7 @@ OBJS = \
 	user.o \
 	vacuum.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
new file mode 100644
index 000..84195e2a7df
--- /dev/null
+++ b/src/backend/commands/wait.c
@@ -0,0 +1,282 @@
+/*-
+ *
+ * wait.c
+ *	  Implements BEGIN AFTER, which allows waiting for events such as
+ *	  LSN having been replayed on replica.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/wait.c
+ *
+

Re: [HACKERS] make async slave to wait for lsn to be replayed

2020-09-23 Thread Michael Paquier

On Tue, Aug 18, 2020 at 01:12:51PM +0300, Anna Akenteva wrote:
> I updated the most recent patch and removed the use of "master" from it,
> replacing it with "primary".

This is failing to apply lately, causing the CF bot to complain:
http://cfbot.cputube.org/patch_29_772.log
--
Michael


signature.asc
Description: PGP signature

1 2 >

1 - 100 of 160 matches

Mail list logo