Re: Avoid orphaned objects dependencies, take 3
Hi,

On Thu, May 23, 2024 at 02:10:54PM -0400, Robert Haas wrote:
> On Thu, May 23, 2024 at 12:19 AM Bertrand Drouvot wrote:
> > The reason why we are using a dirty snapshot here is for the cases where we are
> > recording a dependency on a referenced object that we are creating at the same
> > time behind the scenes (for example, creating a composite type while creating
> > a relation). Without the dirty snapshot, the object we are creating behind
> > the scenes (the composite type) would not be visible and we would wrongly
> > assume that it has been dropped.
>
> The usual reason for using a dirty snapshot is that you want to see
> uncommitted work by other transactions. It sounds like you're saying
> you just need to see uncommitted work by the same transaction.

Right.

> If that's true, I think using HeapTupleSatisfiesSelf would be clearer.

Oh, thanks! I did not know about the SNAPSHOT_SELF snapshot type (I should have
checked all the snapshot types first), and that's exactly what is needed here.

Please find attached v8, making use of SnapshotSelf instead of a dirty snapshot.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

>From 2611fb874a476c1a8d665f97d973193beb70292a Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot
Date: Fri, 29 Mar 2024 15:43:26 +0000
Subject: [PATCH v8] Avoid orphaned objects dependencies

It's currently possible to create orphaned object dependencies, for example:

Scenario 1:

session 1: begin; drop schema schem;
session 2: create a function in the schema schem
session 1: commit;

With the above, the function created in session 2 would be linked to a
non-existing schema.

Scenario 2:

session 1: begin; create a function in the schema schem
session 2: drop schema schem;
session 1: commit;

With the above, the function created in session 1 would be linked to a
non-existing schema.

To avoid those scenarios, a new lock (that conflicts with a lock taken by DROP)
has been put in place when the dependencies are being recorded. With this in
place, the drop schema in scenario 2 would be locked.

Also, after the new lock attempt, the patch checks that the object still
exists: with this in place, session 2 in scenario 1 would be locked and would
report an error once session 1 commits (that would not be the case should
session 1 abort the transaction).

If the object is dropped before the new lock attempt is triggered, then the
patch would also report an error (but with less detail).

The patch adds a few tests for some dependency cases (that would currently
produce orphaned objects):

- schema and function (as the above scenarios)
- alter a dependency (function and schema)
- function and arg type
- function and return type
- function and function
- domain and domain
- table and type
- server and foreign data wrapper
---
 src/backend/catalog/dependency.c              |  59
 src/backend/catalog/objectaddress.c           |  66 +
 src/backend/catalog/pg_depend.c               |  12 ++
 src/backend/utils/errcodes.txt                |   1 +
 src/include/catalog/dependency.h              |   1 +
 src/include/catalog/objectaddress.h           |   1 +
 .../expected/test_dependencies_locks.out      | 129 ++
 src/test/isolation/isolation_schedule         |   1 +
 .../specs/test_dependencies_locks.spec        |  89
 9 files changed, 359 insertions(+)
 24.3% src/backend/catalog/
 45.3% src/test/isolation/expected/
 28.2% src/test/isolation/specs/

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index d4b5b2ade1..4271ed7c5b 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1517,6 +1517,65 @@ AcquireDeletionLock(const ObjectAddress *object, int flags)
 	}
 }
 
+/*
+ * depLockAndCheckObject
+ *
+ * Lock the object that we are about to record a dependency on.
+ * After it's locked, verify that it hasn't been dropped while we
+ * weren't looking. If the object has been dropped, this function
+ * does not return!
+ */
+void
+depLockAndCheckObject(const ObjectAddress *object)
+{
+	char	   *object_description;
+
+	/*
+	 * Those don't rely on LockDatabaseObject() when being dropped (see
+	 * AcquireDeletionLock()). Also it looks like they cannot produce
+	 * orphaned dependent objects when being dropped.
+	 */
+	if (object->classId == RelationRelationId || object->classId == AuthMemRelationId)
+		return;
+
+	object_description = getObjectDescription(object, true);
+
+	/* assume we should lock the whole object, not a sub-object */
+	LockDatabaseObject(object->classId, object->objectId, 0, AccessShareLock);
+
+	/* check if object still exists */
+	if (!ObjectByIdExist(object, false))
+	{
+		/*
+		 * It might be possible that we are creating it (for example creating
+		 * a compo
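The two scenarios in the commit message above can be sketched as a toy model. This is a hypothetical Python simulation (not PostgreSQL code; the names `scenario1` and `Orphaned` are illustrative inventions): recording a dependency without locking the referenced object lets a concurrent uncommitted DROP produce an orphaned pg_depend row, while the patch's lock-then-recheck ordering turns the same race into an error.

```python
# Hypothetical model of scenario 1: session 1 has an uncommitted
# "DROP SCHEMA schem" while session 2 creates a function in it.

class Orphaned(Exception):
    pass

def scenario1(with_patch):
    namespaces = {"schem"}        # committed catalog state
    pg_depend = []                # (dependent, referenced) rows
    drop_lock_held = True         # session 1 holds AccessExclusiveLock
                                  # from its uncommitted DROP

    # session 2: create a function in schema schem
    if with_patch:
        # depLockAndCheckObject()-style step: the AccessShareLock request
        # conflicts with the DROP's lock, so we wait until session 1 commits
        while drop_lock_held:
            namespaces.discard("schem")   # session 1: commit; drop applies
            drop_lock_held = False
        # ...then re-check that the referenced object still exists
        if "schem" not in namespaces:
            raise LookupError('schema "schem" does not exist')
    pg_depend.append(("function foo", "schema schem"))

    # unpatched ordering: session 1 commits after the dependency is recorded
    namespaces.discard("schem")

    # any row referencing a dropped object is an orphaned dependency
    for _, ref in pg_depend:
        if ref.split()[-1] not in namespaces:
            raise Orphaned(ref)

try:
    scenario1(with_patch=False)
except Orphaned as e:
    print("unpatched: orphaned dependency on", e)

try:
    scenario1(with_patch=True)
except LookupError as e:
    print("patched:", e)
```

The point of the model is only the ordering: the recheck must happen after the lock is granted, i.e. after any conflicting DROP has committed or aborted.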
Re: PostgreSQL 17 Beta 1 release announcement draft
Hi,

On Wed, May 22, 2024 at 07:01:54PM -0400, Jonathan S. Katz wrote:
> Thanks. As such I made it:
>
> "which provides descriptions about wait events and can be combined with
> `pg_stat_activity` to give more insight into why an active session is
> waiting."

Thanks! Works for me.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Avoid orphaned objects dependencies, take 3
Hi,

On Wed, May 22, 2024 at 10:48:12AM -0400, Robert Haas wrote:
> On Wed, May 22, 2024 at 6:21 AM Bertrand Drouvot wrote:
> > I started initially with [1] but it was focusing on function-schema only.
>
> Yeah, that's what I thought we would want to do. And then just extend
> that to the other cases.
>
> > Then I proposed [2] making use of a dirty snapshot when recording the
> > dependency. But this approach appeared to be "scary" and it was still
> > failing to close some race conditions.
>
> The current patch still seems to be using dirty snapshots for some
> reason, which struck me as a bit odd. My intuition is that if we're
> relying on dirty snapshots to solve problems, we likely haven't solved
> the problems correctly, which seems consistent with your statement
> about "failing to close some race conditions". But I don't think I
> understand the situation well enough to be sure just yet.

The reason why we are using a dirty snapshot here is for the cases where we are
recording a dependency on a referenced object that we are creating at the same
time behind the scenes (for example, creating a composite type while creating a
relation). Without the dirty snapshot, the object we are creating behind the
scenes (the composite type) would not be visible and we would wrongly assume
that it has been dropped.

Note that the dirty snapshot is used only when the object is first reported as
"non-existing" by the new ObjectByIdExist() function.

> > I think there are two cases here:
> >
> > First case: if the "conflicting" lock mode is for one of our own locks,
> > then LockCheckConflicts() would report this as a NON conflict.
> >
> > Second case: if the "conflicting" lock mode is NOT for one of our own
> > locks, then LockCheckConflicts() would report a conflict. But I have the
> > feeling that the existing code would already lock those sessions.
> >
> > Was your concern about the "First case", the "Second case", or both?
>
> The second case, I guess. It's bad to take a weaker lock first and
> then a stronger lock on the same object later, because it can lead to
> deadlocks that would have been avoided if the stronger lock had been
> taken at the outset.

In the example I shared up-thread it would be the opposite: session 1 would
take an AccessExclusiveLock on the object before taking an AccessShareLock
during changeDependencyFor().

> Here, it seems like things would happen in the
> other order: if we took two locks, we'd probably take the stronger
> lock in the higher-level code and then the weaker lock in the
> dependency code.

Yeah, I agree.

> That shouldn't break anything; it's just a bit inefficient.

Yeah, the second lock is useless in that case (like in the example up-thread).

> My concern was really more about the maintainability of
> the code. I fear that if we add code that takes heavyweight locks in
> surprising places, we might later find the behavior difficult to
> reason about.

I think I understand your concern about code maintainability, but I'm not sure
that adding locks while recording a dependency is that surprising.

> Tom, what is your thought about that concern?

+1

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
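The snapshot distinction that this subthread settles on (SnapshotSelf instead of a dirty snapshot) can be illustrated with a toy visibility model. This is hypothetical Python, loosely named after PostgreSQL's snapshot types and not its actual visibility-check logic: an MVCC snapshot taken before our own insert misses it, a dirty snapshot sees everyone's uncommitted rows, and a self snapshot sees committed rows plus exactly our own transaction's work, which is all this code path needs.

```python
# Toy visibility rules (illustrative only; real logic lives in
# PostgreSQL's heapam visibility code, which this does not reproduce).

def visible(row, snapshot, my_xid):
    inserter, committed = row
    if snapshot == "mvcc":        # only previously committed work
        return committed
    if snapshot == "dirty":       # committed plus anyone's uncommitted work
        return True
    if snapshot == "self":        # committed plus *our own* uncommitted work
        return committed or inserter == my_xid
    raise ValueError(snapshot)

# rows: (inserting xid, committed?)
composite_type_row = (100, False)   # created by us, same transaction
other_session_row = (200, False)    # someone else's uncommitted insert
me = 100

assert not visible(composite_type_row, "mvcc", me)   # wrongly looks dropped
assert visible(composite_type_row, "dirty", me)      # works, but too broad:
assert visible(other_session_row, "dirty", me)       # sees others' work too
assert visible(composite_type_row, "self", me)       # exactly what's needed
assert not visible(other_session_row, "self", me)
```

Under this model, the "self" rule matches the requirement stated up-thread: seeing uncommitted work by the same transaction only.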
Re: Avoid orphaned objects dependencies, take 3
Hi,

On Tue, May 21, 2024 at 08:53:06AM -0400, Robert Haas wrote:
> On Wed, May 15, 2024 at 6:31 AM Bertrand Drouvot wrote:
> > Please find attached v6 (the only diff with v5 is moving the tests as
> > suggested above).
>
> I don't immediately know what to think about this patch.

Thanks for looking at it!

> I've known about this issue for a long time, but I didn't think this was
> how we would fix it.

I started initially with [1] but it was focusing on function-schema only.
Then I proposed [2] making use of a dirty snapshot when recording the
dependency. But this approach appeared to be "scary" and it was still failing
to close some race conditions.

Then, Tom proposed another approach in [2]: "creation DDL will have to take a
lock on each referenced object that'd conflict with a lock taken by DROP".
This is the one the current patch is trying to implement.

> What you've done here instead is add locking at a much
> lower level - whenever we are adding a dependency on an object, we
> lock the object. The advantage of that approach is that we definitely
> won't miss anything.

Right, as there are many more cases than the ones related to schemas, for
example:

- function and arg type
- function and return type
- function and function
- domain and domain
- table and type
- server and foreign data wrapper

to name a few.

> The disadvantage of that approach is that it
> means we have some very low-level code taking locks, which means it's
> not obvious which operations are taking what locks.

Right, but the new operations are "only" the ones leading to recording or
altering a dependency.

> Maybe it could
> even result in some redundancy, like the higher-level code taking a
> lock also (possibly in a different mode) and then this code taking
> another one.

The lock that is added here is in AccessShareLock mode. It could conflict with
the ones taken in AccessExclusiveLock mode by (if I'm not missing any):

- AcquireDeletionLock(): which is exactly what we want
- get_object_address()
- get_object_address_rv()
- ExecAlterObjectDependsStmt()
- ExecRenameStmt()
- ExecAlterOwnerStmt()
- RemoveObjects()
- AlterPublication()

I think there are two cases here:

First case: if the "conflicting" lock mode is for one of our own locks, then
LockCheckConflicts() would report this as a NON conflict.

Second case: if the "conflicting" lock mode is NOT for one of our own locks,
then LockCheckConflicts() would report a conflict. But I have the feeling that
the existing code would already lock those sessions.

One example where that would be the case:

Session 1: doing "BEGIN; ALTER FUNCTION noschemas.foo2() SET SCHEMA
alterschema" would acquire the lock in AccessExclusiveLock mode during
ExecAlterObjectSchemaStmt()->get_object_address()->LockDatabaseObject() (in
the existing code, and before the new call that would occur through
changeDependencyFor()->depLockAndCheckObject() with the patch in place).

Then, session 2: doing "alter function noschemas.foo2() owner to newrole;"
would be locked in the existing code while doing
ExecAlterOwnerStmt()->get_object_address()->LockDatabaseObject().

That means that in this example, the new lock this patch is introducing would
not be responsible for session 2 being locked.

Was your concern about the "First case", the "Second case", or both?

[1]: https://www.postgresql.org/message-id/flat/5a9daaae-5538-b209-6279-e903c3ea2157%40amazon.com
[2]: https://www.postgresql.org/message-id/flat/8369ff70-0e31-f194-2954-787f4d9e21dd%40amazon.com

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
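The conflict analysis above can be checked against a small sketch of PostgreSQL's table-level lock conflict matrix. This is an illustrative Python reduction (the authoritative matrix is in the documentation and in src/backend/storage/lmgr/lock.c, which this does not reproduce): AccessShareLock conflicts only with AccessExclusiveLock, and, mirroring the LockCheckConflicts() behavior discussed as the "first case", locks held by the same session are treated as non-conflicting.

```python
# Sketch of the heavyweight-lock conflict rules relevant to this thread.
MODES = ["AccessShareLock", "RowShareLock", "RowExclusiveLock",
         "ShareUpdateExclusiveLock", "ShareLock", "ShareRowExclusiveLock",
         "ExclusiveLock", "AccessExclusiveLock"]

CONFLICTS = {
    "AccessShareLock": {"AccessExclusiveLock"},
    "RowShareLock": {"ExclusiveLock", "AccessExclusiveLock"},
    "RowExclusiveLock": {"ShareLock", "ShareRowExclusiveLock",
                         "ExclusiveLock", "AccessExclusiveLock"},
    "ShareUpdateExclusiveLock": {"ShareUpdateExclusiveLock", "ShareLock",
                                 "ShareRowExclusiveLock", "ExclusiveLock",
                                 "AccessExclusiveLock"},
    "ShareLock": {"RowExclusiveLock", "ShareUpdateExclusiveLock",
                  "ShareRowExclusiveLock", "ExclusiveLock",
                  "AccessExclusiveLock"},
    "ShareRowExclusiveLock": {"RowExclusiveLock", "ShareUpdateExclusiveLock",
                              "ShareLock", "ShareRowExclusiveLock",
                              "ExclusiveLock", "AccessExclusiveLock"},
    "ExclusiveLock": set(MODES) - {"AccessShareLock"},
    "AccessExclusiveLock": set(MODES),
}

def conflicts(held, requested, same_session=False):
    # like LockCheckConflicts(): a session's own locks never conflict
    if same_session:
        return False
    return requested in CONFLICTS[held]

# A DROP-style AccessExclusiveLock blocks the patch's new AccessShareLock:
assert conflicts("AccessExclusiveLock", "AccessShareLock")
# ...but not when both locks belong to the same session ("first case"):
assert not conflicts("AccessExclusiveLock", "AccessShareLock",
                     same_session=True)
# And the new AccessShareLock blocks nothing except AccessExclusiveLock:
assert all(not conflicts("AccessShareLock", m)
           for m in MODES if m != "AccessExclusiveLock")
```

This is consistent with the claim that the new lock only ever waits on DROP-style operations, and never on ordinary readers or writers.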
Re: Avoid orphaned objects dependencies, take 3
Hi,

On Sun, May 19, 2024 at 11:00:00AM +0300, Alexander Lakhin wrote:
> Hello Bertrand,
>
> Probably, it's worth avoiding ERRCODE_INTERNAL_ERROR here in light of [1]
> and [2], as these errors are not that abnormal (not Assert-like).
>
> [1] https://www.postgresql.org/message-id/Zic_GNgos5sMxKoa%40paquier.xyz
> [2] https://commitfest.postgresql.org/48/4735/

Thanks for mentioning the above examples. I agree that it's worth avoiding
ERRCODE_INTERNAL_ERROR here: please find attached v7, which makes use of a
new ERRCODE: ERRCODE_DEPENDENT_OBJECTS_DOES_NOT_EXIST

I thought about this name as it is close enough to the already existing
"ERRCODE_DEPENDENT_OBJECTS_STILL_EXIST", but I'm open to suggestions too.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

>From 79809f88311ef1cf4e8f3250e1508fe0b7d86602 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot
Date: Fri, 29 Mar 2024 15:43:26 +0000
Subject: [PATCH v7] Avoid orphaned objects dependencies

It's currently possible to create orphaned object dependencies, for example:

Scenario 1:

session 1: begin; drop schema schem;
session 2: create a function in the schema schem
session 1: commit;

With the above, the function created in session 2 would be linked to a
non-existing schema.

Scenario 2:

session 1: begin; create a function in the schema schem
session 2: drop schema schem;
session 1: commit;

With the above, the function created in session 1 would be linked to a
non-existing schema.

To avoid those scenarios, a new lock (that conflicts with a lock taken by DROP)
has been put in place when the dependencies are being recorded. With this in
place, the drop schema in scenario 2 would be locked.

Also, after the new lock attempt, the patch checks that the object still
exists: with this in place, session 2 in scenario 1 would be locked and would
report an error once session 1 commits (that would not be the case should
session 1 abort the transaction).

If the object is dropped before the new lock attempt is triggered, then the
patch would also report an error (but with less detail).

The patch adds a few tests for some dependency cases (that would currently
produce orphaned objects):

- schema and function (as the above scenarios)
- alter a dependency (function and schema)
- function and arg type
- function and return type
- function and function
- domain and domain
- table and type
- server and foreign data wrapper
---
 src/backend/catalog/dependency.c              |  58
 src/backend/catalog/objectaddress.c           |  70 ++
 src/backend/catalog/pg_depend.c               |  12 ++
 src/backend/utils/errcodes.txt                |   1 +
 src/include/catalog/dependency.h              |   1 +
 src/include/catalog/objectaddress.h           |   1 +
 .../expected/test_dependencies_locks.out      | 129 ++
 src/test/isolation/isolation_schedule         |   1 +
 .../specs/test_dependencies_locks.spec        |  89
 9 files changed, 362 insertions(+)
 24.6% src/backend/catalog/
 45.1% src/test/isolation/expected/
 28.1% src/test/isolation/specs/

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index d4b5b2ade1..89b2f18f98 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1517,6 +1517,64 @@ AcquireDeletionLock(const ObjectAddress *object, int flags)
 	}
 }
 
+/*
+ * depLockAndCheckObject
+ *
+ * Lock the object that we are about to record a dependency on.
+ * After it's locked, verify that it hasn't been dropped while we
+ * weren't looking. If the object has been dropped, this function
+ * does not return!
+ */
+void
+depLockAndCheckObject(const ObjectAddress *object)
+{
+	char	   *object_description;
+
+	/*
+	 * Those don't rely on LockDatabaseObject() when being dropped (see
+	 * AcquireDeletionLock()). Also it looks like they cannot produce
+	 * orphaned dependent objects when being dropped.
+	 */
+	if (object->classId == RelationRelationId || object->classId == AuthMemRelationId)
+		return;
+
+	object_description = getObjectDescription(object, true);
+
+	/* assume we should lock the whole object, not a sub-object */
+	LockDatabaseObject(object->classId, object->objectId, 0, AccessShareLock);
+
+	/* check if object still exists */
+	if (!ObjectByIdExist(object, false))
+	{
+		/*
+		 * It might be possible that we are creating it (for example creating
+		 * a composite type while creating a relation), so bypass the syscache
+		 * lookup and use a dirty snapshot instead to cover this scenario.
+		 */
+		if (!ObjectByIdExist(object, true))
+		{
+			/*
+			 * If the object has been dropped before we get a chance to get
+			 * its description, then emit a generic error message. That looks
+			 * like a good compromise over extra complexity.
+			 */
+			if (object_description)
+				ereport(ERROR,
+						(errcode(ERRCODE_DEPE
Re: PostgreSQL 17 Beta 1 release announcement draft
Hi,

On Sun, May 19, 2024 at 05:10:10PM -0400, Jonathan S. Katz wrote:
> On 5/16/24 1:15 AM, Bertrand Drouvot wrote:
> > On Wed, May 15, 2024 at 09:45:35PM -0400, Jonathan S. Katz wrote:
> > > Attached is a copy of the PostgreSQL 17 Beta 1 release announcement draft.
> >
> > Thanks for working on it!
> >
> > I've one comment:
> >
> > > PostgreSQL 17 also introduces a new view,
> > > [`pg_wait_events`](https://www.postgresql.org/docs/17/view-pg-wait-events.html),
> > > which provides descriptions about wait events and can be combined with
> > > `pg_stat_activity` to give more insight into an operation.
> >
> > Instead of "to give more insight into an operation", what about "to give
> > more insight about what a session is waiting for (should it be active)"?
>
> I put:
>
> "to give more insight into why a session is blocked."
>
> Does that work?

I think using "waiting" is better (as the view is "pg_wait_events" and the
join with pg_stat_activity would be on the "wait_event_type" and "wait_event"
columns).

The reason I mentioned "should it be active" is because wait_event and
wait_event_type could be non-empty in pg_stat_activity while the session is
not in an active state anymore (and then not waiting). A right query would be
like the one in [1]:

"
SELECT a.pid, a.wait_event, w.description
FROM pg_stat_activity a
JOIN pg_wait_events w
  ON (a.wait_event_type = w.type AND a.wait_event = w.name)
WHERE a.wait_event IS NOT NULL AND a.state = 'active';
"

meaning filtering on the "active" state too, and that's what the description
proposal I made was trying to highlight.

[1]: https://www.postgresql.org/docs/devel/monitoring-stats.html

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: PostgreSQL 17 Beta 1 release announcement draft
Hi,

On Wed, May 15, 2024 at 09:45:35PM -0400, Jonathan S. Katz wrote:
> Hi,
>
> Attached is a copy of the PostgreSQL 17 Beta 1 release announcement draft.

Thanks for working on it!

I've one comment:

> PostgreSQL 17 also introduces a new view,
> [`pg_wait_events`](https://www.postgresql.org/docs/17/view-pg-wait-events.html),
> which provides descriptions about wait events and can be combined with
> `pg_stat_activity` to give more insight into an operation.

Instead of "to give more insight into an operation", what about "to give more
insight about what a session is waiting for (should it be active)"?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Avoid orphaned objects dependencies, take 3
Hi,

On Wed, May 15, 2024 at 08:31:43AM +0000, Bertrand Drouvot wrote:
> On Wed, May 15, 2024 at 10:14:09AM +0900, Michael Paquier wrote:
> > +++ b/src/test/modules/test_dependencies_locks/specs/test_dependencies_locks.spec
> >
> > This introduces a module with only one single spec. I could get
> > behind an extra module if splitting the tests into more specs makes
> > sense or if there is a restriction on installcheck. However, for
> > one spec file filled with a bunch of objects, and note that I'm OK to
> > let this single spec grow more for this range of tests, it seems to me
> > that this would be better under src/test/isolation/.
>
> Yeah, I was not sure about this one (the location is from take 2 mentioned
> up-thread). I'll look at moving the tests to src/test/isolation/.

Please find attached v6 (the only diff with v5 is moving the tests as
suggested above).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

>From 59f7befd34b8aa4ba0483a100e85bacbc1a76707 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot
Date: Fri, 29 Mar 2024 15:43:26 +0000
Subject: [PATCH v6] Avoid orphaned objects dependencies

It's currently possible to create orphaned object dependencies, for example:

Scenario 1:

session 1: begin; drop schema schem;
session 2: create a function in the schema schem
session 1: commit;

With the above, the function created in session 2 would be linked to a
non-existing schema.

Scenario 2:

session 1: begin; create a function in the schema schem
session 2: drop schema schem;
session 1: commit;

With the above, the function created in session 1 would be linked to a
non-existing schema.

To avoid those scenarios, a new lock (that conflicts with a lock taken by DROP)
has been put in place when the dependencies are being recorded. With this in
place, the drop schema in scenario 2 would be locked.

Also, after the new lock attempt, the patch checks that the object still
exists: with this in place, session 2 in scenario 1 would be locked and would
report an error once session 1 commits (that would not be the case should
session 1 abort the transaction).

If the object is dropped before the new lock attempt is triggered, then the
patch would also report an error (but with less detail).

The patch adds a few tests for some dependency cases (that would currently
produce orphaned objects):

- schema and function (as the above scenarios)
- alter a dependency (function and schema)
- function and arg type
- function and return type
- function and function
- domain and domain
- table and type
- server and foreign data wrapper
---
 src/backend/catalog/dependency.c              |  54
 src/backend/catalog/objectaddress.c           |  70 ++
 src/backend/catalog/pg_depend.c               |  12 ++
 src/include/catalog/dependency.h              |   1 +
 src/include/catalog/objectaddress.h           |   1 +
 .../expected/test_dependencies_locks.out      | 129 ++
 src/test/isolation/isolation_schedule         |   1 +
 .../specs/test_dependencies_locks.spec        |  89
 8 files changed, 357 insertions(+)
 24.1% src/backend/catalog/
 45.9% src/test/isolation/expected/
 28.6% src/test/isolation/specs/

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index d4b5b2ade1..a49357bbe2 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1517,6 +1517,60 @@ AcquireDeletionLock(const ObjectAddress *object, int flags)
 	}
 }
 
+/*
+ * depLockAndCheckObject
+ *
+ * Lock the object that we are about to record a dependency on.
+ * After it's locked, verify that it hasn't been dropped while we
+ * weren't looking. If the object has been dropped, this function
+ * does not return!
+ */
+void
+depLockAndCheckObject(const ObjectAddress *object)
+{
+	char	   *object_description;
+
+	/*
+	 * Those don't rely on LockDatabaseObject() when being dropped (see
+	 * AcquireDeletionLock()). Also it looks like they cannot produce
+	 * orphaned dependent objects when being dropped.
+	 */
+	if (object->classId == RelationRelationId || object->classId == AuthMemRelationId)
+		return;
+
+	object_description = getObjectDescription(object, true);
+
+	/* assume we should lock the whole object, not a sub-object */
+	LockDatabaseObject(object->classId, object->objectId, 0, AccessShareLock);
+
+	/* check if object still exists */
+	if (!ObjectByIdExist(object, false))
+	{
+		/*
+		 * It might be possible that we are creating it (for example creating
+		 * a composite type while creating a relation), so bypass the syscache
+		 * lookup and use a dirty snapshot instead to cover this scenario.
+		 */
+		if (!ObjectByIdExist(object, true))
+		{
+			/*
+			 * If the object has been dropped before we get a chance to get
+			 * its description, then
Re: Avoid orphaned objects dependencies, take 3
Hi,

On Tue, May 14, 2024 at 03:00:00PM +0300, Alexander Lakhin wrote:
> Hi Bertrand,
>
> 09.05.2024 15:20, Bertrand Drouvot wrote:
> > Oh I see, your test updates an existing dependency. v4 took care about
> > brand new dependencies creation (recordMultipleDependencies()) but forgot
> > to take care about changing an existing dependency (which is done in
> > another code path: changeDependencyFor()).
> >
> > Please find attached v5 that adds:
> >
> > - a call to the new depLockAndCheckObject() function in
> >   changeDependencyFor().
> > - a test when altering an existing dependency.
> >
> > With v5 applied, I don't see the issue anymore.
>
> Me too. Thank you for the improved version!
> I will test the patch in the background, but for now I see no other
> issues with it.

Thanks for confirming!

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Avoid orphaned objects dependencies, take 3
Hi,

On Wed, May 15, 2024 at 10:14:09AM +0900, Michael Paquier wrote:
> On Thu, May 09, 2024 at 12:20:51PM +0000, Bertrand Drouvot wrote:
> > Oh I see, your test updates an existing dependency. v4 took care about
> > brand new dependencies creation (recordMultipleDependencies()) but forgot
> > to take care about changing an existing dependency (which is done in
> > another code path: changeDependencyFor()).
> >
> > Please find attached v5 that adds:
> >
> > - a call to the new depLockAndCheckObject() function in
> >   changeDependencyFor().
> > - a test when altering an existing dependency.
> >
> > With v5 applied, I don't see the issue anymore.
>
> +if (object_description)
> +ereport(ERROR, errmsg("%s does not exist", object_description));
> +else
> +ereport(ERROR, errmsg("a dependent object does not ex
>
> This generates an internal error code. Is that intended?

Thanks for looking at it!

Yes, it's like when, say, you want to create an object in a schema that does
not exist (see get_namespace_oid()).

> --- /dev/null
> +++ b/src/test/modules/test_dependencies_locks/specs/test_dependencies_locks.spec
>
> This introduces a module with only one single spec. I could get
> behind an extra module if splitting the tests into more specs makes
> sense or if there is a restriction on installcheck. However, for
> one spec file filled with a bunch of objects, and note that I'm OK to
> let this single spec grow more for this range of tests, it seems to me
> that this would be better under src/test/isolation/.

Yeah, I was not sure about this one (the location is from take 2 mentioned
up-thread). I'll look at moving the tests to src/test/isolation/.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Log details for stats dropped more than once
Hi,

On Wed, May 15, 2024 at 02:47:29PM +0900, Michael Paquier wrote:
> On Tue, May 14, 2024 at 10:07:14AM +0000, Bertrand Drouvot wrote:
> > While resuming the work on relfilenode stats (mentioned in [1] but did not
> > share the patch yet), I realized that my current POC patch is buggy enough
> > to produce things like:
> >
> > 2024-05-14 09:51:14.783 UTC [1788714] FATAL: can only drop stats once
> >
> > While the CONTEXT provides the list of dropped stats:
> >
> > 2024-05-14 09:51:14.783 UTC [1788714] CONTEXT: WAL redo at 0/D75F478 for
> > Transaction/ABORT: 2024-05-14 09:51:14.782223+00; dropped stats:
> > 2/16384/27512/0 2/16384/27515/0 2/16384/27516/0
>
> Can refcount be useful to know in this errcontext?

Thanks for looking at it!

Do you mean as part of the added errdetail_internal()? If so, yeah, I think
it's a good idea (done that way in v2 attached).

> > Attached a tiny patch to report the stat that generates the error. The
> > patch uses errdetail_internal() as the extra details don't seem to be
> > useful to average users.
>
> I think that's fine. Overall that looks like useful information for
> debugging, so no objections from here.

Thanks!

BTW, I just realized that adding more details for this error has already been
mentioned in [1] (and Andres did propose a slightly different version).

[1]: https://www.postgresql.org/message-id/20240505160915.6boysum4f34siqct%40awork3.anarazel.de

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

>From 0866e8252f2038f3fdbd9cfabb214461689210df Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot
Date: Tue, 14 May 2024 09:10:50 +0000
Subject: [PATCH v2] Log details for stats dropped more than once

Adding errdetail_internal() in pgstat_drop_entry_internal() to display the
PgStat_HashKey and the entry's refcount when the "can only drop stats once"
error is reported.
---
 src/backend/utils/activity/pgstat_shmem.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
 100.0% src/backend/utils/activity/

diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index 91591da395..0595c08d6e 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -785,7 +785,12 @@ pgstat_drop_entry_internal(PgStatShared_HashEntry *shent,
 	 * backends to release their references.
 	 */
 	if (shent->dropped)
-		elog(ERROR, "can only drop stats once");
+		ereport(ERROR,
+				errmsg("can only drop stats once"),
+				errdetail_internal("Stats kind=%s dboid=%u objoid=%u refcount=%u.",
+								   pgstat_get_kind_info(shent->key.kind)->name,
+								   shent->key.dboid, shent->key.objoid,
+								   pg_atomic_read_u32(&shent->refcount)));
 	shent->dropped = true;
 
 	/* release refcount marking entry as not dropped */
-- 
2.34.1
Log details for stats dropped more than once
Hi hackers,

While resuming the work on relfilenode stats (mentioned in [1] but did not
share the patch yet), I realized that my current POC patch is buggy enough to
produce things like:

2024-05-14 09:51:14.783 UTC [1788714] FATAL: can only drop stats once

While the CONTEXT provides the list of dropped stats:

2024-05-14 09:51:14.783 UTC [1788714] CONTEXT: WAL redo at 0/D75F478 for
Transaction/ABORT: 2024-05-14 09:51:14.782223+00; dropped stats:
2/16384/27512/0 2/16384/27515/0 2/16384/27516/0

it's not clear which one generates the error (don't pay attention to the
actual values; the issue comes from the new relfilenode stats that I removed
from the output).

Attached is a tiny patch to report the stat that generates the error. The
patch uses errdetail_internal() as the extra details don't seem to be useful
to average users.

[1]: https://www.postgresql.org/message-id/ZbIdgTjR2QcFJ2mE%40ip-10-97-1-34.eu-west-3.compute.internal

Looking forward to your feedback,

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

>From f296d4092f5e87ed63a323db3f1bc9cfa807e2e8 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot
Date: Tue, 14 May 2024 09:10:50 +0000
Subject: [PATCH v1] Log details for stats dropped more than once

Adding errdetail_internal() in pgstat_drop_entry_internal() to display the
PgStat_HashKey when the "can only drop stats once" error is reported.
---
 src/backend/utils/activity/pgstat_shmem.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
 100.0% src/backend/utils/activity/

diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index 91591da395..1308c4439a 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -785,7 +785,11 @@ pgstat_drop_entry_internal(PgStatShared_HashEntry *shent,
 	 * backends to release their references.
 	 */
 	if (shent->dropped)
-		elog(ERROR, "can only drop stats once");
+		ereport(ERROR,
+				errmsg("can only drop stats once"),
+				errdetail_internal("Stats kind=%s dboid=%u objoid=%u.",
+								   pgstat_get_kind_info(shent->key.kind)->name,
+								   shent->key.dboid, shent->key.objoid));
 	shent->dropped = true;
 
 	/* release refcount marking entry as not dropped */
-- 
2.34.1
Re: Avoid orphaned objects dependencies, take 3
Hi, On Tue, Apr 30, 2024 at 08:00:00PM +0300, Alexander Lakhin wrote: > Hi Bertrand, > > But I've discovered yet another possibility to get a broken dependency. Thanks for the testing and the report! > Please try this script: > res=0 > numclients=20 > for ((i=1;i<=100;i++)); do > for ((c=1;c<=numclients;c++)); do > echo " > CREATE SCHEMA s_$c; > CREATE CONVERSION myconv_$c FOR 'LATIN1' TO 'UTF8' FROM iso8859_1_to_utf8; > ALTER CONVERSION myconv_$c SET SCHEMA s_$c; > " | psql >psql1-$c.log 2>&1 & > echo "DROP SCHEMA s_$c RESTRICT;" | psql >psql2-$c.log 2>&1 & > done > wait > pg_dump -f db.dump || { echo "on iteration $i"; res=1; break; } > for ((c=1;c<=numclients;c++)); do > echo "DROP SCHEMA s_$c CASCADE;" | psql >psql3-$c.log 2>&1 > done > done > psql -c "SELECT * FROM pg_conversion WHERE connamespace NOT IN (SELECT oid > FROM pg_namespace);" > > It fails for me (with the v4 patch applied) as follows: > pg_dump: error: schema with OID 16392 does not exist > on iteration 1 > oid | conname | connamespace | conowner | conforencoding | contoencoding > | conproc | condefault > ---+--+--+--++---+---+ > 16396 | myconv_6 | 16392 | 10 | 8 | 6 > | iso8859_1_to_utf8 | f > Thanks for sharing the test, I'm able to reproduce the issue with v4. Oh I see, your test updates an existing dependency. v4 took care of brand new dependency creation (recordMultipleDependencies()) but forgot about changes to an existing dependency (which is done in another code path: changeDependencyFor()). Please find attached v5 that adds: - a call to the new depLockAndCheckObject() function in changeDependencyFor(). - a test for altering an existing dependency. With v5 applied, I don't see the issue anymore.
Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From 6fc24a798210f730ab04833fa58074f142be968e Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Fri, 29 Mar 2024 15:43:26 + Subject: [PATCH v5] Avoid orphaned objects dependencies It's currently possible to create orphaned objects dependencies, for example: Scenario 1: session 1: begin; drop schema schem; session 2: create a function in the schema schem session 1: commit; With the above, the function created in session 2 would be linked to a non existing schema. Scenario 2: session 1: begin; create a function in the schema schem session 2: drop schema schem; session 1: commit; With the above, the function created in session 1 would be linked to a non existing schema. To avoid those scenarios, a new lock (that conflicts with a lock taken by DROP) has been put in place when the dependencies are being recorded. With this in place, the drop schema in scenario 2 would be locked. Also, after the new lock attempt, the patch checks that the object still exists: with this in place session 2 in scenario 1 would be locked and would report an error once session 1 commits (that would not be the case should session 1 abort the transaction). If the object is dropped before the new lock attempt is triggered then the patch would also report an error (but with fewer details).
The patch adds a few tests for some dependency cases (that would currently produce orphaned objects): - schema and function (as the above scenarios) - alter a dependency (function and schema) - function and arg type - function and return type - function and function - domain and domain - table and type - server and foreign data wrapper --- src/backend/catalog/dependency.c | 54 src/backend/catalog/objectaddress.c | 70 ++ src/backend/catalog/pg_depend.c | 12 ++ src/include/catalog/dependency.h | 1 + src/include/catalog/objectaddress.h | 1 + src/test/modules/Makefile | 1 + src/test/modules/meson.build | 1 + .../test_dependencies_locks/.gitignore| 3 + .../modules/test_dependencies_locks/Makefile | 14 ++ .../expected/test_dependencies_locks.out | 129 ++ .../test_dependencies_locks/meson.build | 12 ++ .../specs/test_dependencies_locks.spec| 89 12 files changed, 387 insertions(+) 22.9% src/backend/catalog/ 43.7% src/test/modules/test_dependencies_locks/expected/ 27.2% src/test/modules/test_dependencies_locks/specs/ 4.5% src/test/modules/test_dependencies_locks/ diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c index d4b5b2ade1..a49357bbe2 100644 --- a/src/backend/catalog/dependency.c +++ b/src/backend/catalog/dependency.c @@ -15
Re: First draft of PG 17 release notes
Hi, On Thu, May 09, 2024 at 12:03:50AM -0400, Bruce Momjian wrote: > I have committed the first draft of the PG 17 release notes; you can > see the results here: > > https://momjian.us/pgsql_docs/release-17.html Thanks for working on that! > I welcome feedback. > Add system view pg_wait_events that reports wait event types (Michael > Paquier) Michael is the committer for 1e68e43d3f; the author is me. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
Hi, On Mon, Apr 29, 2024 at 11:58:09AM +, Zhijie Hou (Fujitsu) wrote: > On Monday, April 29, 2024 5:11 PM shveta malik wrote: > > > > On Mon, Apr 29, 2024 at 11:38 AM shveta malik > > wrote: > > > > > > On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu) > > > wrote: > > > > > > > > On Friday, March 15, 2024 10:45 PM Bertrand Drouvot > > wrote: > > > > > > > > > > Hi, > > > > > > > > > > On Thu, Mar 14, 2024 at 02:22:44AM +, Zhijie Hou (Fujitsu) wrote: > > > > > > Hi, > > > > > > > > > > > > Since the standby_slot_names patch has been committed, I am > > > > > > attaching the last doc patch for review. > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > 1 === > > > > > > > > > > + continue subscribing to publications now on the new primary > > > > > + server > > > > > without > > > > > + any data loss. > > > > > > > > > > I think "without any data loss" should be re-worded in this > > > > > context. Data loss in the sense "data committed on the primary and > > > > > not visible on the subscriber in case of failover" can still occurs > > > > > (in case > > synchronous replication is not used). > > > > > > > > > > 2 === > > > > > > > > > > + If the result (failover_ready) of both above > > > > > steps is > > > > > + true, existing subscriptions will be able to continue without data > > loss. > > > > > + > > > > > > > > > > I don't think that's true if synchronous replication is not used. > > > > > Say, > > > > > > > > > > - synchronous replication is not used > > > > > - primary is not able to reach the standby anymore and > > > > > standby_slot_names is set > > > > > - new data is inserted into the primary > > > > > - then not replicated to subscriber (due to standby_slot_names) > > > > > > > > > > Then I think the both above steps will return true but data would > > > > > be lost in case of failover. > > > > > > > > Thanks for the comments, attach the new version patch which reworded > > > > the above places. Thanks! > Here is the V3 doc patch. Thanks! 
A few comments: 1 === + losing any data that has been flushed to the new primary server. It might be worth adding a few words about possible data loss, something like: Please note that in case synchronous replication is not used and standby_slot_names is set correctly, it might be possible to lose data that would have been committed on the old primary server (in case the standby was not reachable during that time for example). 2 === +test_sub=# SELECT + array_agg(slotname) AS slots + FROM + (( + SELECT r.srsubid AS subid, CONCAT('pg_', srsubid, '_sync_', srrelid, '_', ctl.system_identifier) AS slotname + FROM pg_control_system() ctl, pg_subscription_rel r, pg_subscription s + WHERE r.srsubstate = 'f' AND s.oid = r.srsubid AND s.subfailover + ) UNION ( I guess this format comes from ReplicationSlotNameForTablesync(). What about creating a SQL-callable function on top of it and making use of it in the query above? (That would ensure the doc stays up to date even if the format changes in ReplicationSlotNameForTablesync().) 3 === +test_sub=# SELECT + MAX(remote_lsn) AS remote_lsn_on_subscriber + FROM + (( + SELECT (CASE WHEN r.srsubstate = 'f' THEN pg_replication_origin_progress(CONCAT('pg_', r.srsubid, '_', r.srrelid), false) + WHEN r.srsubstate IN ('s', 'r') THEN r.srsublsn END) AS remote_lsn + FROM pg_subscription_rel r, pg_subscription s + WHERE r.srsubstate IN ('f', 's', 'r') AND s.oid = r.srsubid AND s.subfailover + ) UNION ( + SELECT pg_replication_origin_progress(CONCAT('pg_', s.oid), false) AS remote_lsn + FROM pg_subscription s + WHERE s.subfailover + )); What about adding a join to pg_replication_origin to get rid of the "hardcoded" format "CONCAT('pg_', r.srsubid, '_', r.srrelid)" and "CONCAT('pg_', s.oid)"? The idea behind 2 === and 3 === is to have the queries as generic as possible and not rely on a hardcoded format (that would be more difficult to maintain should those formats change in the future).
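To make the fragility of the hardcoded format concrete, here is a small sketch derived purely from the CONCAT() expression shown above (the helper name is made up; the authoritative format lives server-side in ReplicationSlotNameForTablesync() and may change):

```python
def tablesync_slot_name(subid: int, relid: int, system_identifier: int) -> str:
    # Mirrors CONCAT('pg_', srsubid, '_sync_', srrelid, '_', ctl.system_identifier)
    # from the documentation query. Every client-side copy of this expression
    # silently breaks if the server-side format ever changes -- hence the
    # suggestion to expose a SQL-callable function instead.
    return f"pg_{subid}_sync_{relid}_{system_identifier}"

print(tablesync_slot_name(16394, 16390, 7365578084350914104))
# -> pg_16394_sync_16390_7365578084350914104
```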
Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Avoid orphaned objects dependencies, take 3
Hi, On Thu, Apr 25, 2024 at 09:00:00AM +0300, Alexander Lakhin wrote: > Hi, > > 25.04.2024 08:00, Bertrand Drouvot wrote: > > > > > though concurrent create/drop operations can still produce > > > the "cache lookup failed" error, which is probably okay, except that it is > > > an INTERNAL_ERROR, which assumed to be not easily triggered by users. > > I did not see any of those "cache lookup failed" during my testing > > with/without > > your script. During which test(s) did you see those with v3 applied? > > You can try, for example, table-trigger, or other tests that check for > "cache lookup failed" psql output only (maybe you'll need to increase the > iteration count). For instance, I've got (with v4 applied): > 2024-04-25 05:48:08.102 UTC [3638763] ERROR: cache lookup failed for > function 20893 > 2024-04-25 05:48:08.102 UTC [3638763] STATEMENT: CREATE TRIGGER modified_c1 > BEFORE UPDATE OF c ON t > FOR EACH ROW WHEN (OLD.c <> NEW.c) EXECUTE PROCEDURE > trigger_func('modified_c'); > > Or with function-function: > 2024-04-25 05:52:31.390 UTC [3711897] ERROR: cache lookup failed for > function 32190 at character 54 > 2024-04-25 05:52:31.390 UTC [3711897] STATEMENT: CREATE FUNCTION f1() > RETURNS int LANGUAGE SQL RETURN f() + 1; > -- > 2024-04-25 05:52:37.639 UTC [3720011] ERROR: cache lookup failed for > function 34465 at character 54 > 2024-04-25 05:52:37.639 UTC [3720011] STATEMENT: CREATE FUNCTION f1() > RETURNS int LANGUAGE SQL RETURN f() + 1; I see, so this is during object creation. It's easy to reproduce this kind of issue with gdb. 
For example set a breakpoint on SearchSysCache1() and during the create function f1() once it breaks on: #0 SearchSysCache1 (cacheId=45, key1=16400) at syscache.c:221 #1 0x5ad305beacd6 in func_get_detail (funcname=0x5ad308204d50, fargs=0x0, fargnames=0x0, nargs=0, argtypes=0x72ff9cc0, expand_variadic=true, expand_defaults=true, include_out_arguments=false, funcid=0x72ff9ba0, rettype=0x72ff9b9c, retset=0x72ff9b94, nvargs=0x72ff9ba4, vatype=0x72ff9ba8, true_typeids=0x72ff9bd8, argdefaults=0x72ff9be0) at parse_func.c:1622 #2 0x5ad305be7dd0 in ParseFuncOrColumn (pstate=0x5ad30823be98, funcname=0x5ad308204d50, fargs=0x0, last_srf=0x0, fn=0x5ad308204da0, proc_call=false, location=55) at parse_func.c:266 #3 0x5ad305bdffb0 in transformFuncCall (pstate=0x5ad30823be98, fn=0x5ad308204da0) at parse_expr.c:1474 #4 0x5ad305bdd2ee in transformExprRecurse (pstate=0x5ad30823be98, expr=0x5ad308204da0) at parse_expr.c:230 #5 0x5ad305bdec34 in transformAExprOp (pstate=0x5ad30823be98, a=0x5ad308204e20) at parse_expr.c:990 #6 0x5ad305bdd1a0 in transformExprRecurse (pstate=0x5ad30823be98, expr=0x5ad308204e20) at parse_expr.c:187 #7 0x5ad305bdd00b in transformExpr (pstate=0x5ad30823be98, expr=0x5ad308204e20, exprKind=EXPR_KIND_SELECT_TARGET) at parse_expr.c:131 #8 0x5ad305b96b7e in transformReturnStmt (pstate=0x5ad30823be98, stmt=0x5ad308204ee0) at analyze.c:2395 #9 0x5ad305b92213 in transformStmt (pstate=0x5ad30823be98, parseTree=0x5ad308204ee0) at analyze.c:375 #10 0x5ad305c6321a in interpret_AS_clause (languageOid=14, languageName=0x5ad308204c40 "sql", funcname=0x5ad308204ad8 "f100", as=0x0, sql_body_in=0x5ad308204ee0, parameterTypes=0x0, inParameterNames=0x0, prosrc_str_p=0x72ffa208, probin_str_p=0x72ffa200, sql_body_out=0x72ffa210, queryString=0x5ad3082040b0 "CREATE FUNCTION f100() RETURNS int LANGUAGE SQL RETURN f() + 1;") at functioncmds.c:953 #11 0x5ad305c63c93 in CreateFunction (pstate=0x5ad308186310, stmt=0x5ad308204f00) at functioncmds.c:1221 then drop function f() in another 
session. Then the create function f1() would: postgres=# CREATE FUNCTION f1() RETURNS int LANGUAGE SQL RETURN f() + 1; ERROR: cache lookup failed for function 16400 This error appears before we get a chance to call the new depLockAndCheckObject() function. I think this is what Tom was referring to in [1]: " So the only real fix for this would be to make every object lookup in the entire system do the sort of dance that's done in RangeVarGetRelidExtended. " The fact that those kinds of errors appear also somehow ensures that no orphaned dependencies can be created. The patch does ensure that no orphaned dependencies can occur after those "initial" lookups are done (should the dependent object be dropped). I'm tempted to not add extra complexity to avoid those kinds of errors and keep the patch as it is. All of those serve the same goal: ensuring that no orphaned dependencies are created. [1]: https://www.postgresql.org/message-id/2872252.1630851337%40sss.pgh.pa.us Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Avoid orphaned objects dependencies, take 3
Hi, On Wed, Apr 24, 2024 at 03:00:00PM +0300, Alexander Lakhin wrote: > 24.04.2024 11:38, Bertrand Drouvot wrote: > > I gave more thought to v2 and the approach seems reasonable to me. > > Basically what > > it does is that in case the object is already dropped before we take the > > new lock > > (while recording the dependency) then the error message is a "generic" one > > (means > > it does not provide the description of the "already" dropped object). I > > think it > > makes sense to write the patch that way by abandoning the patch's ambition > > to > > tell the description of the dropped object in all the cases. > > > > Of course, I would be happy to hear others thought about it. > > > > Please find v3 attached (which is v2 with more comments). > > Thank you for the improved version! > > I can confirm that it fixes that case. Great, thanks for the test! > I've also tested other cases that fail on master (most of them also fail > with v1), please try/look at the attached script. Thanks for all those tests! > (There could be other broken-dependency cases, of course, but I think I've > covered all the representative ones.) Agree. Furthermore, the way the patch is written should be agnostic to the kinds of objects that are part of the dependency. Having said that, it does not hurt to add more tests in this patch, so the attached v4 adds some of your tests (that would fail on the master branch without the patch applied). The way the tests are written in the patch is less "racy" than when triggered with your script. Indeed, I think that in the patch the dependent object cannot be removed before the new lock is taken when recording the dependency. We may want to bring injection points into the game if we feel the need. > All tested cases work correctly with v3 applied — Yeah, same on my side, I did run them too and did not find any issues. > I couldn't get broken > dependencies, Same here.
> though concurrent create/drop operations can still produce > the "cache lookup failed" error, which is probably okay, except that it is > an INTERNAL_ERROR, which assumed to be not easily triggered by users. I did not see any of those "cache lookup failed" during my testing with/without your script. During which test(s) did you see those with v3 applied? Attached v4, simply adding more tests to v3. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From a9b34955fab0351b7b5a7ba6eb36f199f5a5822c Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Fri, 29 Mar 2024 15:43:26 + Subject: [PATCH v4] Avoid orphaned objects dependencies It's currently possible to create orphaned objects dependencies, for example: Scenario 1: session 1: begin; drop schema schem; session 2: create a function in the schema schem session 1: commit; With the above, the function created in session 2 would be linked to a non existing schema. Scenario 2: session 1: begin; create a function in the schema schem session 2: drop schema schem; session 1: commit; With the above, the function created in session 1 would be linked to a non existing schema. To avoid those scenarios, a new lock (that conflicts with a lock taken by DROP) has been put in place when the dependencies are being recorded. With this in place, the drop schema in scenario 2 would be locked. Also, after the new lock attempt, the patch checks that the object still exists: with this in place session 2 in scenario 1 would be locked and would report an error once session 1 commits (that would not be the case should session 1 abort the transaction). If the object is dropped before the new lock attempt is triggered then the patch would also report an error (but with fewer details).
The patch adds a few tests for some dependency cases (that would currently produce orphaned objects): - schema and function (as the above scenarios) - function and arg type - function and return type - function and function - domain and domain - table and type - server and foreign data wrapper --- src/backend/catalog/dependency.c | 54 + src/backend/catalog/objectaddress.c | 70 +++ src/backend/catalog/pg_depend.c | 6 + src/include/catalog/dependency.h | 1 + src/include/catalog/objectaddress.h | 1 + src/test/modules/Makefile | 1 + src/test/modules/meson.build | 1 + .../test_dependencies_locks/.gitignore| 3 + .../modules/test_dependencies_locks/Makefile | 14 +++ .../expected/test_dependencies_locks.out | 113 ++ .../test_dependencies_locks/meson.build | 12 ++ .../specs/test_dependencies_locks.spec| 78 12 files changed
Re: Avoid orphaned objects dependencies, take 3
Hi, On Tue, Apr 23, 2024 at 04:20:46PM +, Bertrand Drouvot wrote: > Please find attached v2 that should not produce the issue anymore (I launched > a > lot of attempts without any issues). v1 was not strong enough as it was not > always checking for the dependent object existence. v2 now always checks if > the > object still exists after the additional lock acquisition attempt while > recording > the dependency. > > I still need to think about v2 but in the meantime could you please also give > v2 a try on you side? I gave more thought to v2 and the approach seems reasonable to me. Basically what it does is that in case the object is already dropped before we take the new lock (while recording the dependency) then the error message is a "generic" one (meaning it does not provide the description of the "already" dropped object). I think it makes sense to write the patch that way by abandoning the patch's ambition to tell the description of the dropped object in all the cases. Of course, I would be happy to hear others' thoughts about it. Please find v3 attached (which is v2 with more comments). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From 32945912ceddad6b171fd8b345adefa4432af357 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Fri, 29 Mar 2024 15:43:26 + Subject: [PATCH v3] Avoid orphaned objects dependencies It's currently possible to create orphaned objects dependencies, for example: Scenario 1: session 1: begin; drop schema schem; session 2: create a function in the schema schem session 1: commit; With the above, the function created in session 2 would be linked to a non existing schema. Scenario 2: session 1: begin; create a function in the schema schem session 2: drop schema schem; session 1: commit; With the above, the function created in session 1 would be linked to a non existing schema.
To avoid those scenarios, a new lock (that conflicts with a lock taken by DROP) has been put in place when the dependencies are being recorded. With this in place, the drop schema in scenario 2 would be locked. Also, after locking the object, the patch checks that the object still exists: with this in place session 2 in scenario 1 would be locked and would report an error once session 1 commits (that would not be the case should session 1 abort the transaction). The patch adds a few tests for some dependency cases (that would currently produce orphaned objects): - schema and function (as the above scenarios) - function and type - table and type --- src/backend/catalog/dependency.c | 54 ++ src/backend/catalog/objectaddress.c | 70 +++ src/backend/catalog/pg_depend.c | 6 ++ src/include/catalog/dependency.h | 1 + src/include/catalog/objectaddress.h | 1 + src/test/modules/Makefile | 1 + src/test/modules/meson.build | 1 + .../test_dependencies_locks/.gitignore| 3 + .../modules/test_dependencies_locks/Makefile | 14 .../expected/test_dependencies_locks.out | 49 + .../test_dependencies_locks/meson.build | 12 .../specs/test_dependencies_locks.spec| 39 +++ 12 files changed, 251 insertions(+) 40.5% src/backend/catalog/ 29.6% src/test/modules/test_dependencies_locks/expected/ 18.7% src/test/modules/test_dependencies_locks/specs/ 8.3% src/test/modules/test_dependencies_locks/ diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c index d4b5b2ade1..a49357bbe2 100644 --- a/src/backend/catalog/dependency.c +++ b/src/backend/catalog/dependency.c @@ -1517,6 +1517,60 @@ AcquireDeletionLock(const ObjectAddress *object, int flags) } } +/* + * depLockAndCheckObject + * + * Lock the object that we are about to record a dependency on. + * After it's locked, verify that it hasn't been dropped while we + * weren't looking. If the object has been dropped, this function + * does not return!
+ */ +void +depLockAndCheckObject(const ObjectAddress *object) +{ + char *object_description; + + /* + * Those don't rely on LockDatabaseObject() when being dropped (see + * AcquireDeletionLock()). Also it looks like they can not produce + * orphaned dependent objects when being dropped. + */ + if (object->classId == RelationRelationId || object->classId == AuthMemRelationId) + return; + + object_description = getObjectDescription(object, true); + + /* assume we should lock the whole object not a sub-object */ + LockDatabaseObject(object->classId, object->objectId, 0, AccessShareLock); + + /* check if object still exists */ + if (!ObjectByIdExist(object, false)) + { + /* + * It might be possible that we are creating it (for example creating + * a composite type while creating a relation), so bypass the syscache
Re: Avoid orphaned objects dependencies, take 3
Hi, On Tue, Apr 23, 2024 at 04:59:09AM +, Bertrand Drouvot wrote: > Hi, > > On Mon, Apr 22, 2024 at 03:00:00PM +0300, Alexander Lakhin wrote: > > 22.04.2024 13:52, Bertrand Drouvot wrote: > > > > > > That's weird, I just launched it several times with the patch applied and > > > I'm not > > > able to see the seg fault (while I can see it constently failing on the > > > master > > > branch). > > > > > > Are you 100% sure you tested it against a binary with the patch applied? > > > > > > > Yes, at least I can't see what I'm doing wrong. Please try my > > self-contained script attached. > > Thanks for sharing your script! > > Yeah your script ensures the patch is applied before the repro is executed. > > I do confirm that I can also see the issue with the patch applied (but I had > to > launch multiple attempts, while on master one attempt is enough). > > I'll have a look. Please find attached v2 that should not produce the issue anymore (I launched a lot of attempts without any issues). v1 was not strong enough as it was not always checking for the dependent object's existence. v2 now always checks if the object still exists after the additional lock acquisition attempt while recording the dependency. I still need to think about v2 but in the meantime could you please also give v2 a try on your side? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From f80fdfc32e791463555aa318f26ff5e7363ac3ac Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Fri, 29 Mar 2024 15:43:26 + Subject: [PATCH v2] Avoid orphaned objects dependencies It's currently possible to create orphaned objects dependencies, for example: Scenario 1: session 1: begin; drop schema schem; session 2: create a function in the schema schem session 1: commit; With the above, the function created in session 2 would be linked to a non existing schema.
Scenario 2: session 1: begin; create a function in the schema schem session 2: drop schema schem; session 1: commit; With the above, the function created in session 1 would be linked to a non existing schema. To avoid those scenarios, a new lock (that conflicts with a lock taken by DROP) has been put in place when the dependencies are being recorded. With this in place, the drop schema in scenario 2 would be locked. Also, after locking the object, the patch checks that the object still exists: with this in place session 2 in scenario 1 would be locked and would report an error once session 1 commits (that would not be the case should session 1 abort the transaction). The patch adds a few tests for some dependency cases (that would currently produce orphaned objects): - schema and function (as the above scenarios) - function and type - table and type --- src/backend/catalog/dependency.c | 48 + src/backend/catalog/objectaddress.c | 70 +++ src/backend/catalog/pg_depend.c | 6 ++ src/include/catalog/dependency.h | 1 + src/include/catalog/objectaddress.h | 1 + src/test/modules/Makefile | 1 + src/test/modules/meson.build | 1 + .../test_dependencies_locks/.gitignore| 3 + .../modules/test_dependencies_locks/Makefile | 14 .../expected/test_dependencies_locks.out | 49 + .../test_dependencies_locks/meson.build | 12 .../specs/test_dependencies_locks.spec| 39 +++ 12 files changed, 245 insertions(+) 38.1% src/backend/catalog/ 30.7% src/test/modules/test_dependencies_locks/expected/ 19.5% src/test/modules/test_dependencies_locks/specs/ 8.7% src/test/modules/test_dependencies_locks/ diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c index d4b5b2ade1..e3b66025dd 100644 --- a/src/backend/catalog/dependency.c +++ b/src/backend/catalog/dependency.c @@ -1517,6 +1517,54 @@ AcquireDeletionLock(const ObjectAddress *object, int flags) } } +/* + * depLockAndCheckObject + * + * Lock the object that we are about to record a dependency on.
+ * After it's locked, verify that it hasn't been dropped while we + * weren't looking. If the object has been dropped, this function + * does not return! + */ +void +depLockAndCheckObject(const ObjectAddress *object) +{ + char *object_description; + + /* + * Those don't rely on LockDatabaseObject() when being dropped (see + * AcquireDeletionLock()). Also it looks like they can not produce + * orphaned dependent objects when being dropped. + */ + if (object->classId == RelationRelationId || object->classId == AuthMemRelationId) + return; + + object_description = getObjectDescription(object, true); + + /* assume we should lock the whole object not a sub-object */
Re: Avoid orphaned objects dependencies, take 3
Hi, On Mon, Apr 22, 2024 at 03:00:00PM +0300, Alexander Lakhin wrote: > 22.04.2024 13:52, Bertrand Drouvot wrote: > > > > That's weird, I just launched it several times with the patch applied and > > I'm not > > able to see the seg fault (while I can see it constently failing on the > > master > > branch). > > > > Are you 100% sure you tested it against a binary with the patch applied? > > > > Yes, at least I can't see what I'm doing wrong. Please try my > self-contained script attached. Thanks for sharing your script! Yeah your script ensures the patch is applied before the repro is executed. I do confirm that I can also see the issue with the patch applied (but I had to launch multiple attempts, while on master one attempt is enough). I'll have a look. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Avoid orphaned objects dependencies, take 3
Hi, On Mon, Apr 22, 2024 at 01:00:00PM +0300, Alexander Lakhin wrote: > Hi Bertrand, > > 22.04.2024 11:45, Bertrand Drouvot wrote: > > Hi, > > > > This new thread is a follow-up of [1] and [2]. > > > > Problem description: > > > > We have occasionally observed objects having an orphaned dependency, the > > most common case we have seen is functions not linked to any namespaces. > > > > ... > > > > Looking forward to your feedback, > > This have reminded me of bug #17182 [1]. Thanks for the link to the bug! > Unfortunately, with the patch applied, the following script: > > for ((i=1;i<=100;i++)); do > ( { for ((n=1;n<=20;n++)); do echo "DROP SCHEMA s;"; done } | psql ) > >psql1.log 2>&1 & > echo " > CREATE SCHEMA s; > CREATE FUNCTION s.func1() RETURNS int LANGUAGE SQL AS 'SELECT 1;'; > CREATE FUNCTION s.func2() RETURNS int LANGUAGE SQL AS 'SELECT 2;'; > CREATE FUNCTION s.func3() RETURNS int LANGUAGE SQL AS 'SELECT 3;'; > CREATE FUNCTION s.func4() RETURNS int LANGUAGE SQL AS 'SELECT 4;'; > CREATE FUNCTION s.func5() RETURNS int LANGUAGE SQL AS 'SELECT 5;'; > " | psql >psql2.log 2>&1 & > wait > psql -c "DROP SCHEMA s CASCADE" >psql3.log > done > echo " > SELECT pg_identify_object('pg_proc'::regclass, pp.oid, 0), pp.oid FROM > pg_proc pp > LEFT JOIN pg_namespace pn ON pp.pronamespace = pn.oid WHERE pn.oid IS NULL" > | psql > > still ends with: > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > > 2024-04-22 09:54:39.171 UTC|||662633dc.152bbc|LOG: server process (PID > 1388378) was terminated by signal 11: Segmentation fault > 2024-04-22 09:54:39.171 UTC|||662633dc.152bbc|DETAIL: Failed process was > running: SELECT pg_identify_object('pg_proc'::regclass, pp.oid, 0), pp.oid > FROM pg_proc pp > LEFT JOIN pg_namespace pn ON pp.pronamespace = pn.oid WHERE pn.oid IS > NULL > Thanks for sharing the script. 
That's weird, I just launched it several times with the patch applied and I'm not able to see the seg fault (while I can see it consistently failing on the master branch). Are you 100% sure you tested it against a binary with the patch applied? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Disallow changing slot's failover option in transaction block
Hi, On Mon, Apr 22, 2024 at 11:32:14AM +0530, shveta malik wrote: > On Mon, Apr 22, 2024 at 5:57 AM Zhijie Hou (Fujitsu) > wrote: > > Attach the V3 patch which also addressed Shveta[1] and Bertrand[2]'s > > comments. Thanks! > Tested the patch, works well. Same here. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Patch to avoid orphaned dependencies
Hi, On Wed, Mar 23, 2022 at 12:49:06PM -0400, Tom Lane wrote: > Realistically, if we want to prevent this type of problem, then all > creation DDL will have to take a lock on each referenced object that'd > conflict with a lock taken by DROP. This might not be out of reach: > I think we do already take such locks while dropping objects. The > reference-side lock could be taken by the recordDependency mechanism > itself, ensuring that we don't miss anything; and that would also > allow us to not bother taking such a lock on pinned objects, which'd > greatly cut the cost (though not to zero). Thanks for the idea (and sorry for the delay replying to it)! I had a look at it and just created a new thread [1] based on your proposal. [1]: https://www.postgresql.org/message-id/flat/ZiYjn0eVc7pxVY45%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Avoid orphaned objects dependencies, take 3
Hi, This new thread is a follow-up of [1] and [2]. Problem description: We have occasionally observed objects having an orphaned dependency, the most common case we have seen is functions not linked to any namespaces. Examples to produce such orphaned dependencies: Scenario 1: session 1: begin; drop schema schem; session 2: create a function in the schema schem session 1: commit; With the above, the function created in session 2 would be linked to a non-existing schema. Scenario 2: session 1: begin; create a function in the schema schem session 2: drop schema schem; session 1: commit; With the above, the function created in session 1 would be linked to a non-existing schema. A patch was initially proposed to fix this particular (function-to-namespace) dependency (see [1]), but there could be many more scenarios (like the function-to-datatype one highlighted by Gilles in [1] that could lead to a function having an invalid parameter datatype). As Tom said there are dozens more cases that would need to be considered, and a global approach to avoid those race conditions should be considered instead. A first global approach attempt has been proposed in [2] making use of a dirty snapshot when recording the dependency. But this approach appeared to be "scary" and it was still failing to close some race conditions (see [2] for details). Then, Tom proposed another approach in [2] which is that "creation DDL will have to take a lock on each referenced object that'd conflict with a lock taken by DROP". This is what the attached patch is trying to achieve. It does the following: 1) A new lock (that conflicts with a lock taken by DROP) has been put in place when the dependencies are being recorded. Thanks to it, the drop schema in scenario 2 would be locked (resulting in an error should session 1 commit). 2) After locking the object while recording the dependency, the patch checks that the object still exists.
Thanks to it, session 2 in scenario 1 would be locked and would report an error once session 1 commits (that would not be the case should session 1 abort the transaction). The patch also adds a few tests for some dependency cases (that would currently produce orphaned objects): - schema and function (as the above scenarios) - function and type - table and type (which I think is problematic enough, as it brings a table into the game, to justify fixing this as a whole). [1]: https://www.postgresql.org/message-id/flat/a4f55089-7cbd-fe5d-a9bb-19adc6418...@darold.net#9af5cdaa9e80879beb1def3604c976e8 [2]: https://www.postgresql.org/message-id/flat/8369ff70-0e31-f194-2954-787f4d9e21dd%40amazon.com Please note that I'm not familiar with this area of the code, so the patch might not take the option proposed by Tom the "right" way. Adding the patch to the July CF. Looking forward to your feedback, Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From 6798e85b0679dfccfa665007b06d9f1a0e39e0c6 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Fri, 29 Mar 2024 15:43:26 + Subject: [PATCH v1] Avoid orphaned objects dependencies It's currently possible to create orphaned objects dependencies, for example: Scenario 1: session 1: begin; drop schema schem; session 2: create a function in the schema schem session 1: commit; With the above, the function created in session 2 would be linked to a non-existing schema. Scenario 2: session 1: begin; create a function in the schema schem session 2: drop schema schem; session 1: commit; With the above, the function created in session 1 would be linked to a non-existing schema. To avoid those scenarios, a new lock (that conflicts with a lock taken by DROP) has been put in place when the dependencies are being recorded. With this in place, the drop schema in scenario 2 would be locked.
Also, after locking the object, the patch checks that the object still exists: with this in place session 2 in scenario 1 would be locked and would report an error once session 1 commits (that would not be the case should session 1 abort the transaction). The patch adds a few tests for some dependency cases (that would currently produce orphaned objects): - schema and function (as the above scenarios) - function and type - table and type --- src/backend/catalog/dependency.c | 42 ++ src/backend/catalog/objectaddress.c | 57 +++ src/backend/catalog/pg_depend.c | 6 ++ src/include/catalog/dependency.h | 1 + src/include/catalog/objectaddress.h | 1 + src/test/modules/Makefile | 1 + src/test/modules/meson.build | 1 + .../test_dependencies_locks/.gitignore| 3 + .../modules/test_dependencies_locks/Makefile | 14 + .../expected/test_dependencies_locks.out | 49 .../te
Re: Disallow changing slot's failover option in transaction block
Hi, On Fri, Apr 19, 2024 at 12:39:40AM +, Zhijie Hou (Fujitsu) wrote: > Here is V2 patch which includes the changes for CREATE SUBSCRIPTION as > suggested. Since we don't connect pub to alter slot when (create_slot=false) > anymore, the restriction that disallows failover=true when connect=false is > also removed. Thanks! + specified in the subscription. When creating the slot, ensure the slot + property failover matches the counterpart parameter + of the subscription. The slot could be created before or after the subscription is created, so I think it needs a bit of rewording (as here it sounds like the sub is already created), something like: "Always ensure the slot property failover matches the counterpart parameter of the subscription and vice versa." Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: promotion related handling in pg_sync_replication_slots()
Hi, On Thu, Apr 18, 2024 at 05:36:05PM +0530, shveta malik wrote: > Please find v8 attached. Changes are: Thanks! A few comments: 1 === @@ -1440,7 +1461,7 @@ ReplSlotSyncWorkerMain(char *startup_data, size_t startup_data_len) * slotsync_worker_onexit() but that will need the connection to be made * global and we want to avoid introducing global for this purpose. */ - before_shmem_exit(slotsync_failure_callback, PointerGetDatum(wrconn)); + before_shmem_exit(slotsync_worker_disconnect, PointerGetDatum(wrconn)); The comment above this change still states "Register the failure callback once we have the connection", I think it has to be reworded a bit now that v8 is making use of slotsync_worker_disconnect(). 2 === +* Register slotsync_worker_onexit() before we register +* ReplicationSlotShmemExit() in BaseInit(), to ensure that during exit of +* slot sync worker, ReplicationSlotShmemExit() is called first, followed +* by slotsync_worker_onexit(). Startup process during promotion waits for Worth mentioning in shmem_exit() (where it does the "while (--before_shmem_exit_index >= 0)" loop, or before the shmem_exit() definition) that ReplSlotSyncWorkerMain() relies on this LIFO behavior? (not sure if there are other "strong" LIFO requirements in other parts of the code). 3 === +* Startup process during promotion waits for slot sync to finish and it +* does that by checking the 'syncing' flag. Worth mentioning ShutDownSlotSync()? 4 === I did a few tests manually (launching ShutDownSlotSync() through gdb / with and without sync worker and with / without pg_sync_replication_slots() running concurrently) and it looks like it works as designed. Having said that, the logic that is in place to take care of the corner cases described up-thread seems reasonable to me. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
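As a side note on comment 2 above, the LIFO behavior in question can be modeled with a few lines of stand-alone C. This is a toy model, not PostgreSQL code: the callback names mimic the real ones, and the pop loop has the same shape as the one in shmem_exit().

```c
#include <assert.h>

/*
 * Toy model of the before_shmem_exit stack: callbacks registered
 * earlier run later.  This is what ReplSlotSyncWorkerMain() relies on:
 * slotsync_worker_onexit() is registered before
 * ReplicationSlotShmemExit(), so it runs after it at process exit.
 */
#define MAX_BEFORE_EXIT 8

typedef void (*exit_cb) (void);

static exit_cb before_exit_list[MAX_BEFORE_EXIT];
static int before_exit_index = 0;

/* execution trace, so the run order can be inspected */
static int trace[MAX_BEFORE_EXIT];
static int trace_len = 0;

enum { CB_SLOTSYNC_ONEXIT = 1, CB_REPLSLOT_SHMEM_EXIT = 2 };

static void
slotsync_worker_onexit(void)
{
	trace[trace_len++] = CB_SLOTSYNC_ONEXIT;
}

static void
replication_slot_shmem_exit(void)
{
	trace[trace_len++] = CB_REPLSLOT_SHMEM_EXIT;
}

static void
before_shmem_exit_register(exit_cb cb)
{
	before_exit_list[before_exit_index++] = cb;
}

static void
shmem_exit_model(void)
{
	/* same LIFO loop shape as shmem_exit() */
	while (--before_exit_index >= 0)
		before_exit_list[before_exit_index] ();
}
```

Registering the slot sync callback first therefore guarantees it is the last to run, after the slots have been released.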
Re: promotion related handling in pg_sync_replication_slots()
Hi, On Tue, Apr 16, 2024 at 02:06:45PM +0530, shveta malik wrote: > On Tue, Apr 16, 2024 at 1:55 PM Zhijie Hou (Fujitsu) > wrote: > > I personally feel adding the additional check for sync_replication_slots may > > not improve the situation here. Because the GUC sync_replication_slots can > > change at any point, the GUC could be false when performing this addition > > check > > and is set to true immediately after the check, so It could not simplify > > the logic > > anyway. > > +1. > I feel doc and "cannot synchronize replication slots concurrently" > check should suffice. > > In the scenario which Hou-San pointed out, if after performing the > GUC check in SQL function, this GUC is enabled immediately and say > worker is started sooner than the function could get chance to sync, > in that case as well, SQL function will ultimately get error "cannot > synchronize replication slots concurrently", even though GUC is > enabled. Thus, I feel we should stick with samer error in all > scenarios. Okay, fine by me, let's forget about checking sync_replication_slots then. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Disallow changing slot's failover option in transaction block
Hi, On Tue, Apr 16, 2024 at 06:32:11AM +, Zhijie Hou (Fujitsu) wrote: > Hi, > > Kuroda-San reported an issue off-list that: > > If user execute ALTER SUBSCRIPTION SET (failover) command inside a txn block > and rollback, only the subscription option change can be rolled back, while > the > replication slot's failover change is preserved. Nice catch, thanks! > To fix it, I think we can prevent user from executing ALTER SUBSCRIPTION set > (failover) inside a txn block, which is also consistent to the ALTER > SUBSCRIPTION REFRESH/DROP SUBSCRIPTION command. Attach a small > patch to address this. Agree. The patch looks pretty straightforward to me. Worth adding this case to the doc? (where we already mention that "Commands ALTER SUBSCRIPTION ... REFRESH PUBLICATION and ALTER SUBSCRIPTION ... {SET|ADD|DROP} PUBLICATION ... with refresh option as true cannot be executed inside a transaction block") Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: promotion related handling in pg_sync_replication_slots()
Hi, On Tue, Apr 16, 2024 at 10:00:04AM +0530, shveta malik wrote: > Please find v5 addressing above comments. Thanks! @@ -1634,9 +1677,14 @@ SyncReplicationSlots(WalReceiverConn *wrconn) { PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn)); { + check_flags_and_set_sync_info(InvalidPid); + Given the fact that if the sync worker is running it won't be possible to trigger a manual sync with pg_sync_replication_slots(), what about also checking the "sync_replication_slots" value at the start of SyncReplicationSlots() and emitting an error if sync_replication_slots is set to on? (The message could explicitly state that it's not possible to use the function if sync_replication_slots is set to on). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: promotion related handling in pg_sync_replication_slots()
Hi, On Tue, Apr 16, 2024 at 08:21:04AM +0530, Amit Kapila wrote: > On Mon, Apr 15, 2024 at 7:47 PM Bertrand Drouvot > wrote: > > > > On Mon, Apr 15, 2024 at 06:29:49PM +0530, Amit Kapila wrote: > > > On Mon, Apr 15, 2024 at 6:21 PM Bertrand Drouvot > > > wrote: > > > > > > > > On Mon, Apr 15, 2024 at 03:38:53PM +0530, shveta malik wrote: > > > > > Now both worker and SQL > > > > > function set 'syncing' when they start and reset it when they exit. > > > > > > > > It means that it's not possible anymore to trigger a manual sync if > > > > sync_replication_slots is on. Indeed that would trigger: > > > > > > > > postgres=# select pg_sync_replication_slots(); > > > > ERROR: cannot synchronize replication slots concurrently > > > > > > > > That looks like an issue to me, thoughts? > > > > > > > > > > This is intentional as of now for the sake of keeping > > > implementation/code simple. It is not difficult to allow them but I am > > > not sure whether we want to add another set of conditions allowing > > > them in parallel. > > > > I think that the ability to launch a manual sync before a switchover would > > be > > missed. Except for this case I don't think that's an issue to prevent them > > to > > run in parallel. > > > > I think if the slotsync worker is available, it can do that as well. Right, but one has no control as to when the sync is triggered. > There is no clear use case for allowing them in parallel and I feel it > would add more confusion when it can work sometimes but not other > times. However, if we receive some report from the field where there > is a real demand for such a thing, it should be easy to achieve. For > example, I can imagine that we can have sync_state that has values > 'started', 'in_progress' , and 'finished'. This should allow us to > achieve what the current proposed patch is doing along with allowing > the API to work in parallel when the sync_state is not 'in_progress'. 
> > I think for now let's restrict their usage in parallel and make the > promotion behavior consistent both for worker and API. Okay, let's do it that way. Is it worth adding a few words in the doc related to pg_sync_replication_slots() though? (to mention it cannot be used if the slot sync worker is running). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: promotion related handling in pg_sync_replication_slots()
Hi, On Mon, Apr 15, 2024 at 06:29:49PM +0530, Amit Kapila wrote: > On Mon, Apr 15, 2024 at 6:21 PM Bertrand Drouvot > wrote: > > > > On Mon, Apr 15, 2024 at 03:38:53PM +0530, shveta malik wrote: > > > Now both worker and SQL > > > function set 'syncing' when they start and reset it when they exit. > > > > It means that it's not possible anymore to trigger a manual sync if > > sync_replication_slots is on. Indeed that would trigger: > > > > postgres=# select pg_sync_replication_slots(); > > ERROR: cannot synchronize replication slots concurrently > > > > That looks like an issue to me, thoughts? > > > > This is intentional as of now for the sake of keeping > implementation/code simple. It is not difficult to allow them but I am > not sure whether we want to add another set of conditions allowing > them in parallel. I think that the ability to launch a manual sync before a switchover would be missed. Except for this case I don't think that's an issue to prevent them to run in parallel. > And that too in an unpredictable way as the API will > work only for the time slot sync worker is not performing the sync. Yeah but then at least you would know that there is "really" a sync in progress (which is not the case currently with v4, as the sync worker being started is enough to prevent a manual sync even if a sync is not in progress). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: promotion related handling in pg_sync_replication_slots()
Hi, On Mon, Apr 15, 2024 at 03:38:53PM +0530, shveta malik wrote: > On Mon, Apr 15, 2024 at 2:29 PM Amit Kapila wrote: > > > > On Fri, Apr 12, 2024 at 5:25 PM shveta malik wrote: > > > > > > Please find v3 addressing race-condition and one other comment. > > > > > > Up-thread it was suggested that, probably, checking > > > SlotSyncCtx->syncing should be sufficient in ShutDownSlotSync(). On > > > re-thinking, it might not be. Slot sync worker sets and resets > > > 'syncing' with each sync-cycle, and thus we need to rely on worker's > > > pid in ShutDownSlotSync(), as there could be a window where promotion > > > is triggered and 'syncing' is not set for worker, while the worker is > > > still running. This implementation of setting and resetting syncing > > > with each sync-cycle looks better as compared to setting syncing > > > during the entire life-cycle of the worker. So, I did not change it. > > > > > > > To retain this we need to have different handling for 'syncing' for > > workers and function which seems like more maintenance burden than the > > value it provides. Moreover, in SyncReplicationSlots(), we are calling > > a function after acquiring spinlock which is not our usual coding > > practice. > > Okay. Changed it to consistent handling. Thanks for the patch! > Now both worker and SQL > function set 'syncing' when they start and reset it when they exit. It means that it's not possible anymore to trigger a manual sync if sync_replication_slots is on. Indeed that would trigger: postgres=# select pg_sync_replication_slots(); ERROR: cannot synchronize replication slots concurrently That looks like an issue to me, thoughts? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
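To make the concern concrete, here is a toy model of the v4 'syncing' flag semantics (plain C, not the actual patch code, which keeps SlotSyncCtx->syncing in shared memory under a spinlock): since both the worker and the SQL function hold the flag for their whole lifetime, any second entrant is rejected.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the shared 'syncing' flag: both the slot sync worker and
 * pg_sync_replication_slots() set it when they start and reset it when
 * they exit, so a second entrant fails with "cannot synchronize
 * replication slots concurrently".
 */
static bool syncing = false;	/* stands in for SlotSyncCtx->syncing */

/*
 * Returns false when a sync (worker or SQL function) is already
 * running, i.e. the case that would raise the ERROR above.
 */
static bool
try_start_sync(void)
{
	if (syncing)
		return false;
	syncing = true;
	return true;
}

static void
finish_sync(void)
{
	syncing = false;
}
```

This shows why a manual sync fails for as long as the worker runs, even between its sync cycles, which is the behavior being questioned here.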
Re: Synchronizing slots from primary to standby
Hi, On Sat, Apr 06, 2024 at 10:13:00AM +0530, Amit Kapila wrote: > On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot > wrote: > > I think the new LSN can be visible only when the corresponding WAL is > written by XLogWrite(). I don't know what in XLogSetAsyncXactLSN() can > make it visible. In your experiment below, isn't it possible that in > the meantime WAL writer has written that WAL due to which you are > seeing an updated location? What I did is: session 1: select pg_current_wal_lsn();\watch 1 session 2: select pg_backend_pid(); terminal 1: tail -f logfile | grep -i snap terminal 2 : gdb -p ) at standby.c:1346 1346{ (gdb) n 1350 Then next, next until the DEBUG message is emitted (confirmed in terminal 1). At this stage the DEBUG message shows the new LSN while session 1 still displays the previous LSN. Then once XLogSetAsyncXactLSN() is done in the gdb session (terminal 2) then session 1 displays the new LSN. This is reproducible as desired. With more debugging I can see that when the spinlock is released in XLogSetAsyncXactLSN() then XLogWrite() is doing its job and then session 1 does see the new value (that happens in this order, and as you said that's expected). My point is that while the DEBUG message is emitted, session 1 still sees the old LSN (until the new LSN is visible). I think that we should emit the DEBUG message once session 1 can see the new value (if not, I think the timestamp of the DEBUG message can be misleading for debugging purposes). > I think I am missing how exactly moving DEBUG2 can confirm the above theory. I meant to say that instead of seeing: 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG: snapshot of 0+0 running transaction ids (lsn 0/398 oldest xid 740 latest complete 739 next xid 740) 2024-04-05 04:37:05.197 UTC [3866475][client backend][2/4:0] LOG: statement: SELECT '0/360' <= replay_lsn AND state = 'streaming' We would probably see something like: 2024-04-05 04:37:05.
UTC [3866475][client backend][2/4:0] LOG: statement: SELECT '0/360' <= replay_lsn AND state = 'streaming' 2024-04-05 04:37:05.+xx UTC [3854278][background writer][:0] DEBUG: snapshot of 0+0 running transaction ids (lsn 0/398 oldest xid 740 latest complete 739 next xid 740) And then it would be clear that the query has run before the new LSN is visible. > > If the theory is proven then I'm not sure we need the extra complexity of > > injection point here, maybe just relying on the slots created via SQL API > > could > > be enough. > > > > Yeah, that could be the first step. We can probably add an injection > point to control the bgwrite behavior and then add tests involving > walsender performing the decoding. But I think it is important to have > sufficient tests in this area as I see they are quite helpful in > uncovering the issues. > Yeah agree. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
Hi, On Fri, Apr 05, 2024 at 02:35:42PM +, Bertrand Drouvot wrote: > I think that maybe as a first step we should move the "elog(DEBUG2," message > as > proposed above to help debugging (that could help to confirm the above > theory). If you agree and think that makes sense, please find attached a tiny patch doing so. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From a414e81a4c5c88963b2412d1641f3de1262c15c0 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Fri, 5 Apr 2024 14:49:51 + Subject: [PATCH v1] Move DEBUG message in LogCurrentRunningXacts() Indeed the new LSN is visible to other sessions (say through pg_current_wal_lsn()) only after the spinlock (XLogCtl->info_lck) in XLogSetAsyncXactLSN() is released. So moving the DEBUG message after the XLogSetAsyncXactLSN() call seems more appropriate for debugging purposes. --- src/backend/storage/ipc/standby.c | 22 +++--- 1 file changed, 11 insertions(+), 11 deletions(-) 100.0% src/backend/storage/ipc/ diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c index 87b04e51b3..ba549acf50 100644 --- a/src/backend/storage/ipc/standby.c +++ b/src/backend/storage/ipc/standby.c @@ -1366,6 +1366,17 @@ LogCurrentRunningXacts(RunningTransactions CurrRunningXacts) recptr = XLogInsert(RM_STANDBY_ID, XLOG_RUNNING_XACTS); + /* + * Ensure running_xacts information is synced to disk not too far in the + * future. We don't want to stall anything though (i.e. use XLogFlush()), + * so we let the wal writer do it during normal operation. + * XLogSetAsyncXactLSN() conveniently will mark the LSN as to-be-synced + * and nudge the WALWriter into action if sleeping. Check + * XLogBackgroundFlush() for details why a record might not be flushed + * without it.
+ */ + XLogSetAsyncXactLSN(recptr); + if (CurrRunningXacts->subxid_overflow) elog(DEBUG2, "snapshot of %d running transactions overflowed (lsn %X/%X oldest xid %u latest complete %u next xid %u)", @@ -1383,17 +1394,6 @@ LogCurrentRunningXacts(RunningTransactions CurrRunningXacts) CurrRunningXacts->latestCompletedXid, CurrRunningXacts->nextXid); - /* - * Ensure running_xacts information is synced to disk not too far in the - * future. We don't want to stall anything though (i.e. use XLogFlush()), - * so we let the wal writer do it during normal operation. - * XLogSetAsyncXactLSN() conveniently will mark the LSN as to-be-synced - * and nudge the WALWriter into action if sleeping. Check - * XLogBackgroundFlush() for details why a record might not be flushed - * without it. - */ - XLogSetAsyncXactLSN(recptr); - return recptr; } -- 2.34.1
Re: Synchronizing slots from primary to standby
Hi, On Fri, Apr 05, 2024 at 06:23:10PM +0530, Amit Kapila wrote: > On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila wrote: > Thinking more on this, it doesn't seem related to > c9920a9068eac2e6c8fb34988d18c0b42b9bf811 as that commit doesn't change > any locking or something like that which impacts write positions. Agree. > I think what has happened here is that running_xact record written by > the background writer [1] is not written to the kernel or disk (see > LogStandbySnapshot()), before pg_current_wal_lsn() checks the > current_lsn to be compared with replayed LSN. Agree, I think it's not visible through pg_current_wal_lsn() yet. Also I think that the DEBUG message in LogCurrentRunningXacts() " elog(DEBUG2, "snapshot of %d+%d running transaction ids (lsn %X/%X oldest xid %u latest complete %u next xid %u)", CurrRunningXacts->xcnt, CurrRunningXacts->subxcnt, LSN_FORMAT_ARGS(recptr), CurrRunningXacts->oldestRunningXid, CurrRunningXacts->latestCompletedXid, CurrRunningXacts->nextXid); " should be located after the XLogSetAsyncXactLSN() call. 
Indeed, the new LSN is visible after the spinlock (XLogCtl->info_lck) in XLogSetAsyncXactLSN() is released, see: \watch on Session 1 provides: pg_current_wal_lsn 0/87D110 (1 row) Until: Breakpoint 2, XLogSetAsyncXactLSN (asyncXactLSN=8900936) at xlog.c:2579 2579XLogRecPtr WriteRqstPtr = asyncXactLSN; (gdb) n 2581boolwakeup = false; (gdb) 2584SpinLockAcquire(&XLogCtl->info_lck); (gdb) 2585RefreshXLogWriteResult(LogwrtResult); (gdb) 2586sleeping = XLogCtl->WalWriterSleeping; (gdb) 2587prevAsyncXactLSN = XLogCtl->asyncXactLSN; (gdb) 2588if (XLogCtl->asyncXactLSN < asyncXactLSN) (gdb) 2589XLogCtl->asyncXactLSN = asyncXactLSN; (gdb) 2590SpinLockRelease(&XLogCtl->info_lck); (gdb) p/x (uint32) XLogCtl->asyncXactLSN $1 = 0x87d148 Then session 1 provides: pg_current_wal_lsn 0/87D148 (1 row) So, when we see in the log: 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG: snapshot of 0+0 running transaction ids (lsn 0/398 oldest xid 740 latest complete 739 next xid 740) 2024-04-05 04:37:05.197 UTC [3866475][client backend][2/4:0] LOG: statement: SELECT '0/360' <= replay_lsn AND state = 'streaming' It's indeed possible that the new LSN was not visible yet (spinlock not released?) before the query began (because we cannot rely on the time the DEBUG message has been emitted).
If the theory is proven, then I'm not sure we need the extra complexity of an injection point here; maybe just relying on the slots created via the SQL API could be enough. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
Hi, On Fri, Apr 05, 2024 at 04:09:01PM +0530, shveta malik wrote: > On Fri, Apr 5, 2024 at 10:09 AM Bertrand Drouvot > wrote: > > > > What about something like? > > > > ereport(LOG, > > errmsg("synchronized confirmed_flush_lsn for slot \"%s\" differs > > from remote slot", > > remote_slot->name), > > errdetail("Remote slot has LSN %X/%X but local slot has LSN %X/%X.", > > LSN_FORMAT_ARGS(remote_slot->restart_lsn), > > LSN_FORMAT_ARGS(slot->data.restart_lsn)); > > > > Regards, > > +1. Better than earlier. I will update and post the patch. > Thanks! BTW, I just realized that the LSNs I used in my example in the LSN_FORMAT_ARGS() are not the right ones. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Fri, Apr 05, 2024 at 11:21:43AM +0530, Bharath Rupireddy wrote: > On Wed, Apr 3, 2024 at 9:57 PM Bertrand Drouvot > wrote: > Please find the attached v36 patch. Thanks! A few comments: 1 === + +The timeout is measured from the time since the slot has become +inactive (known from its +inactive_since value) until it gets +used (i.e., its active is set to true). + That's right except when it's invalidated during the checkpoint (as the slot is not acquired in CheckPointReplicationSlots()). So, what about adding: "or a checkpoint occurs"? That would also explain that the invalidation could occur during a checkpoint. 2 === + /* If the slot has been invalidated, recalculate the resource limits */ + if (invalidated) + { /If the slot/If a slot/? 3 === + * NB - this function also runs as part of checkpoint, so avoid raising errors s/NB - this/NB: This function/? (that looks more consistent with other comments in the code) 4 === + * Note that having a new function for RS_INVAL_INACTIVE_TIMEOUT cause instead I understand it's "the RS_INVAL_INACTIVE_TIMEOUT cause" but reading "cause instead" looks weird to me. Maybe it would make sense to reword this a bit. 5 === +* considered not active as they don't actually perform logical decoding. Not sure that's 100% accurate as we switched to fast forward logical decoding in 2ec005b4e2. "as they perform only fast forward logical decoding (or not at all)", maybe? 6 === + if (RecoveryInProgress() && slot->data.synced) + return false; + + if (replication_slot_inactive_timeout == 0) + return false; What about just using one if? It's more a matter of taste but it also probably reduces the object file size a bit for a non-optimized build. 7 === + /* +* Do not invalidate the slots which are currently being synced from +* the primary to the standby. +*/ + if (RecoveryInProgress() && slot->data.synced) + return false; I think we don't need this check as the exact same one is done just before.
8 === +sub check_for_slot_invalidation_in_server_log +{ + my ($node, $slot_name, $offset) = @_; + my $invalidated = 0; + + for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++) + { + $node->safe_psql('postgres', "CHECKPOINT"); Wouldn't it be better to wait for the replication_slot_inactive_timeout time before instead of triggering all those checkpoints? (it could be passed as an extra arg to wait_for_slot_invalidation()). 9 === # Synced slot mustn't get invalidated on the standby, it must sync invalidation # from the primary. So, we must not see the slot's invalidation message in server # log. ok( !$standby1->log_contains( "invalidating obsolete replication slot \"lsub1_sync_slot\"", $standby1_logstart), 'check that syned slot has not been invalidated on the standby'); Would it make sense to trigger a checkpoint on the standby before this test? I mean I think that without a checkpoint on the standby we should not see the invalidation in the log anyway. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
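As a small aside on comment 6 above, the two early returns could indeed be folded into a single condition. A stand-alone sketch with made-up stand-in variables (not the actual patch code):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the real state tested by the patch. */
static bool recovery_in_progress = false;	/* RecoveryInProgress() */
static bool slot_synced = false;		/* slot->data.synced */
static int replication_slot_inactive_timeout = 0;	/* GUC, in seconds */

/*
 * Combined early-return check: timeout-based invalidation is not
 * applicable for synced slots on a standby, nor when the timeout is
 * disabled.
 */
static bool
inactive_timeout_applicable(void)
{
	if ((recovery_in_progress && slot_synced) ||
		replication_slot_inactive_timeout == 0)
		return false;
	return true;
}
```

Purely a matter of taste, as said above; the behavior is the same either way.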
Re: Synchronizing slots from primary to standby
Hi, On Fri, Apr 05, 2024 at 09:43:35AM +0530, shveta malik wrote: > On Fri, Apr 5, 2024 at 9:22 AM Bertrand Drouvot > wrote: > > > > Hi, > > > > On Thu, Apr 04, 2024 at 05:31:45PM +0530, shveta malik wrote: > > > On Thu, Apr 4, 2024 at 2:59 PM shveta malik > > > wrote: > > 2 === > > > > + if (slot->data.confirmed_flush != > > remote_slot->confirmed_lsn) > > + elog(LOG, > > +"could not synchronize local slot > > \"%s\" LSN(%X/%X)" > > +" to remote slot's LSN(%X/%X) ", > > +remote_slot->name, > > + > > LSN_FORMAT_ARGS(slot->data.confirmed_flush), > > + > > LSN_FORMAT_ARGS(remote_slot->confirmed_lsn)); > > > > I don't think that the message is correct here. Unless I am missing > > something > > there is nothing in the following code path that would prevent the slot to > > be > > sync during this cycle. > > This is a sanity check, I will put a comment to indicate the same. Thanks! > We > want to ensure if anything changes in future, we get correct logs to > indicate that. Right, understood that way. > If required, the LOG msg can be changed. Kindly suggest if you have > anything better in mind. > What about something like? ereport(LOG, errmsg("synchronized confirmed_flush_lsn for slot \"%s\" differs from remote slot", remote_slot->name), errdetail("Remote slot has LSN %X/%X but local slot has LSN %X/%X.", LSN_FORMAT_ARGS(remote_slot->restart_lsn), LSN_FORMAT_ARGS(slot->data.restart_lsn))); Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
Hi, On Thu, Apr 04, 2024 at 05:31:45PM +0530, shveta malik wrote: > On Thu, Apr 4, 2024 at 2:59 PM shveta malik wrote: > > > > > > Prior to commit 2ec005b, this check was okay, as we did not expect > > restart_lsn of the synced slot to be ahead of remote since we were > > directly copying the lsns. But now when we use 'advance' to do logical > > decoding on standby, there is a possibility that restart lsn of the > > synced slot is ahead of remote slot, if there are running txns records > > found after reaching consistent-point while consuming WALs from > > restart_lsn till confirmed_lsn. In such a case, slot-sync's advance > > may end up serializing snapshots and setting restart_lsn to the > > serialized snapshot point, ahead of remote one. > > > > Fix: > > The sanity check needs to be corrected. Attached a patch to address the > > issue. > Thanks for reporting, explaining the issue and providing a patch. Regarding the patch: 1 === +* Attempt to sync lsns and xmins only if remote slot is ahead of local s/lsns/LSNs/? 2 === + if (slot->data.confirmed_flush != remote_slot->confirmed_lsn) + elog(LOG, +"could not synchronize local slot \"%s\" LSN(%X/%X)" +" to remote slot's LSN(%X/%X) ", +remote_slot->name, + LSN_FORMAT_ARGS(slot->data.confirmed_flush), + LSN_FORMAT_ARGS(remote_slot->confirmed_lsn)); I don't think that the message is correct here. Unless I am missing something there is nothing in the following code path that would prevent the slot from being synced during this cycle. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Autogenerate some wait events code and documentation
Hi, On Thu, Apr 04, 2024 at 07:14:47PM +0900, Michael Paquier wrote: > On Thu, Apr 04, 2024 at 09:28:36AM +0000, Bertrand Drouvot wrote: > > +# No "Backpatch" region here as code is generated automatically. > > > > What about "region here as has its own C code" (that would be more > > consistent > > with the comment in the "header" for the file). Done that way in v4. > > I'd add a "as -this section- has its own C code", for clarity. This > just looked a bit strange here. Sounds good, done that way in v5 attached. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From 1e24cdb02360147363495122fbfe79c2758ae4e8 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Thu, 4 Apr 2024 15:34:37 +0900 Subject: [PATCH v5] Add "ABI_compatibility" regions in wait_event_names.txt When backpatching, adding an event will renumber others, which can make an extension report the wrong event until recompiled. This is due to the fact that generate-wait_event_types.pl sorts events to position them. The "ABI_compatibility" region added here ensures no ordering is done for the wait events found after this delimiter. 
--- .../activity/generate-wait_event_types.pl | 27 ++- .../utils/activity/wait_event_names.txt | 17 2 files changed, 43 insertions(+), 1 deletion(-) 100.0% src/backend/utils/activity/ diff --git a/src/backend/utils/activity/generate-wait_event_types.pl b/src/backend/utils/activity/generate-wait_event_types.pl index f1adf0e8e7..9768406db5 100644 --- a/src/backend/utils/activity/generate-wait_event_types.pl +++ b/src/backend/utils/activity/generate-wait_event_types.pl @@ -38,7 +38,9 @@ die "Not possible to specify --docs and --code simultaneously" open my $wait_event_names, '<', $ARGV[0] or die; +my @abi_compatibility_lines; my @lines; +my $abi_compatibility = 0; my $section_name; my $note; my $note_name; @@ -59,10 +61,27 @@ while (<$wait_event_names>) { $section_name = $_; $section_name =~ s/^.*- //; + $abi_compatibility = 0; next; } - push(@lines, $section_name . "\t" . $_); + # ABI_compatibility region, preserving ABI compatibility of the code + # generated. Any wait events listed in this part of the file + # will not be sorted during the code generation. + if (/^ABI_compatibility:$/) + { + $abi_compatibility = 1; + next; + } + + if ($gen_code && $abi_compatibility) + { + push(@abi_compatibility_lines, $section_name . "\t" . $_); + } + else + { + push(@lines, $section_name . "\t" . $_); + } } # Sort the lines based on the second column. @@ -70,6 +89,12 @@ while (<$wait_event_names>) my @lines_sorted = sort { uc((split(/\t/, $a))[1]) cmp uc((split(/\t/, $b))[1]) } @lines; +# If we are generating code, concat @lines_sorted and then @abi_compatibility_lines. 
+if ($gen_code) +{ + push(@lines_sorted, @abi_compatibility_lines); +} + # Read the sorted lines and populate the hash table foreach my $line (@lines_sorted) { diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt index 0d288d6b3d..5169be238c 100644 --- a/src/backend/utils/activity/wait_event_names.txt +++ b/src/backend/utils/activity/wait_event_names.txt @@ -26,6 +26,12 @@ # When adding a new wait event, make sure it is placed in the appropriate # ClassName section. # +# Wait events added in stable branches should be appended to the lists in +# the "ABI_compatibility:" region of their related ClassName section to preserve +# ABI compatibility of the C code generated from this file's data, respecting +# the order of any wait event already listed there. The "ABI_compatibility:" +# regions should remain empty on the master branch and on unreleased branches. +# # WaitEventLWLock and WaitEventLock have their own C code for their wait event # enums and function names. Hence, for these, only the SGML documentation is # generated. @@ -61,6 +67,7 @@ WAL_SENDER_MAIN "Waiting in main loop of WAL sender process." WAL_SUMMARIZER_WAL "Waiting in WAL summarizer for more WAL to be generated." WAL_WRITER_MAIN "Waiting in main loop of WAL writer process." +ABI_compatibility: # # Wait Events - Client @@ -83,6 +90,7 @@ WAIT_FOR_WAL_REPLAY "Waiting for a replay of the particular WAL position on the WAL_SENDER_WAIT_FOR_WAL "Waiting for WAL to be flushed in WAL sender process." WAL_SENDER_WRITE_DATA "Waiting for any activity when processing replies from WAL receiver in WAL sender process." +ABI_compatibility: # # Wait Events - IPC @@ -150,6 +158,7 @@ WAL_RECEIVER_WAIT_START "Waiting for startup process to send initial data for st WAL_SUMMARY_READY "Waiting for a new WAL summary to b
Re: Autogenerate some wait events code and documentation
Hi, On Thu, Apr 04, 2024 at 03:50:21PM +0900, Michael Paquier wrote: > On Tue, Mar 19, 2024 at 07:34:09AM +0000, Bertrand Drouvot wrote: > > I'm not sure as v2 used the "version >= 17" wording which I think would not > > need > > manual refresh each time a new stable branch is forked. > > > > But to avoid any doubt, I'm following your recommendation in v3 attached > > (then > > only mentioning the "master branch" and "any other branch"). > > I don't see why we could not be more generic, TBH. Note that the > Backpatch region should be empty not only the master branch but also > on stable and unreleased branches (aka REL_XX_STABLE branches from > their fork from master to their .0 release). I have reworded the > whole, mentioning ABI compatibility, as well. Yeah, agree. I do prefer your wording. > The position of the Backpatch regions were a bit incorrect (extra one > in LWLock, and the one in Lock was not needed). oops, thanks for the fixes! > We could be stricter with the order of the elements in > pgstat_wait_event.c and wait_event_funcs_data.c, but there's no > consequence feature-wise and I cannot get excited about the extra > complexity this creates in generate-wait_event_types.pl between the > enum generation and the rest. Yeah, and I think generate-wait_event_types.pl is already complex enough. So better to add only the strict necessary in it IMHO. > Is "Backpatch" the best choice we have, though? It speaks by itself > but I was thinking about something different, like "Stable". Other > ideas or objections are welcome. My naming sense is usually not that > good, so there's that. I think "Stable" is more confusing because the section should also be empty until the .0 is released. That said, what about "ABI_compatibility"? (that would also match the comment added in wait_event_names.txt). Attached v4 making use of the ABI_compatibility proposal. > 0001 is the patch with my tweaks. Thanks! +# No "Backpatch" region here as code is generated automatically. 
What about "region here as has its own C code" (that would be more consistent with the comment in the "header" for the file). Done that way in v4. It looks like WAL_SENDER_WRITE_ZZZ was also added in it (I guess for testing purpose, so I removed it in v4). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From 4a69b8adb9bef2a5c7f7bec061d0bdda95ad9e24 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Thu, 4 Apr 2024 15:34:37 +0900 Subject: [PATCH v4] Add "ABI_compatibility" regions in wait_event_names.txt When backpatching, adding an event will renumber others, which can make an extension report the wrong event until recompiled. This is due to the fact that generate-wait_event_types.pl sorts events to position them. The "ABI_compatibility" region added here ensures no ordering is done for the wait events found after this delimiter. --- .../activity/generate-wait_event_types.pl | 27 ++- .../utils/activity/wait_event_names.txt | 17 2 files changed, 43 insertions(+), 1 deletion(-) 100.0% src/backend/utils/activity/ diff --git a/src/backend/utils/activity/generate-wait_event_types.pl b/src/backend/utils/activity/generate-wait_event_types.pl index f1adf0e8e7..9768406db5 100644 --- a/src/backend/utils/activity/generate-wait_event_types.pl +++ b/src/backend/utils/activity/generate-wait_event_types.pl @@ -38,7 +38,9 @@ die "Not possible to specify --docs and --code simultaneously" open my $wait_event_names, '<', $ARGV[0] or die; +my @abi_compatibility_lines; my @lines; +my $abi_compatibility = 0; my $section_name; my $note; my $note_name; @@ -59,10 +61,27 @@ while (<$wait_event_names>) { $section_name = $_; $section_name =~ s/^.*- //; + $abi_compatibility = 0; next; } - push(@lines, $section_name . "\t" . $_); + # ABI_compatibility region, preserving ABI compatibility of the code + # generated. Any wait events listed in this part of the file + # will not be sorted during the code generation. 
+ if (/^ABI_compatibility:$/) + { + $abi_compatibility = 1; + next; + } + + if ($gen_code && $abi_compatibility) + { + push(@abi_compatibility_lines, $section_name . "\t" . $_); + } + else + { + push(@lines, $section_name . "\t" . $_); + } } # Sort the lines based on the second column. @@ -70,6 +89,12 @@ while (<$wait_event_names>) my @lines_sorted = sort { uc((split(/\t/, $a))[1]) cmp uc((split(/\t/, $b))[1]) } @lines; +# If we are generating code, concat @lines_sorted and then @abi_compatibility_lines. +if ($gen_code) +{ + push(@lines_sorted, @abi_compatibility_lines); +} + # Read the
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi,

On Wed, Apr 03, 2024 at 08:28:04PM +0530, Bharath Rupireddy wrote:
> On Wed, Apr 3, 2024 at 6:46 PM Bertrand Drouvot wrote:
> >
> > Just one comment on v32-0001:
> >
> > +# Synced slot on the standby must get its own inactive_since.
> > +is( $standby1->safe_psql(
> > +		'postgres',
> > +		"SELECT '$inactive_since_on_primary'::timestamptz <= '$inactive_since_on_standby'::timestamptz AND
> > +			'$inactive_since_on_standby'::timestamptz <= '$slot_sync_time'::timestamptz;"
> > +	),
> > +	"t",
> > +	'synchronized slot has got its own inactive_since');
> > +
> >
> > By using <= we are not testing that it must get its own inactive_since (as we
> > allow them to be equal in the test). I think we should just add some usleep()
> > where appropriate and deny equality during the tests on inactive_since.
> >
> > Except for the above, v32-0001 LGTM.
>
> Thanks. Please see the attached v33-0001 patch after removing equality
> on inactive_since TAP tests.

Thanks! v33-0001 LGTM.

> On Wed, Apr 3, 2024 at 1:47 PM Bertrand Drouvot wrote:
> > Some comments regarding v31-0002:
> >
> > T2 ===
> >
> > In case the slot is invalidated on the primary,
> >
> > primary:
> >
> > postgres=# select slot_name, inactive_since, invalidation_reason from pg_replication_slots where slot_name = 's1';
> >  slot_name |        inactive_since         | invalidation_reason
> > -----------+-------------------------------+---------------------
> >  s1        | 2024-04-03 06:56:28.075637+00 | inactive_timeout
> >
> > then on the standby we get:
> >
> > standby:
> >
> > postgres=# select slot_name, inactive_since, invalidation_reason from pg_replication_slots where slot_name = 's1';
> >  slot_name |        inactive_since        | invalidation_reason
> > -----------+------------------------------+---------------------
> >  s1        | 2024-04-03 07:06:43.37486+00 | inactive_timeout
> >
> > shouldn't the slot be dropped/recreated instead of updating inactive_since?
>
> The sync slots that are invalidated on the primary aren't dropped and
> recreated on the standby.

Yeah, right (I was confused with synced slots that are invalidated locally).

> However, I
> found that the synced slot is acquired and released unnecessarily
> after the invalidation_reason is synced from the primary. I added a
> skip check in synchronize_one_slot to skip acquiring and releasing the
> slot if it's locally found inactive. With this, inactive_since won't
> get updated for invalidated sync slots on the standby as we don't
> acquire and release the slot.

CR1 ===

Yeah, I can see:

@@ -575,6 +575,13 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 						" name slot \"%s\" already exists on the standby",
 						remote_slot->name));

+	/*
+	 * Skip the sync if the local slot is already invalidated. We do this
+	 * beforehand to save on slot acquire and release.
+	 */
+	if (slot->data.invalidated != RS_INVAL_NONE)
+		return false;

Thanks to the drop_local_obsolete_slots() call, I think we are not missing the
case where the slot has been invalidated on the primary, the invalidation
reason has been synced on the standby, and later the slot is dropped/recreated
manually on the primary (then it should be dropped/recreated on the standby
too).

Also, it seems we are not missing the case where a sync slot is invalidated
locally due to WAL removal (it should be dropped/recreated).

> > CR5 ===
> >
> > +	/*
> > +	 * This function isn't expected to be called for inactive timeout based
> > +	 * invalidation. A separate function InvalidateInactiveReplicationSlot is
> > +	 * to be used for that.
> >
> > Do you think it's worth to explain why?
>
> Hm, I just wanted to point out the actual function here. I modified it
> to something like the following, if others feel we don't need that, I
> can remove it.

Sorry if I was not clear, but I meant to say "Do you think it's worth
explaining why we decided to create a dedicated function?" (currently we
"just" explain that we created one).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi,

On Wed, Apr 03, 2024 at 05:12:12PM +0530, Bharath Rupireddy wrote:
> On Wed, Apr 3, 2024 at 4:19 PM Amit Kapila wrote:
> >
> > +	'postgres',
> > +	"SELECT '$inactive_since_on_primary'::timestamptz < '$inactive_since_on_standby'::timestamptz AND
> > +		'$inactive_since_on_standby'::timestamptz < '$slot_sync_time'::timestamptz;"
> >
> > Shall we do <= check as we are doing in the main function
> > get_slot_inactive_since_value as the time duration is less so it can
> > be the same as well? Similarly, please check other tests.
>
> I get you. The tests are so fast that losing a bit of precision might
> cause them to fail. So, I've added an equality check for all the tests.
>
> Please find the attached v32-0001 patch with the above review comments
> addressed.

Thanks!

Just one comment on v32-0001:

+# Synced slot on the standby must get its own inactive_since.
+is( $standby1->safe_psql(
+		'postgres',
+		"SELECT '$inactive_since_on_primary'::timestamptz <= '$inactive_since_on_standby'::timestamptz AND
+			'$inactive_since_on_standby'::timestamptz <= '$slot_sync_time'::timestamptz;"
+	),
+	"t",
+	'synchronized slot has got its own inactive_since');
+

By using <= we are not testing that it must get its own inactive_since (as we
allow them to be equal in the test). I think we should just add some usleep()
where appropriate and deny equality during the tests on inactive_since.

Except for the above, v32-0001 LGTM.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi,

On Wed, Apr 03, 2024 at 11:17:41AM +0530, Bharath Rupireddy wrote:
> On Wed, Apr 3, 2024 at 8:38 AM shveta malik wrote:
> >
> > > > Or a simple solution is that the slotsync worker updates
> > > > inactive_since as it does for non-synced slots, and disables
> > > > timeout-based slot invalidation for synced slots.
> >
> > I like this idea better, it takes care of such a case too when the
> > user is relying on sync-function rather than worker and does not want
> > to get the slots invalidated in between 2 sync function calls.
>
> Please find the attached v31 patches implementing the above idea:

Thanks!

Some comments regarding v31-0002:

=== testing the behavior

T1 ===

> - synced slots don't get invalidated due to inactive timeout because
> such slots not considered active at all as they don't perform logical
> decoding (of course, they will perform in fast_forward mode to fix the
> other data loss issue, but they don't generate changes for them to be
> called as *active* slots)

It behaves as described. OTOH non-synced logical slots on the standby and
physical slots on the standby are invalidated, which is what is expected.

T2 ===

In case the slot is invalidated on the primary,

primary:

postgres=# select slot_name, inactive_since, invalidation_reason from pg_replication_slots where slot_name = 's1';
 slot_name |        inactive_since         | invalidation_reason
-----------+-------------------------------+---------------------
 s1        | 2024-04-03 06:56:28.075637+00 | inactive_timeout

then on the standby we get:

standby:

postgres=# select slot_name, inactive_since, invalidation_reason from pg_replication_slots where slot_name = 's1';
 slot_name |        inactive_since        | invalidation_reason
-----------+------------------------------+---------------------
 s1        | 2024-04-03 07:06:43.37486+00 | inactive_timeout

shouldn't the slot be dropped/recreated instead of updating inactive_since?

=== code

CR1 ===

+          Invalidates replication slots that are inactive for longer the
+          specified amount of time

s/for longer the/for longer than/?

CR2 ===

+          true) as such synced slots don't actually perform
+          logical decoding.

We're switching to fast-forward logical decoding due to [1], so I'm not sure
that's 100% accurate here. I'm not sure we need to specify a reason.

CR3 ===

+				 errdetail("This slot has been invalidated because it was inactive for more than the time specified by replication_slot_inactive_timeout parameter.")));

I think we can remove "parameter" (see for example the error message in
validate_remote_info()) and reduce it a bit, something like?

"This slot has been invalidated because it was inactive for more than replication_slot_inactive_timeout"?

CR4 ===

+	appendStringInfoString(&err_detail, _("The slot has been inactive for more than the time specified by replication_slot_inactive_timeout parameter."));

Same.

CR5 ===

+	/*
+	 * This function isn't expected to be called for inactive timeout based
+	 * invalidation. A separate function InvalidateInactiveReplicationSlot is
+	 * to be used for that.

Do you think it's worth to explain why?

CR6 ===

+	if (replication_slot_inactive_timeout == 0)
+		return false;
+	else if (slot->inactive_since > 0)

"else" is not needed here.

CR7 ===

+	SpinLockAcquire(&slot->mutex);
+
+	/*
+	 * Check if the slot needs to be invalidated due to
+	 * replication_slot_inactive_timeout GUC. We do this with the spinlock
+	 * held to avoid race conditions -- for example the inactive_since
+	 * could change, or the slot could be dropped.
+	 */
+	now = GetCurrentTimestamp();

We should not call GetCurrentTimestamp() while holding a spinlock.

CR8 ===

+# Testcase start: Invalidate streaming standby's slot as well as logical
+# failover slot on primary due to inactive timeout GUC. Also, check the logical

s/inactive timeout GUC/replication_slot_inactive_timeout/?

CR9 ===

+# Start: Helper functions used for this test file
+# End: Helper functions used for this test file

I think that's the first TAP test with this comment. Not saying we should not,
but why did you feel the need to add those?
[1]: https://www.postgresql.org/message-id/os0pr01mb5716b3942ae49f3f725aca9294...@os0pr01mb5716.jpnprd01.prod.outlook.com Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi,

On Wed, Apr 03, 2024 at 11:17:41AM +0530, Bharath Rupireddy wrote:
> On Wed, Apr 3, 2024 at 8:38 AM shveta malik wrote:
> >
> > > > Or a simple solution is that the slotsync worker updates
> > > > inactive_since as it does for non-synced slots, and disables
> > > > timeout-based slot invalidation for synced slots.
> >
> > I like this idea better, it takes care of such a case too when the
> > user is relying on sync-function rather than worker and does not want
> > to get the slots invalidated in between 2 sync function calls.
>
> Please find the attached v31 patches implementing the above idea:

Thanks!

Some comments related to v31-0001:

=== testing the behavior

T1 ===

> - synced slots get their own inactive_since just like any other slot

It behaves as described.

T2 ===

> - synced slots inactive_since is set to current timestamp after the
> standby gets promoted to help inactive_since interpret correctly just
> like any other slot.

It behaves as described.

CR1 ===

+        inactive_since value will get updated
+        after every synchronization

indicates the last synchronization time? (I think that "after every
synchronization" could lead to confusion.)

CR2 ===

+	/*
+	 * Set the time since the slot has become inactive after shutting
+	 * down slot sync machinery. This helps correctly interpret the
+	 * time if the standby gets promoted without a restart.
+	 */

It looks to me that this comment is not at the right place because there is
nothing after the comment that indicates that we shutdown the "slot sync
machinery". Maybe a better place is before the function definition, mentioning
that this is currently called when we shutdown the "slot sync machinery"?

CR3 ===

+	 * We get the current time beforehand and only once to avoid
+	 * system calls overhead while holding the lock.

s/avoid system calls overhead while holding the lock/avoid system calls while holding the spinlock/?

CR4 ===

+	 * Set the time since the slot has become inactive. We get the current
+	 * time beforehand to avoid system call overhead while holding the lock

Same.

CR5 ===

+	# Check that the captured time is sane
+	if (defined $reference_time)
+	{

s/Check that the captured time is sane/Check that the inactive_since is sane/?

Sorry if some of those comments could have been made while I reviewed v29-0001.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
Hi,

On Tue, Apr 02, 2024 at 02:19:30PM +0530, Amit Kapila wrote:
> On Tue, Apr 2, 2024 at 1:54 PM Bertrand Drouvot wrote:
> > What about adding a "wait" injection point in LogStandbySnapshot() to prevent
> > checkpointer/bgwriter from logging a standby snapshot? Something along those
> > lines:
> >
> >	if (AmCheckpointerProcess() || AmBackgroundWriterProcess())
> >		INJECTION_POINT("bgw-log-standby-snapshot");
> >
> > And make use of it in the test, something like:
> >
> >	$node_primary->safe_psql('postgres',
> >		"SELECT injection_points_attach('bgw-log-standby-snapshot', 'wait');");
>
> Sometimes we want the checkpoint to log the standby snapshot as we
> need it at a predictable time, maybe one can use
> pg_log_standby_snapshot() instead of that. Can we add an injection
> point as a separate patch/commit after a bit more discussion?

Sure, let's come back to this injection point discussion after the feature
freeze.

BTW, I think it could also be useful to make use of an injection point for the
test that has been added in 7f13ac8123. I'll open a new thread for this at
that time.

> One other idea to make such tests predictable is to add a
> developer-specific GUC, say debug_bg_log_standby_snapshot or something
> like that, but injection point sounds like a better idea.

Agree.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Tue, Apr 02, 2024 at 12:41:35PM +0530, Bharath Rupireddy wrote: > On Tue, Apr 2, 2024 at 11:58 AM Bertrand Drouvot > wrote: > > > > > Or a simple solution is that the slotsync worker updates > > > inactive_since as it does for non-synced slots, and disables > > > timeout-based slot invalidation for synced slots. > > > > Yeah, I think the main question to help us decide is: do we want to > > invalidate > > "inactive" synced slots locally (in addition to synchronizing the > > invalidation > > from the primary)? > > I think this approach looks way simpler than the other one. The other > approach of linking inactive_since on the standby for synced slots to > the actual LSNs (or other slot parameters) being updated or not looks > more complicated, and might not go well with the end user. However, > we need to be able to say why we don't invalidate synced slots due to > inactive timeout unlike the wal_removed invalidation that can happen > right now on the standby for synced slots. This leads us to define > actually what a slot being active means. Is syncing the data from the > remote slot considered as the slot being active? > > On the other hand, it may not sound great if we don't invalidate > synced slots due to inactive timeout even though they hold resources > such as WAL and XIDs. Right and the "only" benefit then would be to give an idea as to when the last sync did occur on the local slot. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
Hi,

On Tue, Apr 02, 2024 at 07:20:46AM +0000, Zhijie Hou (Fujitsu) wrote:
> I added one test in 040_standby_failover_slots_sync.pl in 0002 patch, which can
> reproduce the data loss issue consistently on my machine.

Thanks!

> It may not reproduce
> in some rare cases if concurrent xl_running_xacts are written by bgwriter, but
> I think it's still valuable if it can verify the fix in most cases.

What about adding a "wait" injection point in LogStandbySnapshot() to prevent
checkpointer/bgwriter from logging a standby snapshot? Something along those
lines:

	if (AmCheckpointerProcess() || AmBackgroundWriterProcess())
		INJECTION_POINT("bgw-log-standby-snapshot");

And make use of it in the test, something like:

	$node_primary->safe_psql('postgres',
		"SELECT injection_points_attach('bgw-log-standby-snapshot', 'wait');");

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Tue, Apr 02, 2024 at 12:07:54PM +0900, Masahiko Sawada wrote: > On Mon, Apr 1, 2024 at 12:18 PM Bharath Rupireddy > > FWIW, coming to this thread late, I think that the inactive_since > should not be synchronized from the primary. The wall clocks are > different on the primary and the standby so having the primary's > timestamp on the standby can confuse users, especially when there is a > big clock drift. Also, as Amit mentioned, inactive_since seems not to > be consistent with other parameters we copy. The > replication_slot_inactive_timeout feature should work on the standby > independent from the primary, like other slot invalidation mechanisms, > and it should be based on its own local clock. Thanks for sharing your thoughts! So, it looks like that most of us agree to not sync inactive_since from the primary, I'm fine with that. > If we want to invalidate the synced slots due to the timeout, I think > we need to define what is "inactive" for synced slots. > > Suppose that the slotsync worker updates the local (synced) slot's > inactive_since whenever releasing the slot, irrespective of the actual > LSNs (or other slot parameters) having been updated. I think that this > idea cannot handle a slot that is not acquired on the primary. In this > case, the remote slot is inactive but the local slot is regarded as > active. WAL files are piled up on the standby (and on the primary) as > the slot's LSNs don't move forward. I think we want to regard such a > slot as "inactive" also on the standby and invalidate it because of > the timeout. I think that makes sense to somehow link inactive_since on the standby to the actual LSNs (or other slot parameters) being updated or not. 
> > > Now, the other concern is that calling GetCurrentTimestamp() > > > could be costly when the values for the slot are not going to be > > > updated but if that happens we can optimize such that before acquiring > > > the slot we can have some minimal pre-checks to ensure whether we need > > > to update the slot or not. > > If we use such pre-checks, another problem might happen; it cannot > handle a case where the slot is acquired on the primary but its LSNs > don't move forward. Imagine a logical replication conflict happened on > the subscriber, and the logical replication enters the retry loop. In > this case, the remote slot's inactive_since gets updated for every > retry, but it looks inactive from the standby since the slot LSNs > don't change. Therefore, only the local slot could be invalidated due > to the timeout but probably we don't want to regard such a slot as > "inactive". > > Another idea I came up with is that the slotsync worker updates the > local slot's inactive_since to the local timestamp only when the > remote slot might have got inactive. If the remote slot is acquired by > someone, the local slot's inactive_since is also NULL. If the remote > slot gets inactive, the slotsync worker sets the local timestamp to > the local slot's inactive_since. Since the remote slot could be > acquired and released before the slotsync worker gets the remote slot > data again, if the remote slot's inactive_since > the local slot's > inactive_since, the slotsync worker updates the local one. Then I think we would need to be careful about time zone comparison. > IOW, we > detect whether the remote slot was acquired and released since the > last synchronization, by checking the remote slot's inactive_since. > This idea seems to handle these cases I mentioned unless I'm missing > something, but it requires for the slotsync worker to update > inactive_since in a different way than other parameters. 
> > Or a simple solution is that the slotsync worker updates > inactive_since as it does for non-synced slots, and disables > timeout-based slot invalidation for synced slots. Yeah, I think the main question to help us decide is: do we want to invalidate "inactive" synced slots locally (in addition to synchronizing the invalidation from the primary)? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
Hi,

On Tue, Apr 02, 2024 at 04:24:49AM +0000, Zhijie Hou (Fujitsu) wrote:
> On Monday, April 1, 2024 9:28 PM Bertrand Drouvot wrote:
> > On Mon, Apr 01, 2024 at 05:04:53PM +0530, Amit Kapila wrote:
> > > On Mon, Apr 1, 2024 at 2:51 PM Bertrand Drouvot wrote:
> > > >
> > > > 2 ===
> > > >
> > > > +	{
> > > > +		if (SnapBuildSnapshotExists(remote_slot->restart_lsn))
> > > > +		{
> > > >
> > > > That could call SnapBuildSnapshotExists() multiple times for the
> > > > same "restart_lsn" (for example in case of multiple remote slots to
> > > > sync).
> > > >
> > > > What if the sync worker records the last lsn it asks for
> > > > serialization (and serialized)? Then we could check that value first
> > > > before deciding to call (or not) SnapBuildSnapshotExists() on it?
> > > >
> > > > It's not ideal because it would record "only the last one" but that
> > > > would be simple enough for now (currently there is only one sync
> > > > worker so that scenario is likely to happen).
> > >
> > > Yeah, we could do that but I am not sure how much it can help. I guess
> > > we could do some tests to see if it helps.
> >
> > Yeah, not sure either. I just think it can only help and shouldn't make
> > things worse (but could avoid extra SnapBuildSnapshotExists() calls).
>
> Thanks for the idea. I tried some tests based on Nisha's setup[1].

Thank you, and Nisha and Shveta, for the testing!

> I tried to
> advance the slots on the primary to the same restart_lsn before calling
> sync_replication_slots(), and reduced the data generated by pgbench.

Agree that this scenario makes sense to try to see the impact of
SnapBuildSnapshotExists().

> The SnapBuildSnapshotExists is still not noticeable in the profile.

The number of SnapBuildSnapshotExists() calls is probably negligible compared
to the I/O calls generated by the fast-forward logical decoding in this
scenario.

> So, I feel we
> could leave this as a further improvement once we encounter scenarios where
> the duplicate SnapBuildSnapshotExists call becomes noticeable.

Sounds reasonable to me.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
Hi, On Mon, Apr 01, 2024 at 05:04:53PM +0530, Amit Kapila wrote: > On Mon, Apr 1, 2024 at 2:51 PM Bertrand Drouvot > wrote: > > Then there is no need to call WaitForStandbyConfirmation() as it could go > > until > > the RecoveryInProgress() in StandbySlotsHaveCaughtup() for nothing (as we > > already > > know it). > > > > Won't it will normally return from the first check in > WaitForStandbyConfirmation() because standby_slot_names_config is not > set on standby? I think standby_slot_names can be set on a standby. One could want to set it in a cascading standby environment (though it won't have any real effects until the standby is promoted). > > > 2 === > > > > + { > > + if (SnapBuildSnapshotExists(remote_slot->restart_lsn)) > > + { > > > > That could call SnapBuildSnapshotExists() multiple times for the same > > "restart_lsn" (for example in case of multiple remote slots to sync). > > > > What if the sync worker records the last lsn it asks for serialization (and > > serialized ? Then we could check that value first before deciding to call > > (or not) > > SnapBuildSnapshotExists() on it? > > > > It's not ideal because it would record "only the last one" but that would be > > simple enough for now (currently there is only one sync worker so that > > scenario > > is likely to happen). > > > > Yeah, we could do that but I am not sure how much it can help. I guess > we could do some tests to see if it helps. Yeah not sure either. I just think it can only help and shouldn't make things worse (but could avoid extra SnapBuildSnapshotExists() calls). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Sun, Mar 31, 2024 at 10:25:46AM +0530, Bharath Rupireddy wrote: > On Thu, Mar 28, 2024 at 3:13 PM Bertrand Drouvot > wrote: > > I think in this case it should always reflect the value from the primary (so > > that one can understand why it is invalidated). > > I'll come back to this as soon as we all agree on inactive_since > behavior for synced slots. Makes sense. Also if the majority of us thinks it's not needed for inactive_since to be an exact copy of the primary, then let's go that way. > > I think when it is invalidated it should always reflect the value from the > > primary (so that one can understand why it is invalidated). > > I'll come back to this as soon as we all agree on inactive_since > behavior for synced slots. Yeah. > > T4 === > > > > Also, it looks like querying pg_replication_slots() does not trigger an > > invalidation: I think it should if the slot is not invalidated yet (and > > matches > > the invalidation criteria). > > There's a different opinion on this, check comment #3 from > https://www.postgresql.org/message-id/CAA4eK1LLj%2BeaMN-K8oeOjfG%2BUuzTY%3DL5PXbcMJURZbFm%2B_aJSA%40mail.gmail.com. Oh right, I can see Amit's point too. Let's put pg_replication_slots() out of the game then. > > CR6 === > > > > +static bool > > +InvalidateSlotForInactiveTimeout(ReplicationSlot *slot, bool need_locks) > > +{ > > > > InvalidatePossiblyInactiveSlot() maybe? > > I think we will lose the essence i.e. timeout from the suggested > function name, otherwise just the inactive doesn't give a clearer > meaning. I kept it that way unless anyone suggests otherwise. Right. OTOH I think that "Possibly" adds some nuance (like InvalidatePossiblyObsoleteSlot() is already doing). > Please see the attached v30 patch. 0002 is where all of the above > review comments have been addressed. Thanks! FYI, I did not look at the content yet, just replied to the above comments. 
Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
Hi, On Mon, Apr 01, 2024 at 06:05:34AM +, Zhijie Hou (Fujitsu) wrote: > On Monday, April 1, 2024 8:56 AM Zhijie Hou (Fujitsu) > wrote: > Attach the V4 patch which includes the optimization to skip the decoding if > the snapshot at the syncing restart_lsn is already serialized. It can avoid > most > of the duplicate decoding in my test, and I am doing some more tests locally. > Thanks! 1 === Same comment as in [1]. In LogicalSlotAdvanceAndCheckReadynessForDecoding(), if we are synchronizing slots then I think that we can skip: + /* +* Wait for specified streaming replication standby servers (if any) +* to confirm receipt of WAL up to moveto lsn. +*/ + WaitForStandbyConfirmation(moveto); Indeed if we are dealing with a synced slot then we know we're in RecoveryInProgress(). Then there is no need to call WaitForStandbyConfirmation() as it could go until the RecoveryInProgress() in StandbySlotsHaveCaughtup() for nothing (as we already know it). 2 === + { + if (SnapBuildSnapshotExists(remote_slot->restart_lsn)) + { That could call SnapBuildSnapshotExists() multiple times for the same "restart_lsn" (for example in case of multiple remote slots to sync). What if the sync worker records the last lsn it asks for serialization (and serialized)? Then we could check that value first before deciding to call (or not) SnapBuildSnapshotExists() on it? It's not ideal because it would record "only the last one" but that would be simple enough for now (currently there is only one sync worker so that scenario is likely to happen). Maybe an idea for future improvement (not for now) could be that SnapBuildSerialize() maintains a "small list" of "already serialized" snapshots. [1]: https://www.postgresql.org/message-id/ZgayTFIhLfzhpHci%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
e minimal pre-checks to ensure whether we need > > to update the slot or not. Also maybe we could accept a bit less accuracy and use GetCurrentTransactionStopTimestamp() instead? > If we are too much concerned about the cost of GetCurrentTimestamp(), > a possible approach is just don't set inactive_since for slots being > synced on the standby. > Just let the first acquisition and release > after the promotion do that job. We can always call this out in the > docs saying "replication slots on the streaming standbys which are > being synced from the primary are not inactive in practice, so the > inactive_since is always NULL for them unless the standby is > promoted". I think that was the initial behavior that led to Robert's remark (see [2]): " And I'm suspicious that having an exception for slots being synced is a bad idea. That makes too much of a judgement about how the user will use this field. It's usually better to just expose the data, and if the user needs helps to make sense of that data, then give them that help separately. " [1]: https://www.postgresql.org/message-id/CAA4eK1JtKieWMivbswYg5FVVB5FugCftLvQKVsxh%3Dm_8nk04vw%40mail.gmail.com [2]: https://www.postgresql.org/message-id/CA%2BTgmob_Ta-t2ty8QrKHBGnNLrf4ZYcwhGHGFsuUoFrAEDw4sA%40mail.gmail.com Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Mon, Apr 01, 2024 at 09:04:43AM +0530, Amit Kapila wrote: > On Fri, Mar 29, 2024 at 6:17 PM Bertrand Drouvot > wrote: > > > > On Fri, Mar 29, 2024 at 03:03:01PM +0530, Amit Kapila wrote: > > > On Fri, Mar 29, 2024 at 11:49 AM Bertrand Drouvot > > > wrote: > > > > > > > > On Fri, Mar 29, 2024 at 09:39:31AM +0530, Amit Kapila wrote: > > > > > > > > > > Commit message states: "why we can't just update inactive_since for > > > > > synced slots on the standby with the value received from remote slot > > > > > on the primary. This is consistent with any other slot parameter i.e. > > > > > all of them are synced from the primary." > > > > > > > > > > The inactive_since is not consistent with other slot parameters which > > > > > we copy. We don't perform anything related to those other parameters > > > > > like say two_phase phase which can change that property. However, we > > > > > do acquire the slot, advance the slot (as per recent discussion [1]), > > > > > and release it. Since these operations can impact inactive_since, it > > > > > seems to me that inactive_since is not the same as other parameters. > > > > > It can have a different value than the primary. Why would anyone want > > > > > to know the value of inactive_since from primary after the standby is > > > > > promoted? > > > > > > > > I think it can be useful "before" it is promoted and in case the > > > > primary is down. > > > > > > > > > > It is not clear to me what is user going to do by checking the > > > inactivity time for slots when the corresponding server is down. > > > > Say a failover needs to be done, then it could be useful to know for which > > slots the activity needs to be resumed (thinking about external logical > > decoding > > plugin, not about pub/sub here). If one see an inactive slot (since long > > "enough") > > then he can start to reasonate about what to do with it. 
> > > > > I thought the idea was to check such slots and see if they need to be > > > dropped or enabled again to avoid excessive disk usage, etc. > > > > Yeah that's the case but it does not mean inactive_since can't be useful in > > other > > ways. > > > > Also, say the slot has been invalidated on the primary (due to inactivity > > timeout), > > primary is down and there is a failover. By keeping the inactive_since from > > the primary, one could know when the inactivity that lead to the timeout > > started. > > > > So, this means at promotion, we won't set the current_time for > inactive_since which is not what the currently proposed patch is > doing. Yeah, that's why I made the comment T2 in [1]. > Moreover, doing the invalidation on promoted standby based on > inactive_since of the primary node sounds debatable because the > inactive_timeout could be different on the new node (promoted > standby). I think that if the slot is not invalidated before the promotion then we should erase the value from the primary and use the promotion time. > > Again, more concerned about external logical decoding plugin than pub/sub > > here. > > > > > > I agree that tracking the activity time of a synced slot can be useful, > > > > why > > > > not creating a dedicated field for that purpose (and keep > > > > inactive_since a > > > > perfect "copy" of the primary)? > > > > > > > > > > We can have a separate field for this but not sure if it is worth it. > > > > OTOH I'm not sure that erasing this information from the primary is useful. > > I > > think that 2 fields would be the best option and would be less subject of > > misinterpretation. 
> > > > > > > Now, the other concern is that calling GetCurrentTimestamp() > > > > > could be costly when the values for the slot are not going to be > > > > > updated but if that happens we can optimize such that before acquiring > > > > > the slot we can have some minimal pre-checks to ensure whether we need > > > > > to update the slot or not. > > > > > > > > Right, but for a very active slot it is likely that we call > > > > GetCurrentTimestamp() > > > > during almost each sync cycle. > > > > > > > > > > True, but if we have to save a slot to disk each time to pe
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Fri, Mar 29, 2024 at 03:03:01PM +0530, Amit Kapila wrote: > On Fri, Mar 29, 2024 at 11:49 AM Bertrand Drouvot > wrote: > > > > On Fri, Mar 29, 2024 at 09:39:31AM +0530, Amit Kapila wrote: > > > > > > Commit message states: "why we can't just update inactive_since for > > > synced slots on the standby with the value received from remote slot > > > on the primary. This is consistent with any other slot parameter i.e. > > > all of them are synced from the primary." > > > > > > The inactive_since is not consistent with other slot parameters which > > > we copy. We don't perform anything related to those other parameters > > > like say two_phase phase which can change that property. However, we > > > do acquire the slot, advance the slot (as per recent discussion [1]), > > > and release it. Since these operations can impact inactive_since, it > > > seems to me that inactive_since is not the same as other parameters. > > > It can have a different value than the primary. Why would anyone want > > > to know the value of inactive_since from primary after the standby is > > > promoted? > > > > I think it can be useful "before" it is promoted and in case the primary is > > down. > > > > It is not clear to me what is user going to do by checking the > inactivity time for slots when the corresponding server is down. Say a failover needs to be done, then it could be useful to know for which slots the activity needs to be resumed (thinking about external logical decoding plugins, not about pub/sub here). If one sees a slot that has been inactive for long "enough", one can start to reason about what to do with it. > I thought the idea was to check such slots and see if they need to be > dropped or enabled again to avoid excessive disk usage, etc. Yeah that's the case but it does not mean inactive_since can't be useful in other ways. Also, say the slot has been invalidated on the primary (due to inactivity timeout), primary is down and there is a failover. 
By keeping the inactive_since from the primary, one could know when the inactivity that led to the timeout started. Again, more concerned about external logical decoding plugins than pub/sub here. > > I agree that tracking the activity time of a synced slot can be useful, why > > not creating a dedicated field for that purpose (and keep inactive_since a > > perfect "copy" of the primary)? > > > > We can have a separate field for this but not sure if it is worth it. OTOH I'm not sure that erasing this information from the primary is useful. I think that 2 fields would be the best option and would be less subject to misinterpretation. > > > Now, the other concern is that calling GetCurrentTimestamp() > > > could be costly when the values for the slot are not going to be > > > updated but if that happens we can optimize such that before acquiring > > > the slot we can have some minimal pre-checks to ensure whether we need > > > to update the slot or not. > > > > Right, but for a very active slot it is likely that we call > > GetCurrentTimestamp() > > during almost each sync cycle. > > > > True, but if we have to save a slot to disk each time to persist the > changes (for an active slot) then probably GetCurrentTimestamp() > shouldn't be costly enough to matter. Right, persisting the changes to disk would be even more costly. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
Hi, On Fri, Mar 29, 2024 at 02:35:22PM +0530, Amit Kapila wrote: > On Fri, Mar 29, 2024 at 1:08 PM Bertrand Drouvot > wrote: > > > > On Fri, Mar 29, 2024 at 07:23:11AM +, Zhijie Hou (Fujitsu) wrote: > > > On Friday, March 29, 2024 2:48 PM Bertrand Drouvot > > > wrote: > > > > > > > > Hi, > > > > > > > > On Fri, Mar 29, 2024 at 01:06:15AM +, Zhijie Hou (Fujitsu) wrote: > > > > > Attach a new version patch which fixed an un-initialized variable > > > > > issue and added some comments. Also, temporarily enable DEBUG2 for the > > > > > 040 tap-test so that we can analyze the possible CFbot failures > > > > > easily. > > > > > > > > > > > > > Thanks! > > > > > > > > + if (remote_slot->confirmed_lsn != slot->data.confirmed_flush) > > > > + { > > > > + /* > > > > +* By advancing the restart_lsn, confirmed_lsn, and > > > > xmin using > > > > +* fast-forward logical decoding, we ensure that the > > > > required > > > > snapshots > > > > +* are saved to disk. This enables logical decoding to > > > > quickly > > > > reach a > > > > +* consistent point at the restart_lsn, eliminating the > > > > risk of > > > > missing > > > > +* data during snapshot creation. 
> > > > +*/ > > > > + > > > > pg_logical_replication_slot_advance(remote_slot->confirmed_lsn, > > > > + > > > > found_consistent_point); > > > > + ReplicationSlotsComputeRequiredLSN(); > > > > + updated_lsn = true; > > > > + } > > > > > > > > Instead of using pg_logical_replication_slot_advance() for each synced > > > > slot and > > > > during sync cycles what about?: > > > > > > > > - keep sync slot synchronization as it is currently (not using > > > > pg_logical_replication_slot_advance()) > > > > - create "an hidden" logical slot if sync slot feature is on > > > > - at the time of promotion use pg_logical_replication_slot_advance() on > > > > this > > > > hidden slot only to advance to the max lsn of the synced slots > > > > > > > > I'm not sure that would be enough, just asking your thoughts on this > > > > (benefits > > > > would be to avoid calling pg_logical_replication_slot_advance() on each > > > > sync > > > > slots and during the sync cycles). > > > > > > Thanks for the idea ! > > > > > > I considered about this. I think advancing the "hidden" slot on promotion > > > may be a > > > bit late, because if we cannot reach the consistent point after advancing > > > the > > > "hidden" slot, then it means we may need to remove all the synced slots > > > as we > > > are not sure if they are usable(will not loss data) after promotion. > > > > What about advancing the hidden slot during the sync cycles then? > > > > > The current approach is to mark such un-consistent slot as temp and > > > persist > > > them once it reaches consistent point, so that user can ensure the slot > > > can be > > > used after promotion once persisted. > > > > Right, but do we need to do so for all the sync slots? Would a single hidden > > slot be enough? > > > > Even if we mark one of the synced slots as persistent without reaching > a consistent state, it could create a problem after promotion. 
And, > how a single hidden slot would serve the purpose, different synced > slots will have different restart/confirmed_flush LSN and we won't be > able to perform advancing for those using a single slot. For example, > say for first synced slot, it has not reached a consistent state and > then how can it try for the second slot? This sounds quite tricky to > make work. We should go with something simple where the chances of > introducing bugs are lesser. Yeah, better to go with something simple. + if (remote_slot->confirmed_lsn != slot->data.confirmed_flush) + { + /* +* By advancing the restart_lsn, confirmed_lsn, and xmin using +* fast-forward logical decoding, we ensure that the required snapshots +* are saved to disk. This enables logical decoding to quickly reach a +* consistent point at the restart_lsn, eliminating the risk of missing +* data during snapshot creation. +*/ + pg_logical_replication_slot_advance(remote_slot->confirmed_lsn, + found_consistent_point); In our case, what about skipping WaitForStandbyConfirmation() in pg_logical_replication_slot_advance()? (It could go until the RecoveryInProgress() check in StandbySlotsHaveCaughtup() if we don't skip it). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
Hi, On Fri, Mar 29, 2024 at 07:23:11AM +, Zhijie Hou (Fujitsu) wrote: > On Friday, March 29, 2024 2:48 PM Bertrand Drouvot > wrote: > > > > Hi, > > > > On Fri, Mar 29, 2024 at 01:06:15AM +, Zhijie Hou (Fujitsu) wrote: > > > Attach a new version patch which fixed an un-initialized variable > > > issue and added some comments. Also, temporarily enable DEBUG2 for the > > > 040 tap-test so that we can analyze the possible CFbot failures easily. > > > > > > > Thanks! > > > > + if (remote_slot->confirmed_lsn != slot->data.confirmed_flush) > > + { > > + /* > > +* By advancing the restart_lsn, confirmed_lsn, and xmin > > using > > +* fast-forward logical decoding, we ensure that the > > required > > snapshots > > +* are saved to disk. This enables logical decoding to > > quickly > > reach a > > +* consistent point at the restart_lsn, eliminating the > > risk of > > missing > > +* data during snapshot creation. > > +*/ > > + > > pg_logical_replication_slot_advance(remote_slot->confirmed_lsn, > > + > > found_consistent_point); > > + ReplicationSlotsComputeRequiredLSN(); > > + updated_lsn = true; > > + } > > > > Instead of using pg_logical_replication_slot_advance() for each synced slot > > and > > during sync cycles what about?: > > > > - keep sync slot synchronization as it is currently (not using > > pg_logical_replication_slot_advance()) > > - create "an hidden" logical slot if sync slot feature is on > > - at the time of promotion use pg_logical_replication_slot_advance() on this > > hidden slot only to advance to the max lsn of the synced slots > > > > I'm not sure that would be enough, just asking your thoughts on this > > (benefits > > would be to avoid calling pg_logical_replication_slot_advance() on each sync > > slots and during the sync cycles). > > Thanks for the idea ! > > I considered about this. 
I think advancing the "hidden" slot on promotion may > be a > bit late, because if we cannot reach the consistent point after advancing the > "hidden" slot, then it means we may need to remove all the synced slots as we > are not sure if they are usable(will not loss data) after promotion. What about advancing the hidden slot during the sync cycles then? > The current approach is to mark such un-consistent slot as temp and persist > them once it reaches consistent point, so that user can ensure the slot can be > used after promotion once persisted. Right, but do we need to do so for all the sync slots? Would a single hidden slot be enough? > Another optimization idea is to check the snapshot file existence before > calling the > slot_advance(). If the file already exists, we skip the decoding and directly > update the restart_lsn. This way, we could also avoid some duplicate decoding > work. Yeah, I think it's a good idea (even better if we can do this check without performing any I/O). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
Hi, On Fri, Mar 29, 2024 at 01:06:15AM +, Zhijie Hou (Fujitsu) wrote: > Attach a new version patch which fixed an un-initialized variable issue and > added some comments. Also, temporarily enable DEBUG2 for the 040 tap-test so > that > we can analyze the possible CFbot failures easily. > Thanks! + if (remote_slot->confirmed_lsn != slot->data.confirmed_flush) + { + /* +* By advancing the restart_lsn, confirmed_lsn, and xmin using +* fast-forward logical decoding, we ensure that the required snapshots +* are saved to disk. This enables logical decoding to quickly reach a +* consistent point at the restart_lsn, eliminating the risk of missing +* data during snapshot creation. +*/ + pg_logical_replication_slot_advance(remote_slot->confirmed_lsn, + found_consistent_point); + ReplicationSlotsComputeRequiredLSN(); + updated_lsn = true; + } Instead of using pg_logical_replication_slot_advance() for each synced slot and during sync cycles, what about: - keep sync slot synchronization as it is currently (not using pg_logical_replication_slot_advance()) - create "a hidden" logical slot if the sync slot feature is on - at the time of promotion use pg_logical_replication_slot_advance() on this hidden slot only to advance to the max lsn of the synced slots I'm not sure that would be enough, just asking your thoughts on this (benefits would be to avoid calling pg_logical_replication_slot_advance() on each sync slot and during the sync cycles). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Fri, Mar 29, 2024 at 09:39:31AM +0530, Amit Kapila wrote: > On Wed, Mar 27, 2024 at 9:00 PM Bharath Rupireddy > wrote: > > > > > > Thanks. I'm attaching v29 patches. 0001 managing inactive_since on the > > standby for sync slots. > > > > Commit message states: "why we can't just update inactive_since for > synced slots on the standby with the value received from remote slot > on the primary. This is consistent with any other slot parameter i.e. > all of them are synced from the primary." > > The inactive_since is not consistent with other slot parameters which > we copy. We don't perform anything related to those other parameters > like say two_phase phase which can change that property. However, we > do acquire the slot, advance the slot (as per recent discussion [1]), > and release it. Since these operations can impact inactive_since, it > seems to me that inactive_since is not the same as other parameters. > It can have a different value than the primary. Why would anyone want > to know the value of inactive_since from primary after the standby is > promoted? I think it can be useful "before" it is promoted and in case the primary is down. I agree that tracking the activity time of a synced slot can be useful, so why not create a dedicated field for that purpose (and keep inactive_since a perfect "copy" of the primary)? > Now, the other concern is that calling GetCurrentTimestamp() > could be costly when the values for the slot are not going to be > updated but if that happens we can optimize such that before acquiring > the slot we can have some minimal pre-checks to ensure whether we need > to update the slot or not. Right, but for a very active slot it is likely that we call GetCurrentTimestamp() during almost every sync cycle. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
Hi, On Thu, Mar 28, 2024 at 05:05:35PM +0530, Amit Kapila wrote: > On Thu, Mar 28, 2024 at 3:34 PM Bertrand Drouvot > wrote: > > > > On Thu, Mar 28, 2024 at 04:38:19AM +, Zhijie Hou (Fujitsu) wrote: > > > > > To fix this, we could use the fast forward logical decoding to advance > > > the synced > > > slot's lsn/xmin when syncing these values instead of directly updating the > > > slot's info. This way, the snapshot will be serialized to disk when > > > decoding. > > > If we could not reach to the consistent point at the remote restart_lsn, > > > the > > > slot is marked as temp and will be persisted once it reaches the > > > consistent > > > point. I am still analyzing the fix and will share once ready. > > > > Thanks! I'm wondering about the performance impact (even in fast_forward > > mode), > > might be worth to keep an eye on it. > > > > True, we can consider performance but correctness should be a > priority, Yeah of course. > and can we think of a better way to fix this issue? I'll keep you posted if there is one that I can think of. > > Should we create a 17 open item [1]? > > > > [1]: https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items > > > > Yes, we can do that. done. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
Hi, On Thu, Mar 28, 2024 at 04:38:19AM +, Zhijie Hou (Fujitsu) wrote: > Hi, > > When analyzing one BF error[1], we find an issue of slotsync: Since we don't > perform logical decoding for the synced slots when syncing the lsn/xmin of > slot, no logical snapshots will be serialized to disk. So, when user starts to > use these synced slots after promotion, it needs to re-build the consistent > snapshot from the restart_lsn if the WAL(xl_running_xacts) at restart_lsn > position indicates that there are running transactions. This however could > cause the data that before the consistent point to be missed[2]. I see, nice catch and explanation, thanks! > This issue doesn't exist on the primary because the snapshot at restart_lsn > should have been serialized to disk (SnapBuildProcessRunningXacts -> > SnapBuildSerialize), so even if the logical decoding restarts, it can find > consistent snapshot immediately at restart_lsn. Right. > To fix this, we could use the fast forward logical decoding to advance the > synced > slot's lsn/xmin when syncing these values instead of directly updating the > slot's info. This way, the snapshot will be serialized to disk when decoding. > If we could not reach to the consistent point at the remote restart_lsn, the > slot is marked as temp and will be persisted once it reaches the consistent > point. I am still analyzing the fix and will share once ready. Thanks! I'm wondering about the performance impact (even in fast_forward mode); it might be worth keeping an eye on it. Should we create a 17 open item [1]? [1]: https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
h(slot, path, ERROR); + } Maybe we could create a new function, say GivenReplicationSlotSave(), taking a slot as a parameter, that ReplicationSlotSave() could call too? CR9 === + if (check_for_invalidation) + { + /* The slot is ours by now */ + Assert(s->active_pid == MyProcPid); + + /* +* Well, the slot is not yet ours really unless we check for the +* invalidation below. +*/ + s->active_pid = 0; + if (InvalidateReplicationSlotForInactiveTimeout(s, true, true)) + { + /* +* If the slot has been invalidated, recalculate the resource +* limits. +*/ + ReplicationSlotsComputeRequiredXmin(false); + ReplicationSlotsComputeRequiredLSN(); + + /* Might need it for slot clean up on error, so restore it */ + s->active_pid = MyProcPid; + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), +errmsg("cannot acquire invalidated replication slot \"%s\"", + NameStr(MyReplicationSlot->data.name; + } + s->active_pid = MyProcPid; Are we not missing some SpinLockAcquire/Release on the slot's mutex here? (the places where we set the active_pid). CR10 === @@ -1628,6 +1674,10 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause, if (SlotIsLogical(s)) invalidation_cause = cause; break; + case RS_INVAL_INACTIVE_TIMEOUT: + if (InvalidateReplicationSlotForInactiveTimeout(s, false, false)) + invalidation_cause = cause; + break; InvalidatePossiblyObsoleteSlot() is not called with such a reason, so better to use an Assert here and in the caller too? CR11 === +++ b/src/test/recovery/t/050_invalidate_slots.pl why not use 019_replslot_limit.pl? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Wed, Mar 27, 2024 at 09:00:37PM +0530, Bharath Rupireddy wrote: > On Wed, Mar 27, 2024 at 6:54 PM Bertrand Drouvot > wrote: > > > > Hi, > > > > On Wed, Mar 27, 2024 at 05:55:05PM +0530, Bharath Rupireddy wrote: > > > On Wed, Mar 27, 2024 at 3:42 PM Bertrand Drouvot > > > Please see the attached v28 patch. > > > > Thanks! > > > > 1 === sorry I missed it in the previous review > > > > if (!(RecoveryInProgress() && slot->data.synced)) > > + { > > now = GetCurrentTimestamp(); > > + update_inactive_since = true; > > + } > > + else > > + update_inactive_since = false; > > > > I think update_inactive_since is not needed, we could rely on (now > 0) > > instead. > > Thought of using it, but, at the expense of readability. I prefer to > use a variable instead. That's fine too. > However, I changed the variable to be more meaningful to is_slot_being_synced. Yeah makes sense and even easier to read. v29-0001 LGTM. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Wed, Mar 27, 2024 at 05:55:05PM +0530, Bharath Rupireddy wrote: > On Wed, Mar 27, 2024 at 3:42 PM Bertrand Drouvot > Please see the attached v28 patch. Thanks! 1 === Sorry, I missed it in the previous review: if (!(RecoveryInProgress() && slot->data.synced)) + { now = GetCurrentTimestamp(); + update_inactive_since = true; + } + else + update_inactive_since = false; I think update_inactive_since is not needed; we could rely on (now > 0) instead. 2 === +=item $node->get_slot_inactive_since_value(self, primary, slot_name, dbname) + +Get inactive_since column value for a given replication slot validating it +against optional reference time. + +=cut + +sub get_slot_inactive_since_value +{ shouldn't it be "=item $node->get_slot_inactive_since_value(self, slot_name, reference_time)" instead? Apart from the above, LGTM. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Wed, Mar 27, 2024 at 02:55:17PM +0530, Bharath Rupireddy wrote: > Please check the attached v27 patch which also has Bertrand's comment > on deduplicating the TAP function. I've now moved it to Cluster.pm. Thanks! 1 === +Note that the slots on the standbys that are being synced from a +primary server (whose synced field is +true), will get the +inactive_since value from the +corresponding remote slot on the primary. Also, note that for the +synced slots on the standby, after the standby starts up (i.e. after +the slots are loaded from the disk), the inactive_since will remain +zero until the next slot sync cycle. Not sure we should mention the "(i.e. after the slots are loaded from the disk)" and also "cycle" (as that does not sound right in case of manual sync). My proposal (in text) but feel free to reword it: Note that the slots on the standbys that are being synced from a primary server (whose synced field is true), will get the inactive_since value from the corresponding remote slot on the primary. Also, after the standby starts up, the inactive_since (for such synced slots) will remain zero until the next synchronization. 2 === +=item $node->create_logical_slot_on_standby(self, primary, slot_name, dbname) get_slot_inactive_since_value instead? 3 === +against given reference time. s/given reference/optional given reference/? Apart from the above, LGTM. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: pgsql: Track last_inactive_time in pg_replication_slots.
Hi, On Wed, Mar 27, 2024 at 12:28:06PM +0530, Bharath Rupireddy wrote: > On Wed, Mar 27, 2024 at 10:10 AM Amit Kapila wrote: > > > > On Tue, Mar 26, 2024 at 9:10 PM Alvaro Herrera > > wrote: > > > > > > On 2024-Mar-26, Nathan Bossart wrote: > > > > > > > FWIW I'd really prefer to have something like max_slot_xid_age for > > > > this. A > > > > time-based parameter would likely help with most cases, but transaction > > > > ID > > > > usage will vary widely from server to server, so it'd be nice to have > > > > something to protect against wraparound more directly. > > > > > > Yeah, I tend to agree that an XID-based limit makes more sense than a > > > time-based one. > > > > > So, do we want the time-based parameter or just max_slot_xid_age > > considering both will be GUC's? Yes, the xid_age based parameter > > sounds to be directly helpful for transaction ID wraparound or dead > > row cleanup, OTOH having a lot of inactive slots (which is possible in > > use cases where a tool creates a lot of slots and forgets to remove > > them, or the tool exits without cleaning up slots (say due to server > > shutdown)) also prohibit removing dead space which is not nice either? > > I've personally seen the leftover slots problem on production systems > where a timeout based invalidation mechanism could have been of more > help to save time and reduce manual intervention. Usually, most if not > all, migration/upgrade/other tools create slots, and if at all any > errors occur or the operation gets cancelled midway, there's a chance > that the slots can be leftover if such tools forgets to clean them up > either because there was a bug or for whatever reasons. These are > unintended/ghost slots for the database user unnecessarily holding up > resources such as XIDs, dead rows and WAL; which might lead to XID > wraparound or server crash if unnoticed. 
Although XID based GUC helps > a lot, why do these unintended and unnoticed slots need to hold up the > resources even before the XID age of say 1.5 or 2 billion is reached. > > With both GUCs max_slot_xid_age and replication_slot_inactive_timeout > in place, I can set max_slot_xid_age = 1.5 billion or so and also set > replication_slot_inactive_timeout = 2 days or so to make the database > foolproof. Yeah, I think both make sense. The reason is that one depends on both the database activity and the slot activity (the XID age one), while the other (the timeout one) depends only on the slot activity. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Wed, Mar 27, 2024 at 10:08:33AM +0530, Bharath Rupireddy wrote: > On Tue, Mar 26, 2024 at 11:22 PM Bertrand Drouvot > wrote: > > > > - if (!(RecoveryInProgress() && slot->data.synced)) > > + if (!(InRecovery && slot->data.synced)) > > slot->inactive_since = GetCurrentTimestamp(); > > else > > slot->inactive_since = 0; > > > > Not related to this change but more the way RestoreSlotFromDisk() behaves > > here: > > > > For a sync slot on standby it will be set to zero and then later will be > > synchronized with the one coming from the primary. I think that's fine to > > have > > it to zero for this window of time. > > Right. > > > Now, if the standby is down and one sets sync_replication_slots to off, > > then inactive_since will be set to zero on the standby at startup and not > > synchronized (unless one triggers a manual sync). I also think that's fine > > but > > it might be worth to document this behavior (that after a standby startup > > inactive_since is zero until the next sync...). > > Isn't this behaviour applicable for other slot parameters that the > slot syncs from the remote slot on the primary? No, they are persisted on disk. Otherwise, we'd not know where to resume decoding from on the standby in case the primary is down and/or sync is off. > I've added the following note in the comments when we update > inactive_since in RestoreSlotFromDisk. > > * Note that for synced slots after the standby starts up (i.e. after > * the slots are loaded from the disk), the inactive_since will remain > * zero until the next slot sync cycle. > */ > if (!(InRecovery && slot->data.synced)) > slot->inactive_since = GetCurrentTimestamp(); > else > slot->inactive_since = 0; I think we should add some words in the doc too, and also explain what the meaning of inactive_since on the standby is (as suggested by Shveta in [1]).
[1]: https://www.postgresql.org/message-id/CAJpy0uDkTW%2Bt1k3oPkaipFBzZePfFNB5DmiA%3D%3DpxRGcAdpF%3DPg%40mail.gmail.com > > 7 === > > > > +# Capture and validate inactive_since of a given slot. > > +sub capture_and_validate_slot_inactive_since > > +{ > > + my ($node, $slot_name, $slot_creation_time) = @_; > > + my $name = $node->name; > > > > We know have capture_and_validate_slot_inactive_since at 2 places: > > 040_standby_failover_slots_sync.pl and 019_replslot_limit.pl. > > > > Worth to create a sub in Cluster.pm? > > I'd second that thought for now. We might have to debate first if it's > useful for all the nodes even without replication, and if yes, the > naming stuff and all that. Historically, we've had such duplicated > functions until recently, for instance advance_wal and log_contains. > We > moved them over to a common perl library Cluster.pm very recently. I'm > sure we can come back later to move it to Cluster.pm. I thought that would be the right time not to introduce duplicated code. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Tue, Mar 26, 2024 at 09:59:23PM +0530, Bharath Rupireddy wrote: > On Tue, Mar 26, 2024 at 4:35 PM Bharath Rupireddy > wrote: > > > > If we just sync inactive_since value for synced slots while in > > recovery from the primary, so be it. Why do we need to update it to > > the current time when the slot is being created? We don't expose slot > > creation time, no? Aren't we fine if we just sync the value from > > primary and document that fact? After the promotion, we can reset it > > to the current time so that it gets its own time. > > I'm attaching v24 patches. It implements the above idea proposed > upthread for synced slots. I've now separated > s/last_inactive_time/inactive_since and synced slots behaviour. Please > have a look. Thanks! v24-0001 It now contains purely mechanical changes and it looks good to me. v24-0002 1 === This commit does two things: 1) Updates inactive_since for sync slots with the value received from the primary's slot. Tested it and it does that. 2 === 2) Ensures the value is set to current timestamp during the shutdown of slot sync machinery to help correctly interpret the time if the standby gets promoted without a restart. Tested it and it does that. 3 === +/* + * Reset the synced slots info such as inactive_since after shutting + * down the slot sync machinery. + */ +static void +update_synced_slots_inactive_time(void) Looks like the word "reset" in the comment does not match the name of the function and what it does. 4 === + /* +* We get the current time beforehand and only once to avoid +* system calls overhead while holding the lock. +*/ + if (now == 0) + now = GetCurrentTimestamp(); Also, +1 for having GetCurrentTimestamp() called just once within the loop.
5 === - if (!(RecoveryInProgress() && slot->data.synced)) + if (!(InRecovery && slot->data.synced)) slot->inactive_since = GetCurrentTimestamp(); else slot->inactive_since = 0; Not related to this change, but more about the way RestoreSlotFromDisk() behaves here: For a sync slot on standby it will be set to zero and then later will be synchronized with the one coming from the primary. I think it's fine to have it at zero for this window of time. Now, if the standby is down and one sets sync_replication_slots to off, then inactive_since will be set to zero on the standby at startup and not synchronized (unless one triggers a manual sync). I also think that's fine, but it might be worth documenting this behavior (that after a standby startup inactive_since is zero until the next sync...). 6 === + print "HI $slot_name $name $inactive_since $slot_creation_time\n"; Leftover debug output? 7 === +# Capture and validate inactive_since of a given slot. +sub capture_and_validate_slot_inactive_since +{ + my ($node, $slot_name, $slot_creation_time) = @_; + my $name = $node->name; We now have capture_and_validate_slot_inactive_since in 2 places: 040_standby_failover_slots_sync.pl and 019_replslot_limit.pl. Worth creating a sub in Cluster.pm? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: pgsql: Track last_inactive_time in pg_replication_slots.
Hi, On Tue, Mar 26, 2024 at 04:39:55PM +0100, Alvaro Herrera wrote: > On 2024-Mar-26, Nathan Bossart wrote: > > I don't object to a > > time-based setting as well, but that won't always work as well for this > > particular use-case, especially if we are relying on users to set a > > slot-level parameter. > > I think slot-level parameters are mostly useless, because it takes just > one slot where you forget to set it for disaster to strike. I think that's a fair point. So maybe we should focus on having a GUC first and later on re-think about having (or not) a slot based one (in addition to the GUC). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Tue, Mar 26, 2024 at 04:17:53PM +0530, shveta malik wrote: > On Tue, Mar 26, 2024 at 3:50 PM Bertrand Drouvot > wrote: > > > > Hi, > > > > > I think there may have been some misunderstanding here. > > > > Indeed ;-) > > > > > But now if I > > > rethink this, I am fine with 'inactive_since' getting synced from > > > primary to standby. But if we do that, we need to add docs stating > > > "inactive_since" represents primary's inactivity and not standby's > > > slots inactivity for synced slots. > > > > Yeah sure. > > > > > The reason for this clarification > > > is that the synced slot might be generated much later, yet > > > 'inactive_since' is synced from the primary, potentially indicating a > > > time considerably earlier than when the synced slot was actually > > > created. > > > > Right. > > > > > Another approach could be that "inactive_since" for synced slot > > > actually gives its own inactivity data rather than giving primary's > > > slot data. We update inactive_since on standby only at 3 occasions: > > > 1) at the time of creation of the synced slot. > > > 2) during standby restart. > > > 3) during promotion of standby. > > > > > > I have attached a sample patch for this idea as.txt file. > > > > Thanks! > > > > > I am fine with any of these approaches. One gives data synced from > > > primary for synced slots, while another gives actual inactivity data > > > of synced slots. > > > > What about another approach?: inactive_since gives data synced from primary > > for > > synced slots and another dedicated field (could be added later...) could > > represent what you suggest as the other option. > > Yes, okay with me. I think there is some confusion here as well. In my > second approach above, I have not suggested anything related to > sync-worker. Yeah, no confusion, understood that way. > We can think on that later if we really need another > field which give us sync time. 
I think that calling GetCurrentTimestamp() so frequently could be too costly, so I'm not sure we should. > In my second approach, I have tried to > avoid updating inactive_since for synced slots during sync process. We > update that field during creation of synced slot so that > inactive_since reflects correct info even for synced slots (rather > than copying from primary). Yeah, and I think we could create a dedicated field with this information if we feel the need. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Tue, Mar 26, 2024 at 04:49:18PM +0530, shveta malik wrote: > On Tue, Mar 26, 2024 at 4:35 PM Bharath Rupireddy > wrote: > > > > On Tue, Mar 26, 2024 at 4:18 PM shveta malik wrote: > > > > > > > What about another approach?: inactive_since gives data synced from > > > > primary for > > > > synced slots and another dedicated field (could be added later...) could > > > > represent what you suggest as the other option. > > > > > > Yes, okay with me. I think there is some confusion here as well. In my > > > second approach above, I have not suggested anything related to > > > sync-worker. We can think on that later if we really need another > > > field which give us sync time. In my second approach, I have tried to > > > avoid updating inactive_since for synced slots during sync process. We > > > update that field during creation of synced slot so that > > > inactive_since reflects correct info even for synced slots (rather > > > than copying from primary). Please have a look at my patch and let me > > > know your thoughts. I am fine with copying it from primary as well and > > > documenting this behaviour. > > > > I took a look at your patch. > > > > --- a/src/backend/replication/logical/slotsync.c > > +++ b/src/backend/replication/logical/slotsync.c > > @@ -628,6 +628,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid > > remote_dbid) > > SpinLockAcquire(>mutex); > > slot->effective_catalog_xmin = xmin_horizon; > > slot->data.catalog_xmin = xmin_horizon; > > +slot->inactive_since = GetCurrentTimestamp(); > > SpinLockRelease(>mutex); > > > > If we just sync inactive_since value for synced slots while in > > recovery from the primary, so be it. Why do we need to update it to > > the current time when the slot is being created? > > If we update inactive_since at synced slot's creation or during > restart (skipping setting it during sync), then this time reflects > actual 'inactive_since' for that particular synced slot. 
Isn't that a > clear info for the user and in alignment of what the name > 'inactive_since' actually suggests? > > > We don't expose slot > > creation time, no? > > No, we don't. But for synced slot, that is the time since that slot is > inactive (unless promoted), so we are exposing inactive_since and not > creation time. > > >Aren't we fine if we just sync the value from > > primary and document that fact? After the promotion, we can reset it > > to the current time so that it gets its own time. Do you see any > > issues with it? > > Yes, we can do that. But curious to know, do we see any additional > benefit of reflecting primary's inactive_since at standby which I > might be missing? In case the primary goes down, then one could use the value on the standby to get the value coming from the primary. I think that could be useful info to have. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Tue, Mar 26, 2024 at 03:17:36PM +0530, shveta malik wrote: > On Tue, Mar 26, 2024 at 1:54 PM Bertrand Drouvot > wrote: > > > > Hi, > > > > On Tue, Mar 26, 2024 at 01:37:21PM +0530, Amit Kapila wrote: > > > On Tue, Mar 26, 2024 at 1:15 PM Bertrand Drouvot > > > wrote: > > > > > > > > 2 === > > > > > > > > It looks like inactive_since is set to the current timestamp on the > > > > standby > > > > each time the sync worker does a cycle: > > > > > > > > primary: > > > > > > > > postgres=# select slot_name,inactive_since from pg_replication_slots > > > > where failover = 't'; > > > > slot_name |inactive_since > > > > -+--- > > > > lsub27_slot | 2024-03-26 07:39:19.745517+00 > > > > lsub28_slot | 2024-03-26 07:40:24.953826+00 > > > > > > > > standby: > > > > > > > > postgres=# select slot_name,inactive_since from pg_replication_slots > > > > where failover = 't'; > > > > slot_name |inactive_since > > > > -+--- > > > > lsub27_slot | 2024-03-26 07:43:56.387324+00 > > > > lsub28_slot | 2024-03-26 07:43:56.387338+00 > > > > > > > > I don't think that should be the case. > > > > > > > > > > But why? This is exactly what we discussed in another thread where we > > > agreed to update inactive_since even for sync slots. > > > > Hum, I thought we agreed to "sync" it and to "update it to current time" > > only at promotion time. > > I think there may have been some misunderstanding here. Indeed ;-) > But now if I > rethink this, I am fine with 'inactive_since' getting synced from > primary to standby. But if we do that, we need to add docs stating > "inactive_since" represents primary's inactivity and not standby's > slots inactivity for synced slots. Yeah sure. > The reason for this clarification > is that the synced slot might be generated much later, yet > 'inactive_since' is synced from the primary, potentially indicating a > time considerably earlier than when the synced slot was actually > created. Right. 
> Another approach could be that "inactive_since" for synced slot > actually gives its own inactivity data rather than giving primary's > slot data. We update inactive_since on standby only at 3 occasions: > 1) at the time of creation of the synced slot. > 2) during standby restart. > 3) during promotion of standby. > > I have attached a sample patch for this idea as.txt file. Thanks! > I am fine with any of these approaches. One gives data synced from > primary for synced slots, while another gives actual inactivity data > of synced slots. What about another approach: inactive_since gives the data synced from the primary for synced slots, and another dedicated field (which could be added later...) could represent what you suggest as the other option. Another con of updating inactive_since to the current time during each slot sync cycle is that calling GetCurrentTimestamp() very frequently (during each sync cycle of very active slots) could be too costly. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Tue, Mar 26, 2024 at 02:27:17PM +0530, Bharath Rupireddy wrote: > Please use the v22 patch set. Thanks! 1 === +reset_synced_slots_info(void) I'm not sure "reset" is the right word, what about slot_sync_shutdown_update()? 2 === + for (int i = 0; i < max_replication_slots; i++) + { + ReplicationSlot *s = >replication_slots[i]; + + /* Check if it is a synchronized slot */ + if (s->in_use && s->data.synced) + { + TimestampTz now; + + Assert(SlotIsLogical(s)); + Assert(s->active_pid == 0); + + /* +* Set the time since the slot has become inactive after shutting +* down slot sync machinery. This helps correctly interpret the +* time if the standby gets promoted without a restart. We get the +* current time beforehand to avoid a system call while holding +* the lock. +*/ + now = GetCurrentTimestamp(); What about moving "now = GetCurrentTimestamp()" outside of the for loop? (it would be less costly and probably good enough). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: pgsql: Track last_inactive_time in pg_replication_slots.
Hi, On Tue, Mar 26, 2024 at 01:45:23PM +0530, Amit Kapila wrote: > On Tue, Mar 26, 2024 at 1:09 PM Alvaro Herrera > wrote: > > > > On 2024-Mar-26, Amit Kapila wrote: > > > > > We have a consensus on inactive_since, so I'll make that change. > > > > Sounds reasonable. So this is a timestamptz if the slot is inactive, > > NULL if active, right? > > > > Yes. > > > What value is it going to have for sync slots? > > > > The behavior will be the same for non-sync slots. In each sync cycle, > we acquire/release the sync slots. So at the time of release, > inactive_since will be updated. See email [1]. I don't think we should set inactive_since to the current time at each sync cycle, see [1] as to why. What do you think? [1]: https://www.postgresql.org/message-id/ZgKGIDC5lttWTdJH%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Tue, Mar 26, 2024 at 01:37:21PM +0530, Amit Kapila wrote: > On Tue, Mar 26, 2024 at 1:15 PM Bertrand Drouvot > wrote: > > > > 2 === > > > > It looks like inactive_since is set to the current timestamp on the standby > > each time the sync worker does a cycle: > > > > primary: > > > > postgres=# select slot_name,inactive_since from pg_replication_slots where > > failover = 't'; > > slot_name |inactive_since > > -+--- > > lsub27_slot | 2024-03-26 07:39:19.745517+00 > > lsub28_slot | 2024-03-26 07:40:24.953826+00 > > > > standby: > > > > postgres=# select slot_name,inactive_since from pg_replication_slots where > > failover = 't'; > > slot_name |inactive_since > > -+--- > > lsub27_slot | 2024-03-26 07:43:56.387324+00 > > lsub28_slot | 2024-03-26 07:43:56.387338+00 > > > > I don't think that should be the case. > > > > But why? This is exactly what we discussed in another thread where we > agreed to update inactive_since even for sync slots. Hum, I thought we agreed to "sync" it and to "update it to current time" only at promotion time. I don't think updating inactive_since to the current time during each cycle makes sense (I understand the use case: being able to say when slots have been synced, but if this is what we want then we should consider an extra view or an extra field rather than relying on the inactive_since one). If the primary goes down, not updating inactive_since to the current time could also provide benefits, such as knowing the inactive_since of the primary slots (from the standby) as of the last time they were synced. If we update it to the current time then this information is lost. > In each sync > cycle, we acquire/release the slot, so the inactive_since gets > updated. See synchronize_one_slot(). Right, and I think we should put an extra condition for when in recovery. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Tue, Mar 26, 2024 at 11:07:51AM +0530, Bharath Rupireddy wrote: > On Tue, Mar 26, 2024 at 9:30 AM shveta malik wrote: > > But immediately after promotion, we can not rely on the above check > > and thus possibility of synced slots invalidation is there. To > > maintain consistent behavior regarding the setting of > > last_inactive_time for synced slots, similar to user slots, one > > potential solution to prevent this invalidation issue is to update the > > last_inactive_time of all synced slots within the ShutDownSlotSync() > > function during FinishWalRecovery(). This approach ensures that > > promotion doesn't immediately invalidate slots, and henceforth, we > > possess a correct last_inactive_time as a basis for invalidation going > > forward. This will be equivalent to updating last_inactive_time during > > restart (but without actual restart during promotion). > > The plus point of maintaining last_inactive_time for synced slots > > could be, this can provide data to the user on when last time the sync > > was attempted on that particular slot by background slot sync worker > > or SQl function. Thoughts? > > Please find the attached v21 patch implementing the above idea. It > also has changes for renaming last_inactive_time to inactive_since. Thanks! A few comments: 1 === One trailing whitespace: Applying: Fix review comments for slot's last_inactive_time property .git/rebase-apply/patch:433: trailing whitespace. # got a valid inactive_since value representing the last slot sync time. warning: 1 line adds whitespace errors. 
2 === It looks like inactive_since is set to the current timestamp on the standby each time the sync worker does a cycle: primary: postgres=# select slot_name,inactive_since from pg_replication_slots where failover = 't'; slot_name |inactive_since -+--- lsub27_slot | 2024-03-26 07:39:19.745517+00 lsub28_slot | 2024-03-26 07:40:24.953826+00 standby: postgres=# select slot_name,inactive_since from pg_replication_slots where failover = 't'; slot_name |inactive_since -+--- lsub27_slot | 2024-03-26 07:43:56.387324+00 lsub28_slot | 2024-03-26 07:43:56.387338+00 I don't think that should be the case. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Tue, Mar 26, 2024 at 05:55:11AM +, Bertrand Drouvot wrote: > Hi, > > On Tue, Mar 26, 2024 at 09:30:32AM +0530, shveta malik wrote: > > On Mon, Mar 25, 2024 at 12:43 PM shveta malik > > wrote: > > > > > > I have one concern, for synced slots on standby, how do we disallow > > > invalidation due to inactive-timeout immediately after promotion? > > > > > > For synced slots, last_inactive_time and inactive_timeout are both > > > set. Let's say I bring down primary for promotion of standby and then > > > promote standby, there are chances that it may end up invalidating > > > synced slots (considering standby is not brought down during promotion > > > and thus inactive_timeout may already be past 'last_inactive_time'). > > > > > > > On standby, if we decide to maintain valid last_inactive_time for > > synced slots, then invalidation is correctly restricted in > > InvalidateSlotForInactiveTimeout() for synced slots using the check: > > > > if (RecoveryInProgress() && slot->data.synced) > > return false; > > Right. > > > But immediately after promotion, we can not rely on the above check > > and thus possibility of synced slots invalidation is there. To > > maintain consistent behavior regarding the setting of > > last_inactive_time for synced slots, similar to user slots, one > > potential solution to prevent this invalidation issue is to update the > > last_inactive_time of all synced slots within the ShutDownSlotSync() > > function during FinishWalRecovery(). This approach ensures that > > promotion doesn't immediately invalidate slots, and henceforth, we > > possess a correct last_inactive_time as a basis for invalidation going > > forward. This will be equivalent to updating last_inactive_time during > > restart (but without actual restart during promotion). 
> > The plus point of maintaining last_inactive_time for synced slots > > could be, this can provide data to the user on when last time the sync > > was attempted on that particular slot by background slot sync worker > > or SQl function. Thoughts? > > Yeah, another plus point is that if the primary is down then one could look > at the synced "active_since" on the standby to get an idea of it (depends of > the > last sync though). > > The issue that I can see with your proposal is: what if one synced the slots > manually (with pg_sync_replication_slots()) but does not use the sync worker? > Then I think ShutDownSlotSync() is not going to help in that case. It looks like ShutDownSlotSync() is always called (even if sync_replication_slots = off), so that sounds ok to me (I should have checked the code, I was under the impression ShutDownSlotSync() was not called if sync_replication_slots = off). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Tue, Mar 26, 2024 at 09:30:32AM +0530, shveta malik wrote: > On Mon, Mar 25, 2024 at 12:43 PM shveta malik wrote: > > > > I have one concern, for synced slots on standby, how do we disallow > > invalidation due to inactive-timeout immediately after promotion? > > > > For synced slots, last_inactive_time and inactive_timeout are both > > set. Let's say I bring down primary for promotion of standby and then > > promote standby, there are chances that it may end up invalidating > > synced slots (considering standby is not brought down during promotion > > and thus inactive_timeout may already be past 'last_inactive_time'). > > > > On standby, if we decide to maintain valid last_inactive_time for > synced slots, then invalidation is correctly restricted in > InvalidateSlotForInactiveTimeout() for synced slots using the check: > > if (RecoveryInProgress() && slot->data.synced) > return false; Right. > But immediately after promotion, we can not rely on the above check > and thus possibility of synced slots invalidation is there. To > maintain consistent behavior regarding the setting of > last_inactive_time for synced slots, similar to user slots, one > potential solution to prevent this invalidation issue is to update the > last_inactive_time of all synced slots within the ShutDownSlotSync() > function during FinishWalRecovery(). This approach ensures that > promotion doesn't immediately invalidate slots, and henceforth, we > possess a correct last_inactive_time as a basis for invalidation going > forward. This will be equivalent to updating last_inactive_time during > restart (but without actual restart during promotion). > The plus point of maintaining last_inactive_time for synced slots > could be, this can provide data to the user on when last time the sync > was attempted on that particular slot by background slot sync worker > or SQl function. Thoughts? 
Yeah, another plus point is that if the primary is down then one could look at the synced "active_since" on the standby to get an idea of it (depends on the last sync, though). The issue that I can see with your proposal is: what if one syncs the slots manually (with pg_sync_replication_slots()) but does not use the sync worker? Then I think ShutDownSlotSync() is not going to help in that case. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: pgsql: Track last_inactive_time in pg_replication_slots.
Hi, On Mon, Mar 25, 2024 at 12:25:37PM -0400, Robert Haas wrote: > On Mon, Mar 25, 2024 at 12:12 PM Bertrand Drouvot > wrote: > > Would "released_time" sounds better? (at the end this is exactly what it > > does > > represent unless for the case where it is restored from disk for which the > > meaning > > would still makes sense to me though). It seems to me that released_time > > does not > > lead to any expectation then removing any confusion. > > Yeah, that's not bad. I mean, I don't agree that released_time doesn't > lead to any expectation, > but what it leads me to expect is that you're > going to tell me the time at which the slot was released. So if it's > currently active, then I see NULL, because it's not released; but if > it's inactive, then I see the time at which it became so. > > In the same vein, I think deactivated_at or inactive_since might be > good names to consider. I think they get at the same thing as > released_time, but they avoid introducing a completely new word > (release, as opposed to active/inactive). > Yeah, I'd vote for inactive_since then. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: pgsql: Track last_inactive_time in pg_replication_slots.
Hi, On Mon, Mar 25, 2024 at 07:32:11PM +0530, Amit Kapila wrote: > On Mon, Mar 25, 2024 at 6:57 PM Robert Haas wrote: > > And I'm suspicious that having an exception for slots being synced is > > a bad idea. That makes too much of a judgement about how the user will > > use this field. It's usually better to just expose the data, and if > > the user needs helps to make sense of that data, then give them that > > help separately. > > The reason we didn't set this for sync slots is that they won't be > usable (one can't use them to decode WAL) unless standby is promoted > [2]. But I see your point as well. So, I have copied the others > involved in this discussion to see what they think. Yeah I also see Robert's point. If we also sync the "last inactive time" field then we would need to take care of the corner case mentioned by Shveta in [1] during promotion. [1]: https://www.postgresql.org/message-id/CAJpy0uCLu%2BmqAwAMum%3DpXE9YYsy0BE7hOSw_Wno5vjwpFY%3D63g%40mail.gmail.com Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: pgsql: Track last_inactive_time in pg_replication_slots.
Hi, On Mon, Mar 25, 2024 at 11:49:00AM -0400, Robert Haas wrote: > On Mon, Mar 25, 2024 at 11:16 AM Bertrand Drouvot > wrote: > > > IIUC, Bertrand's point was that users can interpret last_active_time > > > as a value that gets updated each time they decode a change which is > > > not what we are doing. So, this can confuse users. Your expectation of > > > answer (NULL) when the slot is active is correct and that is what will > > > happen. > > > > Yeah, and so would be the confusion: why is last_active_time NULL while one > > is > > using the slot? > > I agree that users could get confused here, but the solution to that > shouldn't be to give the field a name that is the opposite of what it > actually does. I expect a field called last_inactive_time to tell me > the last time that the slot was inactive. Here, it tells me the last > time that a currently-inactive slot previously *WAS* active. How can > you justify calling that the last *INACTIVE* time? > > AFAICS, the user who has the confusion that you mention here is simply > wrong. If they are looking at a field called "last active time" and > the slot is active, then the correct answer is "right now" or > "undefined" and that is what they will see. Sure, they might not > understand that. But flipping the name of the field on its head cannot > be the right way to help them. > > With the current naming, I expect to have the exact opposite confusion > as your hypothetical confused user. I'm going to be looking at a slot > that's currently inactive, and it's going to tell me that the > last_inactive_time was at some time in the past. And I'm going to say > "what the heck is going on here, the slot is inactive *right now*!" 
> > Half of me wonders whether we should avoid this whole problem by > renaming it to something like last_state_change or > last_state_change_time, or maybe just state_change like we do in > pg_stat_activity, and making it mean the last time the slot flipped > between active and inactive in either direction. I'm not sure if this > is better, but unless I'm misunderstanding something, the current > situation is terrible. > Now that I read your arguments, I think that both last_active_time and last_inactive_time could be misleading, because in the end they rely on users' expectations. Would "released_time" sound better? (In the end, this is exactly what it represents; even in the case where the slot is restored from disk, the meaning would still make sense to me.) It seems to me that released_time does not lead to any particular expectation, thus removing any confusion. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: pgsql: Track last_inactive_time in pg_replication_slots.
Hi, On Mon, Mar 25, 2024 at 08:59:55PM +0530, Amit Kapila wrote: > On Mon, Mar 25, 2024 at 8:46 PM Bertrand Drouvot > wrote: > > > > On Mon, Mar 25, 2024 at 08:38:16PM +0530, Amit Kapila wrote: > > > On Mon, Mar 25, 2024 at 7:51 PM Robert Haas wrote: > > > > > > > > On Mon, Mar 25, 2024 at 10:02 AM Amit Kapila > > > > wrote: > > > > > We considered the other two names as last_inactive_at and > > > > > last_active_time. For the first (last_inactive_at), there was an > > > > > argument that most other fields that display time ends with _time. For > > > > > the second (last_active_time), there was an argument that it could be > > > > > misleading as one could think that it should be updated each time WAL > > > > > record decoding is happening [1]. The other possibility is to name it > > > > > last_used_time but I think it won't be much different from > > > > > last_active_time. > > > > > > > > I don't understand the bit about updating it each time WAL record > > > > decoding is happening. If it's the last active time, and the slot is > > > > currently active, then the answer is either "right now" or "currently > > > > undefined." I'd expect to see NULL in the system view in such a case. > > > > And if that's so, then there's nothing to update each time a record is > > > > decoded, because it's just still going to show NULL. > > > > > > > > > > IIUC, Bertrand's point was that users can interpret last_active_time > > > as a value that gets updated each time they decode a change which is > > > not what we are doing. So, this can confuse users. Your expectation of > > > answer (NULL) when the slot is active is correct and that is what will > > > happen. > > > > Yeah, and so would be the confusion: why is last_active_time NULL while one > > is > > using the slot? > > > > It is because we set it to zero when we acquire the slot and that > value will remain the same till the slot is active. 
> I am not sure if I understood your question so what I am saying might not make sense. There is no "real" question, I was just highlighting the confusion that could arise if we name the field "last_active_time". Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: pgsql: Track last_inactive_time in pg_replication_slots.
Hi, On Mon, Mar 25, 2024 at 08:38:16PM +0530, Amit Kapila wrote: > On Mon, Mar 25, 2024 at 7:51 PM Robert Haas wrote: > > > > On Mon, Mar 25, 2024 at 10:02 AM Amit Kapila > > wrote: > > > We considered the other two names as last_inactive_at and > > > last_active_time. For the first (last_inactive_at), there was an > > > argument that most other fields that display time ends with _time. For > > > the second (last_active_time), there was an argument that it could be > > > misleading as one could think that it should be updated each time WAL > > > record decoding is happening [1]. The other possibility is to name it > > > last_used_time but I think it won't be much different from > > > last_active_time. > > > > I don't understand the bit about updating it each time WAL record > > decoding is happening. If it's the last active time, and the slot is > > currently active, then the answer is either "right now" or "currently > > undefined." I'd expect to see NULL in the system view in such a case. > > And if that's so, then there's nothing to update each time a record is > > decoded, because it's just still going to show NULL. > > > > IIUC, Bertrand's point was that users can interpret last_active_time > as a value that gets updated each time they decode a change which is > not what we are doing. So, this can confuse users. Your expectation of > answer (NULL) when the slot is active is correct and that is what will > happen. Yeah, and so would be the confusion: why is last_active_time NULL while one is using the slot? > > Why does this field get set to the current time when the slot is > > restored from disk? > > > > It is because we don't want to include the time the server is down in > the last_inactive_time. Say, if we are shutting down the server at > time X and the server remains down for another two hours, we don't > want to include those two hours as the slot inactive time. 
> The related theory is that this field will be used to invalidate inactive slots based on a threshold (say inactive_timeout). Say, before the shutdown, we release the slot and set the current_time for last_inactive_time for each slot and persist that information as well. Now, if the server is down for a long time, we may invalidate the slots as soon as the server comes up. So, instead, we just set this field at the time we read slots from disk and then reset it to 0/NULL as soon as the slot becomes active. Right, and we also want to invalidate the slot if its unused duration exceeds the timeout, so setting the field to zero when the slot is restored from disk is not an option either. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
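The trade-off described above (stamp the field at load time so that server downtime is not counted as slot inactivity) can be modeled with a small stand-alone sketch. Everything below is invented for illustration: slot_t, the function names, and plain integer seconds in place of TimestampTz; this is not PostgreSQL code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of a slot's inactivity tracking. */
typedef struct
{
    int64_t last_inactive_at;   /* 0 means "never stamped / currently active" */
    int64_t inactive_timeout;   /* seconds; 0 means "no timeout configured" */
} slot_t;

/* On restore from disk, stamp "now" rather than keeping the
 * pre-shutdown value: time the server spent down is therefore
 * not counted as slot inactivity. */
void
restore_slot_from_disk(slot_t *slot, int64_t now)
{
    slot->last_inactive_at = now;
}

/* Invalidate only if a timeout is configured, the slot is inactive,
 * and the inactive duration has reached the threshold. */
bool
should_invalidate(const slot_t *slot, int64_t now)
{
    if (slot->inactive_timeout == 0 || slot->last_inactive_at == 0)
        return false;
    return (now - slot->last_inactive_at) >= slot->inactive_timeout;
}
```

For example, if the server goes down at t=1000 and only comes back at t=8200, the slot is restamped at 8200, so a 3600-second timeout does not fire at startup; it fires only from t=11800 onward if the slot stays unused.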
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Mon, Mar 25, 2024 at 02:39:50PM +0530, shveta malik wrote: > I am listing the concerns raised by me: > 3) alter replication slot to alter inactive_timout for synced slots on > standby, should this be allowed? I don't think it should be allowed. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Mon, Mar 25, 2024 at 02:07:21PM +0530, shveta malik wrote: > On Mon, Mar 25, 2024 at 1:37 PM Bertrand Drouvot > wrote: > > > > Hi, > > > > On Mon, Mar 25, 2024 at 12:59:52PM +0530, Amit Kapila wrote: > > > On Mon, Mar 25, 2024 at 12:43 PM shveta malik > > > wrote: > > > > > > > > On Mon, Mar 25, 2024 at 11:53 AM shveta malik > > > > wrote: > > > > > > > > > > On Mon, Mar 25, 2024 at 10:33 AM shveta malik > > > > > wrote: > > > > > > > > > > > > On Sun, Mar 24, 2024 at 3:06 PM Bharath Rupireddy > > > > > > wrote: > > > > > > > > > > > > > > I've attached the v18 patch set here. > > > > > > > > I have one concern, for synced slots on standby, how do we disallow > > > > invalidation due to inactive-timeout immediately after promotion? > > > > > > > > For synced slots, last_inactive_time and inactive_timeout are both > > > > set. > > > > Yeah, and I can see last_inactive_time is moving on the standby (while not > > the > > case on the primary), probably due to the sync worker slot > > acquisition/release > > which does not seem right. > > > > > Let's say I bring down primary for promotion of standby and then > > > > promote standby, there are chances that it may end up invalidating > > > > synced slots (considering standby is not brought down during promotion > > > > and thus inactive_timeout may already be past 'last_inactive_time'). > > > > > > > > > > This raises the question of whether we need to set > > > 'last_inactive_time' synced slots on the standby? > > > > Yeah, I think that last_inactive_time should stay at 0 on synced slots on > > the > > standby because such slots are not usable anyway (until the standby gets > > promoted). > > > > So, I think that last_inactive_time does not make sense if the slot never > > had > > the chance to be active. > > > > OTOH I think the timeout invalidation (if any) should be synced from > > primary. > > Yes, even I feel that last_inactive_time makes sense only when the > slot is available to be used. 
> Synced slots are not available to be used until the standby is promoted, and thus setting last_inactive_time can be skipped for synced slots. But once the primary's slot is invalidated due to inactive-timeout, that invalidation should be synced to the standby (which is happening currently). Yeah, syncing the invalidation and always keeping last_inactive_time at zero for synced slots looks right to me. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Mon, Mar 25, 2024 at 12:59:52PM +0530, Amit Kapila wrote: > On Mon, Mar 25, 2024 at 12:43 PM shveta malik wrote: > > > > On Mon, Mar 25, 2024 at 11:53 AM shveta malik > > wrote: > > > > > > On Mon, Mar 25, 2024 at 10:33 AM shveta malik > > > wrote: > > > > > > > > On Sun, Mar 24, 2024 at 3:06 PM Bharath Rupireddy > > > > wrote: > > > > > > > > > > I've attached the v18 patch set here. > > > > I have one concern, for synced slots on standby, how do we disallow > > invalidation due to inactive-timeout immediately after promotion? > > > > For synced slots, last_inactive_time and inactive_timeout are both > > set. Yeah, and I can see last_inactive_time is moving on the standby (while not the case on the primary), probably due to the sync worker slot acquisition/release which does not seem right. > Let's say I bring down primary for promotion of standby and then > > promote standby, there are chances that it may end up invalidating > > synced slots (considering standby is not brought down during promotion > > and thus inactive_timeout may already be past 'last_inactive_time'). > > > > This raises the question of whether we need to set > 'last_inactive_time' synced slots on the standby? Yeah, I think that last_inactive_time should stay at 0 on synced slots on the standby because such slots are not usable anyway (until the standby gets promoted). So, I think that last_inactive_time does not make sense if the slot never had the chance to be active. OTOH I think the timeout invalidation (if any) should be synced from primary. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
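The guard quoted earlier in the thread (`if (RecoveryInProgress() && slot->data.synced) return false;`), combined with the "keep last_inactive_time at 0 for synced slots" idea, can be sketched as a single predicate. This is a hedged, self-contained model: repl_slot, the field layout, and the function name are invented for illustration and are not the actual PostgreSQL code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the fields involved in the discussion. */
typedef struct
{
    bool    synced;             /* slot synced from the primary */
    int64_t last_inactive_at;   /* 0 = never stamped / currently active */
    int64_t inactive_timeout;   /* seconds; 0 = disabled */
} repl_slot;

/* Sketch of the invalidation check: synced slots on a standby are
 * exempt (their invalidation arrives via sync from the primary),
 * and a slot whose last_inactive_at was never set cannot trip the
 * timeout either. */
bool
inactive_timeout_applies(const repl_slot *slot, bool in_recovery, int64_t now)
{
    if (in_recovery && slot->synced)
        return false;
    if (slot->inactive_timeout == 0 || slot->last_inactive_at == 0)
        return false;
    return (now - slot->last_inactive_at) >= slot->inactive_timeout;
}
```

Note that if synced slots always keep last_inactive_at at 0, the second check also protects them right after promotion, when in_recovery has just flipped to false.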
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Mon, Mar 25, 2024 at 12:25:21PM +0530, Bharath Rupireddy wrote: > On Mon, Mar 25, 2024 at 10:28 AM Amit Kapila wrote: > > > > On Mon, Mar 25, 2024 at 9:48 AM Amit Kapila wrote: > > > > > > > > > Such a test looks reasonable but shall we add equal to in the second > > > part of the test (like '$last_inactive_time'::timestamptz >= > > > > '$slot_creation_time'::timestamptz;). This is just to be sure that even > > > > if the test ran fast enough to give the same time, the test shouldn't > > > > fail. I think it won't matter for correctness as well. > > Agree. I added that in v19 patch. I was having that concern in my > mind. That's the reason I wasn't capturing current_time something like > below for the same worry that current_timestamp might be the same (or > nearly the same) as the slot creation time. That's why I ended up > capturing current_timestamp in a separate query than clubbing it up > with pg_create_physical_replication_slot. > > SELECT current_timestamp FROM pg_create_physical_replication_slot('foo'); > > > Apart from this, I have made minor changes in the comments. See and > > let me know what you think of attached. > Thanks! v19-0001 LGTM, just one nit for 019_replslot_limit.pl: the code for "Get last_inactive_time value after the slot's creation" and "Check that the captured time is sane" is somewhat duplicated. Is it worth creating two functions? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Sat, Mar 23, 2024 at 01:11:50PM +0530, Bharath Rupireddy wrote: > On Sat, Mar 23, 2024 at 11:27 AM Amit Kapila wrote: > > > > How about adding the test in 019_replslot_limit? It is not a direct > > fit but I feel later we can even add 'invalid_timeout' related tests > > in this file which will use last_inactive_time feature. > > I'm thinking the other way. Now, the new TAP file 043_replslot_misc.pl > can have last_inactive_time tests, and later invalid_timeout ones too. > This way 019_replslot_limit.pl is not cluttered. I share the same opinion as Amit: I think 019_replslot_limit would be a better place, because I see the timeout as another kind of limit. > > > It is also > > possible that some of the tests added by the 'invalid_timeout' feature > > will obviate the need for some of these tests. > > Might be. But, I prefer to keep both these tests separate but in the > same file 043_replslot_misc.pl. Because we cover some corner cases the > last_inactive_time is set upon loading the slot from disk. Right, but I think that this test does not necessarily have to be in the same .pl as the one testing the timeout. It could be added to one of the existing .pl files, like 001_stream_rep.pl for example. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Fri, Mar 22, 2024 at 06:02:11PM +0530, Amit Kapila wrote: > On Fri, Mar 22, 2024 at 5:30 PM Bertrand Drouvot > wrote: > > On Fri, Mar 22, 2024 at 03:56:23PM +0530, Amit Kapila wrote: > > > > > > > > That would avoid testing twice "slot->data.persistency == > > > > RS_PERSISTENT". > > > > > > > > > > That sounds like a good idea. Also, don't we need to consider physical > > > slots where we don't reserve WAL during slot creation? I don't think > > > there is a need to set inactive_at for such slots. > > > > If the slot is not active, why shouldn't we set inactive_at? I can > > understand > > that such slots do not present "any risks" but I think we should still set > > inactive_at (also to not give the false impression that the slot is active). > > > > But OTOH, there is a chance that we will invalidate such slots even > though they have never reserved WAL in the first place which doesn't > appear to be a good thing. That's right, but I don't see it as a problem. I think we should treat inactive_at as an independent field (as if the timeout one did not exist at all) and just focus on its meaning (the slot being inactive). If one sets a timeout (> 0) and gets an invalidation, then I think it works as designed (even if the slot does not present any "risk", as it does not retain any rows or WAL). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Fri, Mar 22, 2024 at 04:16:19PM +0530, Amit Kapila wrote: > On Fri, Mar 22, 2024 at 3:23 PM Bertrand Drouvot > wrote: > > > > On Fri, Mar 22, 2024 at 02:59:21PM +0530, Amit Kapila wrote: > > > On Fri, Mar 22, 2024 at 2:27 PM Bertrand Drouvot > > > wrote: > > > > > > > > On Fri, Mar 22, 2024 at 01:45:01PM +0530, Bharath Rupireddy wrote: > > > > > > > > > 0001 Track invalidation_reason in pg_replication_slots > > > > > 0002 Track last_inactive_at in pg_replication_slots > > > > > 0003 Allow setting inactive_timeout for replication slots via SQL API > > > > > 0004 Introduce new SQL funtion pg_alter_replication_slot > > > > > 0005 Allow setting inactive_timeout in the replication command > > > > > 0006 Add inactive_timeout based replication slot invalidation > > > > > > > > > > 1. Keep it last_inactive_at as a shared memory variable, but always > > > > > set it at restart if the slot's inactive_timeout has non-zero value > > > > > and reset it as soon as someone acquires that slot so that if the slot > > > > > doesn't get acquired till inactive_timeout, checkpointer will > > > > > invalidate the slot. > > > > > 4. last_inactive_at should also be set to the current time during slot > > > > > creation because if one creates a slot and does nothing with it then > > > > > it's the time it starts to be inactive. > > > > > > > > I did not look at the code yet but just tested the behavior. It works > > > > as you > > > > describe it but I think this behavior is weird because: > > > > > > > > - when we create a slot without a timeout then last_inactive_at is set. > > > > I think > > > > that's fine, but then: > > > > - when we restart the engine, then last_inactive_at is gone (as timeout > > > > is not > > > > set). > > > > > > > > I think last_inactive_at should be set also at engine restart even if > > > > there is > > > > no timeout. > > > > > > I think it is the opposite. Why do we need to set 'last_inactive_at' > > > when inactive_timeout is not set? 
> > > > I think those are unrelated, one could want to know when a slot has been > > inactive > > even if no timeout is set. I understand that for this patch series we have > > in mind > > to use them both to invalidate slots but I think that there is use case to > > not > > use both in correlation. Also not setting last_inactive_at could give the > > "false" > > impression that the slot is active. > > > > I see your point and agree with this. I feel we can commit this part > first then, Agree that in this case the current ordering makes sense (as setting last_inactive_at would be completely unrelated to the timeout). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Fri, Mar 22, 2024 at 03:56:23PM +0530, Amit Kapila wrote: > On Fri, Mar 22, 2024 at 3:15 PM Bertrand Drouvot > wrote: > > > > On Fri, Mar 22, 2024 at 01:45:01PM +0530, Bharath Rupireddy wrote: > > > > 1 === > > > > @@ -691,6 +699,13 @@ ReplicationSlotRelease(void) > > ConditionVariableBroadcast(&slot->active_cv); > > } > > > > + if (slot->data.persistency == RS_PERSISTENT) > > + { > > + SpinLockAcquire(&slot->mutex); > > + slot->last_inactive_at = GetCurrentTimestamp(); > > + SpinLockRelease(&slot->mutex); > > + } > > > > I'm not sure we should do system calls while we're holding a spinlock. > > Assign a variable before? > > > > 2 === > > > > Also, what about moving this here? > > > > " > > if (slot->data.persistency == RS_PERSISTENT) > > { > > /* > > * Mark persistent slot inactive. We're not freeing it, just > > * disconnecting, but wake up others that may be waiting for it. > > */ > > SpinLockAcquire(&slot->mutex); > > slot->active_pid = 0; > > SpinLockRelease(&slot->mutex); > > ConditionVariableBroadcast(&slot->active_cv); > > } > > " > > > > That would avoid testing twice "slot->data.persistency == RS_PERSISTENT". > > > > That sounds like a good idea. Also, don't we need to consider physical > slots where we don't reserve WAL during slot creation? I don't think > there is a need to set inactive_at for such slots. If the slot is not active, why shouldn't we set inactive_at? I can understand that such slots do not present "any risks" but I think we should still set inactive_at (also to not give the false impression that the slot is active). > > 5 === > > > > Most of the fields that reflect a time (not duration) in the system views > > are > > _time, so I'm wondering if instead of "last_inactive_at" we should use > > something like "last_inactive_time"? > > > > How about naming it as last_active_time? This will indicate the time > at which the slot was last active.
I thought about it too, but I think it could be misleading, as one could think that it should be updated each time WAL record decoding happens. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Fri, Mar 22, 2024 at 02:59:21PM +0530, Amit Kapila wrote: > On Fri, Mar 22, 2024 at 2:27 PM Bertrand Drouvot > wrote: > > > > On Fri, Mar 22, 2024 at 01:45:01PM +0530, Bharath Rupireddy wrote: > > > > > 0001 Track invalidation_reason in pg_replication_slots > > > 0002 Track last_inactive_at in pg_replication_slots > > > 0003 Allow setting inactive_timeout for replication slots via SQL API > > > 0004 Introduce new SQL funtion pg_alter_replication_slot > > > 0005 Allow setting inactive_timeout in the replication command > > > 0006 Add inactive_timeout based replication slot invalidation > > > > > > 1. Keep it last_inactive_at as a shared memory variable, but always > > > set it at restart if the slot's inactive_timeout has non-zero value > > > and reset it as soon as someone acquires that slot so that if the slot > > > doesn't get acquired till inactive_timeout, checkpointer will > > > invalidate the slot. > > > 4. last_inactive_at should also be set to the current time during slot > > > creation because if one creates a slot and does nothing with it then > > > it's the time it starts to be inactive. > > > > I did not look at the code yet but just tested the behavior. It works as you > > describe it but I think this behavior is weird because: > > > > - when we create a slot without a timeout then last_inactive_at is set. I > > think > > that's fine, but then: > > - when we restart the engine, then last_inactive_at is gone (as timeout is > > not > > set). > > > > I think last_inactive_at should be set also at engine restart even if there > > is > > no timeout. > > I think it is the opposite. Why do we need to set 'last_inactive_at' > when inactive_timeout is not set? I think those are unrelated: one could want to know when a slot has been inactive even if no timeout is set. I understand that for this patch series we have in mind to use them both to invalidate slots, but I think there is a use case for not using them in correlation.
Also not setting last_inactive_at could give the "false" impression that the slot is active. > BTW, haven't we discussed that we > don't need to set 'last_inactive_at' at the time of slot creation as > it is sufficient to set it at the time ReplicationSlotRelease()? Right. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Fri, Mar 22, 2024 at 01:45:01PM +0530, Bharath Rupireddy wrote: > On Fri, Mar 22, 2024 at 12:39 PM Bertrand Drouvot > wrote: > > > > > > Please find the v14-0001 patch for now. > > > > Thanks! > > > > > LGTM. Let's wait for Bertrand to see if he has more comments on 0001 > > > and then I'll push it. > > > > LGTM too. > > > Please see the attached v14 patch set. No change in the attached > v14-0001 from the previous patch. Looking at v14-0002:

1 ===

@@ -691,6 +699,13 @@ ReplicationSlotRelease(void)
		ConditionVariableBroadcast(&slot->active_cv);
	}

+	if (slot->data.persistency == RS_PERSISTENT)
+	{
+		SpinLockAcquire(&slot->mutex);
+		slot->last_inactive_at = GetCurrentTimestamp();
+		SpinLockRelease(&slot->mutex);
+	}

I'm not sure we should do system calls while we're holding a spinlock. Assign a variable before?

2 ===

Also, what about moving this here?

"
	if (slot->data.persistency == RS_PERSISTENT)
	{
		/*
		 * Mark persistent slot inactive. We're not freeing it, just
		 * disconnecting, but wake up others that may be waiting for it.
		 */
		SpinLockAcquire(&slot->mutex);
		slot->active_pid = 0;
		SpinLockRelease(&slot->mutex);
		ConditionVariableBroadcast(&slot->active_cv);
	}
"

That would avoid testing "slot->data.persistency == RS_PERSISTENT" twice.

3 ===

@@ -2341,6 +2356,7 @@ RestoreSlotFromDisk(const char *name)
	slot->in_use = true;
	slot->active_pid = 0;
+	slot->last_inactive_at = 0;

I think we should put GetCurrentTimestamp() here. It's done in v14-0006, but I think it's better to do it in 0002 (without taking care of inactive_timeout there).
4 ===

Track last_inactive_at in pg_replication_slots

 doc/src/sgml/system-views.sgml       | 11 +++
 src/backend/catalog/system_views.sql |  3 ++-
 src/backend/replication/slot.c       | 16 
 src/backend/replication/slotfuncs.c  |  7 ++-
 src/include/catalog/pg_proc.dat      |  6 +++---
 src/include/replication/slot.h       |  3 +++
 src/test/regress/expected/rules.out  |  5 +++--
 7 files changed, 44 insertions(+), 7 deletions(-)

Worth adding some tests too (or do we postpone them to future commits because we're confident enough they will follow soon)?

5 ===

Most of the fields that reflect a time (not a duration) in the system views end with _time, so I'm wondering if instead of "last_inactive_at" we should use something like "last_inactive_time"? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Fri, Mar 22, 2024 at 01:45:01PM +0530, Bharath Rupireddy wrote: > On Fri, Mar 22, 2024 at 12:39 PM Bertrand Drouvot > wrote: > > > > > > Please find the v14-0001 patch for now. > > > > Thanks! > > > > > LGTM. Let's wait for Bertrand to see if he has more comments on 0001 > > > and then I'll push it. > > > > LGTM too. > > Thanks. Here I'm implementing the following: Thanks! > 0001 Track invalidation_reason in pg_replication_slots > 0002 Track last_inactive_at in pg_replication_slots > 0003 Allow setting inactive_timeout for replication slots via SQL API > 0004 Introduce new SQL funtion pg_alter_replication_slot > 0005 Allow setting inactive_timeout in the replication command > 0006 Add inactive_timeout based replication slot invalidation > > 1. Keep it last_inactive_at as a shared memory variable, but always > set it at restart if the slot's inactive_timeout has non-zero value > and reset it as soon as someone acquires that slot so that if the slot > doesn't get acquired till inactive_timeout, checkpointer will > invalidate the slot. > 4. last_inactive_at should also be set to the current time during slot > creation because if one creates a slot and does nothing with it then > it's the time it starts to be inactive. I did not look at the code yet but just tested the behavior. It works as you describe it, but I think this behavior is weird because:

- when we create a slot without a timeout, then last_inactive_at is set. I think that's fine, but then:
- when we restart the engine, last_inactive_at is gone (as the timeout is not set).

I think last_inactive_at should be set at engine restart too, even if there is no timeout. I don't think we should link the two. Changing my mind here on this subject due to the testing. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi,

On Fri, Mar 22, 2024 at 10:49:17AM +0530, Amit Kapila wrote:
> On Thu, Mar 21, 2024 at 11:21 PM Bharath Rupireddy wrote:
> >
> > > > Please find the v14-0001 patch for now.

Thanks!

> LGTM. Let's wait for Bertrand to see if he has more comments on 0001
> and then I'll push it.

LGTM too.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi,

On Thu, Mar 21, 2024 at 04:13:31PM +0530, Bharath Rupireddy wrote:
> On Thu, Mar 21, 2024 at 3:20 PM Amit Kapila wrote:
> >
> > > My concern was that we set catalog_xmin at logical slot creation time. So
> > > if we set last_inactive_at to zero at creation time and the slot is not
> > > used for a long period of time > timeout, then I think it's not helping
> > > there.
> >
> > But, we do call ReplicationSlotRelease() after slot creation. For
> > example, see CreateReplicationSlot(). So wouldn't that take care of
> > the case you are worried about?
>
> Right. That's true even for pg_create_physical_replication_slot and
> pg_create_logical_replication_slot. AFAICS, setting it to the current
> timestamp in ReplicationSlotRelease suffices unless I'm missing
> something.

Right, but we have:

"
	if (set_last_inactive_at &&
		slot->data.persistency == RS_PERSISTENT)
	{
		/*
		 * There's no point in allowing failover slots to get invalidated
		 * based on slot's inactive_timeout parameter on standby. The failover
		 * slots simply get synced from the primary on the standby.
		 */
		if (!(RecoveryInProgress() && slot->data.failover))
		{
			SpinLockAcquire(&slot->mutex);
			slot->last_inactive_at = GetCurrentTimestamp();
			SpinLockRelease(&slot->mutex);
		}
	}
"

while we set set_last_inactive_at to false at creation time, so that
last_inactive_at is not set to GetCurrentTimestamp(). We should set
set_last_inactive_at to true if a timeout is provided during the slot
creation.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
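The quoted block boils down to a small predicate. Here is a hypothetical condensation (illustrative names, not the actual PostgreSQL code) that also reflects the remark above: "requested" should additionally be true at slot creation time when an inactive_timeout is supplied:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical persistency tag standing in for RS_PERSISTENT etc. */
typedef enum { TOY_EPHEMERAL, TOY_PERSISTENT } ToyPersistency;

/*
 * Condensed form of the quoted condition: stamp last_inactive_at only when
 * requested (set_last_inactive_at) and only for persistent slots, skipping
 * failover slots on a standby since those are merely synced from the
 * primary and should not be invalidated based on inactive_timeout there.
 */
static bool
should_stamp_last_inactive_at(bool requested, ToyPersistency persistency,
                              bool in_recovery, bool failover)
{
    if (!requested || persistency != TOY_PERSISTENT)
        return false;
    if (in_recovery && failover)
        return false;
    return true;
}
```

With this shape, passing requested=true at creation time whenever a timeout is provided is enough to get the creation-time stamp, while the standby/failover exception keeps applying unchanged.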
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi,

On Thu, Mar 21, 2024 at 05:05:46AM +0530, Bharath Rupireddy wrote:
> On Wed, Mar 20, 2024 at 1:04 PM Bertrand Drouvot wrote:
> >
> > On Wed, Mar 20, 2024 at 08:58:05AM +0530, Amit Kapila wrote:
> > > On Wed, Mar 20, 2024 at 12:49 AM Bharath Rupireddy wrote:
> > > >
> > > > Following are some open points:
> > > >
> > > > 1. Where to do inactive_timeout invalidation exactly if not the
> > > > checkpointer.
> > >
> > > I have suggested to do it at the time of CheckpointReplicationSlots()
> > > and Bertrand suggested to do it whenever we resume using the slot. I
> > > think we should follow both the suggestions.
> >
> > Agree. I also think that pg_get_replication_slots() would be a good place,
> > so that queries would return the right invalidation status.
>
> I've addressed review comments and attaching the v13 patches with the
> following changes:

Thanks!

v13-0001 looks good to me. The only nit (that I've mentioned up-thread) is
that in the pg_replication_slots view, the invalidation_reason field is "far
away" from the conflicting field. I understand that one could query the
fields individually, but when describing the view or reading the doc it
seems more appropriate to see them close together. Also, as "failover" and
"synced" are also new in version 17, there is no risk of breaking
"order by 17,18" kinds of queries (17 and 18 being the failover and synced
positions).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi,

On Thu, Mar 21, 2024 at 11:53:32AM +0530, Amit Kapila wrote:
> On Thu, Mar 21, 2024 at 11:37 AM Bertrand Drouvot wrote:
> >
> > We could also do the above 3 and altering the timeout with a replication
> > connection but the SQL API seems more natural to me.
>
> If we want to go with this then I think we should at least ensure that
> if one specified timeout via CREATE_REPLICATION_SLOT or
> ALTER_REPLICATION_SLOT that should be honored.

Yeah, agree.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi,

On Thu, Mar 21, 2024 at 11:43:54AM +0530, Amit Kapila wrote:
> On Thu, Mar 21, 2024 at 11:23 AM Bertrand Drouvot wrote:
> >
> > On Thu, Mar 21, 2024 at 08:47:18AM +0530, Amit Kapila wrote:
> > > On Wed, Mar 20, 2024 at 1:51 PM Bertrand Drouvot wrote:
> > > >
> > > > On Wed, Mar 20, 2024 at 12:48:55AM +0530, Bharath Rupireddy wrote:
> > > > >
> > > > > 2. last_inactive_at and inactive_timeout are now tracked in the
> > > > > on-disk replication slot data structure.
> > > >
> > > > Should last_inactive_at be tracked on disk? Say the engine is down for
> > > > a period of time > inactive_timeout, then the slot will be invalidated
> > > > after the engine restart (if there is no activity before we invalidate
> > > > the slot). Should the time the engine is down be counted as "inactive"
> > > > time? I have the feeling it should not, and that we should only take
> > > > into account inactive time while the engine is up.
> > >
> > > Good point. The question is how do we achieve this without persisting
> > > the 'last_inactive_at'? Say, 'last_inactive_at' for a particular slot
> > > had some valid value before we shut down but it still didn't cross the
> > > configured 'inactive_timeout' value, so, we won't be able to
> > > invalidate it. Now, after the restart, as we don't know
> > > last_inactive_at's value before the shutdown, we will initialize it
> > > with 0 (this is what Bharath seems to have done in the latest
> > > v13-0002* patch). After this, even if walsender or backend never
> > > acquires the slot, we won't invalidate it. OTOH, if we track
> > > 'last_inactive_at' on disk, after restart, we could initialize it
> > > to the current time if the value is non-zero. Do you have any better
> > > ideas?
> >
> > I think that setting last_inactive_at when we restart makes sense if the
> > slot has been active previously. I think the idea is because it's holding
> > xmin/catalog_xmin and that we don't want to prevent rows removal for
> > longer than the timeout.
> >
> > So what about relying on xmin/catalog_xmin instead that way?
>
> That doesn't sound like a great idea because xmin/catalog_xmin values
> won't tell us before restart whether it was active or not. It could
> have been inactive for a long time before restart but the xmin values
> could still be valid.

Right, the idea here was more like "don't hold xmin/catalog_xmin for longer
than the timeout". My concern was that we set catalog_xmin at logical slot
creation time. So if we set last_inactive_at to zero at creation time and
the slot is not used for a long period of time > timeout, then I think it's
not helping there.

> What about we always set 'last_inactive_at' at
> restart (if the slot's inactive_timeout has a non-zero value) and reset
> it as soon as someone acquires that slot? Now, if the slot doesn't get
> acquired till 'inactive_timeout', checkpointer will invalidate the
> slot.

Yeah, that sounds good to me, but I think we should set last_inactive_at at
creation time too, if not:

- a physical slot could remain valid for a long time after creation (which
  is fine) but the behavior would change at restart.
- a logical slot would have the "issue" reported above (holding
  catalog_xmin).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
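For illustration, the on-disk alternative sketched by Amit above can be condensed into a tiny hypothetical helper (not actual PostgreSQL code): a non-zero persisted value means the slot was inactive at shutdown, so the inactivity clock restarts at the current time rather than counting server downtime as inactive time.

```c
#include <assert.h>
#include <time.h>

/*
 * Hypothetical restart-time initialization of a persisted last_inactive_at:
 * non-zero means the slot was inactive at shutdown, so restart the clock at
 * "now" instead of counting the downtime as inactive time; zero means the
 * slot was acquired at shutdown, so leave it unset until release.
 */
static time_t
restore_last_inactive_at(time_t persisted_value, time_t now)
{
    return (persisted_value != 0) ? now : 0;
}
```

This keeps the "only count inactive time while the engine is up" property, at the cost of persisting the field on disk.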