This is an automated email from the ASF dual-hosted git repository.

reshke pushed a commit to branch address_1566_p2
in repository https://gitbox.apache.org/repos/asf/cloudberry.git

commit 55e5d04dc5edb005e4dba5bafa07b9e0fb67f51d
Author: Heikki Linnakangas <[email protected]>
AuthorDate: Mon Jun 27 08:21:08 2022 +0300

    Fix visibility check when XID is committed in CLOG but not in procarray.
    
    TransactionIdIsInProgress had a fast path to return 'false' if the
    single-item CLOG cache said that the transaction was known to be
    committed. However, that was wrong, because a transaction is first
    marked as committed in the CLOG but doesn't become visible to others
    until it has removed its XID from the proc array. That could lead to an
    error:
    
        ERROR:  t_xmin is uncommitted in tuple to be updated
    
    or for an UPDATE to go ahead without blocking, before the previous
    UPDATE on the same row was made visible.
    
    The window is usually very short, but synchronous replication makes it
    much wider, because the wait for synchronous replica happens in that
    window.
    
    Another thing that makes it hard to hit is that it's hard to get such
    a commit-in-progress transaction into the single item CLOG cache.
    Normally, if you call TransactionIdIsInProgress on such a transaction,
    it determines that the XID is in progress without checking the CLOG
    and without populating the cache. One way to prime the cache is to
    explicitly call pg_xact_status() on the XID. Another way is to use a
    lot of subtransactions, so that the subxid cache in the proc array is
    overflown, making TransactionIdIsInProgress rely on pg_subtrans and
    CLOG checks.
    
    This has been broken ever since it was introduced in 2008, but the race
    condition is very hard to hit, especially without synchronous
    replication. There were a couple of reports of the error starting from
    summer 2021, but no one was able to find the root cause then.
    
    TransactionIdIsKnownCompleted() is now unused. In 'master', remove it,
    but I left it in place in backbranches in case it's used by extensions.
    
    Also change pg_xact_status() to check TransactionIdIsInProgress().
    Previously, it only checked the CLOG, and returned "committed" before
    the transaction was actually made visible to other queries. Note that
    this also means that you cannot use pg_xact_status() to reproduce the
    bug anymore, even if the code wasn't fixed.
    
    Report and analysis by Konstantin Knizhnik. Patch by Simon Riggs, with
    the pg_xact_status() change added by me.
    
    Author: Simon Riggs
    Reviewed-by: Andres Freund
    Discussion: 
https://www.postgresql.org/message-id/flat/4da7913d-398c-e2ad-d777-f752cf7f0bbb%40garret.ru
---
 src/backend/access/transam/transam.c | 13 +++++++++----
 src/backend/storage/ipc/procarray.c  | 12 +++++++++++-
 src/backend/utils/adt/xid8funcs.c    | 30 ++++++++++++------------------
 3 files changed, 32 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/transam/transam.c 
b/src/backend/access/transam/transam.c
index 1c881550b65..f7c8e1b3466 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -249,10 +249,15 @@ TransactionIdDidAbortForReader(TransactionId 
transactionId)
  *
  * This does NOT look into pg_xact but merely probes our local cache
  * (and so it's not named TransactionIdDidComplete, which would be the
- * appropriate name for a function that worked that way).  The intended
- * use is just to short-circuit TransactionIdIsInProgress calls when doing
- * repeated heapam_visibility.c checks for the same XID.  If this isn't
- * extremely fast then it will be counterproductive.
+ * appropriate name for a function that worked that way).
+ *
+ * NB: This is unused, and will be removed in v15. This was used to
+ * short-circuit TransactionIdIsInProgress, but that was wrong for a
+ * transaction that was known to be marked as committed in CLOG but not
+ * yet removed from the proc array. This is kept in backbranches just in
+ * case it is still used by extensions.  However, extensions doing
+ * something similar to tuple visibility checks should also be careful to
+ * check the proc array first!
  *
  * Note:
  *             Assumes transaction identifier is valid.
diff --git a/src/backend/storage/ipc/procarray.c 
b/src/backend/storage/ipc/procarray.c
index 57c03cce7d9..89fafdbedc1 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -277,6 +277,11 @@ static ProcArrayStruct *procArray;
 static PGPROC *allProcs;
 static TMGXACT *allTmGxact;
 
+/*
+ * Cache to reduce overhead of repeated calls to TransactionIdIsInProgress()
+ */
+static TransactionId cachedXidIsNotInProgress = InvalidTransactionId;
+
 /*
  * Bookkeeping for tracking emulated transactions in recovery
  */
@@ -1486,7 +1491,7 @@ TransactionIdIsInProgress(TransactionId xid)
         * already known to be completed, we can fall out without any access to
         * shared memory.
         */
-       if (TransactionIdIsKnownCompleted(xid))
+       if (TransactionIdEquals(cachedXidIsNotInProgress, xid))
        {
                xc_by_known_xact_inc();
                return false;
@@ -1644,6 +1649,7 @@ TransactionIdIsInProgress(TransactionId xid)
        if (nxids == 0)
        {
                xc_no_overflow_inc();
+               cachedXidIsNotInProgress = xid;
                return false;
        }
 
@@ -1658,7 +1664,10 @@ TransactionIdIsInProgress(TransactionId xid)
        xc_slow_answer_inc();
 
        if (TransactionIdDidAbort(xid))
+       {
+               cachedXidIsNotInProgress = xid;
                return false;
+       }
 
        /*
         * It isn't aborted, so check whether the transaction tree it belongs to
@@ -1676,6 +1685,7 @@ TransactionIdIsInProgress(TransactionId xid)
                }
        }
 
+       cachedXidIsNotInProgress = xid;
        return false;
 }
 
diff --git a/src/backend/utils/adt/xid8funcs.c 
b/src/backend/utils/adt/xid8funcs.c
index 78b0b9b6d68..0c9f14a0c83 100644
--- a/src/backend/utils/adt/xid8funcs.c
+++ b/src/backend/utils/adt/xid8funcs.c
@@ -36,6 +36,7 @@
 #include "miscadmin.h"
 #include "postmaster/postmaster.h"
 #include "storage/lwlock.h"
+#include "storage/procarray.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
@@ -810,29 +811,22 @@ pg_xact_status(PG_FUNCTION_ARGS)
        {
                Assert(TransactionIdIsValid(xid));
 
-               if (TransactionIdIsCurrentTransactionId(xid))
+               /*
+                * Like when doing visiblity checks on a row, check whether the
+                * transaction is still in progress before looking into the 
CLOG.
+                * Otherwise we would incorrectly return "committed" for a 
transaction
+                * that is committing and has already updated the CLOG, but 
hasn't
+                * removed its XID from the proc array yet. (See comment on 
that race
+                * condition at the top of heapam_visibility.c)
+                */
+               if (TransactionIdIsInProgress(xid))
                        status = "in progress";
                else if (TransactionIdDidCommit(xid))
                        status = "committed";
-               else if (TransactionIdDidAbort(xid))
-                       status = "aborted";
                else
                {
-                       /*
-                        * The xact is not marked as either committed or 
aborted in clog.
-                        *
-                        * It could be a transaction that ended without 
updating clog or
-                        * writing an abort record due to a crash. We can 
safely assume
-                        * it's aborted if it isn't committed and is older than 
our
-                        * snapshot xmin.
-                        *
-                        * Otherwise it must be in-progress (or have been at 
the time we
-                        * checked commit/abort status).
-                        */
-                       if (TransactionIdPrecedes(xid, 
GetActiveSnapshot()->xmin))
-                               status = "aborted";
-                       else
-                               status = "in progress";
+                       /* it must have aborted or crashed */
+                       status = "aborted";
                }
        }
        else


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to