Hi Monty,

As promised, here are my thoughts on the issues with the replication of
external XA as implemented since MariaDB 10.5.


I see at least two architectural issues with the current implementation.

One is that it splits each transaction into two separate GTIDs in the binlog (XA
PREPARE and XA COMMIT). This breaks with the fundamental principle that
replication applies transactions one after the other in strict sequence, and
the replication position/GTID is the single transaction last replicated.
This means, for example, that a mysqldump backup can no longer be used to
provision a slave, since any transaction in XA PREPAREd state at the time of
the dump will be missing; a testcase rpl_xa_provision.test in MDEV-32020
demonstrates this.
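
To make the provisioning problem concrete, here is a toy model in Python. It
is only a sketch, not MariaDB code: the event kinds, the integer GTIDs and the
function name are assumptions made for illustration. A slave is provisioned
from a dump taken at some GTID and then replicates the rest of the binlog.

```python
# Toy model, not MariaDB code: event kinds and names are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Event:
    gtid: int      # sequence number standing in for a GTID
    kind: str      # "TRX", "XA_PREPARE" or "XA_COMMIT"
    xid: str = ""  # XA transaction id, empty for ordinary transactions


def provision_and_replicate(binlog, dump_gtid):
    """Provision a slave from a dump taken at dump_gtid, then replicate the rest.

    The dump holds only committed data, so a transaction that was merely
    PREPAREd at dump time is not part of it.
    """
    prepared = set()  # XA transactions the slave knows about
    errors = []
    for ev in binlog:
        if ev.gtid <= dump_gtid:
            continue  # assumed to be covered by the dump
        if ev.kind == "XA_PREPARE":
            prepared.add(ev.xid)
        elif ev.kind == "XA_COMMIT":
            if ev.xid in prepared:
                prepared.remove(ev.xid)
            else:
                errors.append(
                    f"XA COMMIT for unknown xid {ev.xid!r} at GTID {ev.gtid}"
                )
    return errors


binlog = [
    Event(1, "TRX"),
    Event(2, "XA_PREPARE", "x1"),
    Event(3, "TRX"),  # the dump is taken after this event
    Event(4, "XA_COMMIT", "x1"),
]
errors = provision_and_replicate(binlog, dump_gtid=3)
print(errors)
```

With the dump taken at GTID 3, the slave never sees the XA PREPARE at GTID 2,
so the XA COMMIT at GTID 4 refers to a transaction it does not have.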

Another architectural issue is that each XA PREPARE keeps row locks around
on every slave until commit (or rollback). This means replication will break
if _any_ transaction replicated after the XA PREPARE gets blocked on a lock.
This can easily happen: certainly in many ways with statement-based
replication, and even with row-based replication on a table without a primary
key, as demonstrated by testcase rpl_mdev32020.test in MDEV-32020.
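
The blocking can be sketched with a toy applier in Python. This is an
illustration under assumptions, not MariaDB code: one sequential applier, a
lock table keyed by row number, and an invented example in which the slave
ends up locking one more row than the master did.

```python
# Toy model, not MariaDB code: a single sequential applier and a lock table
# keyed by row. All names are illustrative assumptions.
def apply_stream(events):
    """Apply events in order; return the index of the event that blocks, or None.

    Each event is a tuple (kind, xid, rows). Locks taken by an XA PREPARE are
    held until the matching XA COMMIT or XA ROLLBACK is applied.
    """
    lock_owner = {}  # row -> xid of the prepared transaction holding the lock
    for i, (kind, xid, rows) in enumerate(events):
        if kind in ("XA_COMMIT", "XA_ROLLBACK"):
            # Release every lock held by this xid.
            lock_owner = {r: o for r, o in lock_owner.items() if o != xid}
            continue
        if any(r in lock_owner for r in rows):
            # The applier waits here, but the XA COMMIT that would release
            # the lock is later in the same stream and is never reached.
            return i
        if kind == "XA_PREPARE":
            for r in rows:
                lock_owner[r] = xid
    return None


# Hypothetical case: the slave locks rows 1 and 2 for x1, and the next
# replicated transaction needs row 2.
stuck = [
    ("XA_PREPARE", "x1", {1, 2}),
    ("TRX", "", {2}),
    ("XA_COMMIT", "x1", set()),
]
# Same work, but the XA transaction commits before the other one arrives.
fine = [
    ("XA_PREPARE", "x1", {1, 2}),
    ("XA_COMMIT", "x1", set()),
    ("TRX", "", {2}),
]
print(apply_stream(stuck), apply_stream(fine))
```

In the first stream the applier stops at the second event and cannot make
progress; in the second stream nothing blocks.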

There are other problems; for example the update of mysql.gtid_slave_pos
cannot be committed crash-safe together with the transaction for XA PREPARE
(since the transaction is not committed). I believe the root of the problem
is architectural: external XA transactions should be replicated only after
they commit on the master. Trying to fix individual problems one by one will
not address
the root problem and will lead to ever increasing complexity without ever
being fully successful.

The current implementation appears to only address a very specific and rare
use-case, where enhanced semi-synchronous replication is used with row-based
binlogging to try to fail-over to a slave and preserve any external XA that
was in PREPAREd state on the master before the failover. Mixed-mode
replication, provisioning slaves with mysqldump, slaves not intended for
failover, etc., seem not to have been considered and have been basically
broken since 10.5.


Here is my idea for a design that solves most of these problems.

At XA PREPARE, we can still write the events to the binlog, but without a
GTID, and we do not replicate it to slaves by default. Then at XA COMMIT we
binlog a commit transaction that can be replicated normally to the slaves
without problems. If necessary, the events for XA COMMIT can be read from
the PREPARE earlier in the binlog, e.g. after a server crash/restart. We
already have the binlog checkpoint mechanism to ensure that required binlog
files are preserved until no longer needed for transaction recovery.

This way we make external XA preserved across server restart, and all normal
replication features continue to work - mysqldump, mixed-mode, etc. Nice and
simple.
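
As a rough illustration of the binlogging scheme described above, here is a
toy model in Python. The class, its methods and the record layout are all
hypothetical and do not correspond to MariaDB internals or to the
proof-of-concept code; the sketch only shows the ordering and visibility
rules.

```python
# Toy model of the proposed binlogging scheme. Class, method and field names
# are hypothetical, chosen only for this sketch.
class Binlog:
    def __init__(self):
        self.records = []  # everything written to the binlog, in order
        self.next_gtid = 1

    def _write_trx(self, events, xid=None):
        # An ordinary committed transaction gets the next GTID.
        self.records.append(
            {"gtid": self.next_gtid, "kind": "TRX", "xid": xid,
             "events": list(events)}
        )
        self.next_gtid += 1

    def commit(self, events):
        self._write_trx(events)

    def xa_prepare(self, xid, events):
        # Written without a GTID: kept for recovery, not replicated by default.
        self.records.append(
            {"gtid": None, "kind": "XA_PREPARE", "xid": xid,
             "events": list(events)}
        )

    def find_prepared(self, xid):
        # Scan backwards for the PREPARE record, as would be needed after a
        # crash/restart when the events are no longer in memory.
        for rec in reversed(self.records):
            if rec["kind"] == "XA_PREPARE" and rec["xid"] == xid:
                return rec["events"]
        raise KeyError(xid)

    def xa_commit(self, xid):
        # Binlogged as a normal transaction with its own GTID.
        self._write_trx(self.find_prepared(xid), xid=xid)

    def replication_stream(self):
        # Slaves see only records that carry a GTID.
        return [r for r in self.records if r["gtid"] is not None]


log = Binlog()
log.xa_prepare("x1", ["row event A"])
log.commit(["row event B"])
log.xa_commit("x1")
stream = log.replication_stream()
print([(r["gtid"], r["events"]) for r in stream])
```

The slave-visible stream contains two ordinary transactions in commit order;
the XA PREPARE record stays in the master's binlog but is never sent.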

Then optionally we can support the specific use-case of being able to recover
external XA PREPAREd transactions on a slave after failover. When enabled,
the slave can receive the XA PREPARE events and binlog them itself, without
applying. Then as part of failover, those XA PREPARE in the binlog that are
still pending can be applied, leaving them in PREPAREd state on the new
master. This way, _only_ the few transactions that need to be failed-over
need special handling; the majority can still just replicate normally.
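
The optional failover mode could look roughly like this toy model in Python.
Everything here is an assumption made for illustration: the class and method
names, the record layout, and in particular the idea that the committed
transaction carries the xid so the slave can match it to a stored PREPARE.
XA ROLLBACK is left out of the sketch.

```python
# Toy model of the optional failover mode. Names and record layout are
# hypothetical; this is not how MariaDB represents any of this internally.
class Slave:
    def __init__(self):
        self.applied = []     # transactions applied normally, in order
        self.pending_xa = {}  # xid -> events, binlogged but not applied
        self.prepared = {}    # xid -> events, in PREPAREd state after failover

    def receive(self, rec):
        if rec["kind"] == "XA_PREPARE":
            # Store only: no row locks are taken on the slave.
            self.pending_xa[rec["xid"]] = rec["events"]
        else:
            # A committed transaction is applied like any other. If it was an
            # external XA, its stored PREPARE is no longer pending.
            self.applied.append(rec["events"])
            self.pending_xa.pop(rec.get("xid"), None)

    def promote(self):
        # At failover, apply the still-pending XA PREPAREs and leave them in
        # PREPAREd state on the new master.
        for xid, events in self.pending_xa.items():
            self.prepared[xid] = events
        self.pending_xa = {}


slave = Slave()
slave.receive({"kind": "XA_PREPARE", "xid": "x1", "events": ["A"]})
slave.receive({"kind": "XA_PREPARE", "xid": "x2", "events": ["B"]})
slave.receive({"kind": "TRX", "xid": "x1", "events": ["A"]})
slave.promote()
print(slave.applied, slave.prepared)
```

Only x2, which was still pending when the master went away, needs the special
handling at promotion; x1 went through the normal replication path.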

There are different refinements and optimizations that can be added on top
of this. But the point is that this is a simple implementation that is
robust, correct, and crash-safe from the start, without needing to add
complexity and fixes on top.

I've done some initial proof-of-concept code for this, and continue to work
on it on the branch knielsen_mdev32020 on GitHub.

 - Kristian.