Hi Kristian!
Kristian Nielsen wrote on 2024-01-24 23:59:
andrei.el...@pp.inet.fi writes:
Back in Aug I wrote a patch that converts XA:s to normal transactions
at binlogging.
I don't think I saw that patch, but it sounds like exactly what I am
proposing as an alternate solution... ?
Mine is just simplistic: never log any prepare part in the binlog.
In the follow-up mail
Date: Wed, 24 Jan 2024 13:41:15 +0200
I explained it in more detail.
Yours is apparently better for the user as it at least provides
recovery on the XA transaction's original host.
Notice that the initialization part of failover concerns 'the few
transactions' that would now have to be "officially" prepared.
However, MDEV-32020 shows just two are enough for hanging.
Therefore this may not be a solution for the failover case.
But what is your point? If the XA PREPAREs hang when applied at
failover,
they will also hang in the current implementation. The user who wants
to use
failover of XA PREPAREd transactions will have to accept severe
restrictions, such as row-based primary-key only replication.
Kristian, this statement of
  row-based primary-key only replication
requires at least a confirmation with a test. OTOH, the ROW part may be
somewhat difficult to dismiss, and I am not going to do that now, but
requiring a PK is too much, as a UK guarantees correctness; please read on.
So far we only have MDEV-32020 about the non-UK ROW format vulnerability.
According to our and our users' testing, the current XA replication works
when there is at least one unique key. And there is a theoretical
background to validate that,
except of course for implementation bugs.
Let me narrow our context to READ COMMITTED isolation and ROW format.
After MDEV-30165/26682 removed GAP locks from prepared XA:s, the latter
cease to be potentially conflicting, unilaterally on the slave side, with
the transactions that follow them in binlog order.
That's because a prepared XA can only hold conflicting X locks on index
records, while its Insert-Intention locks are harmless to the GAP locks
of the normal trx:s that follow in binlog order.
In the presence of a Unique Key, therefore,
XAP_1 (XAP := XA-Prepare) can *not* stop any normal Trx_2
(think of 1, 2 as gtid seq_no:s).
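For illustration, a minimal hand-run sketch of that claim (table, values
and xid are made up; READ COMMITTED and post-MDEV-30165/26682 behaviour
assumed):

CREATE TABLE t (pk INT PRIMARY KEY, v INT) ENGINE=InnoDB;
INSERT INTO t VALUES (1, 0);
# XAP_1, once prepared, keeps an X record lock only on pk=1:
XA START 'xap_1';
UPDATE t SET v = v + 1 WHERE pk = 1;
XA END 'xap_1';
XA PREPARE 'xap_1';
# Trx_2 (from another connection), next in binlog order, touches a
# different unique key value and does not wait on XAP_1, since the
# prepared XA holds no GAP locks:
BEGIN;
INSERT INTO t VALUES (2, 0);
COMMIT;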
That's exactly
why it must be optional and off by default, so it doesn't affect _all_
XA
users (as it does currently).
We might have to resort to that, but first I'd take on analysis of
anything that gets in the way, and hopefully it could be tackled, like
MDEV-32020, with a menu of choices.
> Another architectural issue is that each XA PREPARE keeps row locks around
> on every slave until commit (or rollback). This means replication will break
Indeed, but there can be no non-GAP lock on any unique index.
Only by restricting replication to primary-key updates only. MariaDB is
an
SQL database, please don't try to turn it into a key-value store.
Let me state it this way:
with at least one UK *and* GAP locks out of the picture, two binlogged
transactions can have conflicts, if any, only through their index X locks.
A trx obviously X-locks all modified records.
For really disastrous cases (of which we're unaware as of yet) there
exists a safety measure: identify a prepared XA that got in the way of
the following transactions, roll it back, and
re-apply it like a normal transaction when XA COMMIT finally arrives.
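To make the ordering concrete, a sketch of that (not yet implemented)
safety measure; the xid and the sequencing are purely illustrative:

# the slave applier notices Trx_2 waiting on a lock held by prepared XAP_1
XA ROLLBACK 'xap_1';  # release XAP_1's locks so Trx_2 can proceed
# ... apply Trx_2 and the rest of the binlog stream normally ...
# when XA COMMIT 'xap_1' finally arrives, re-apply XAP_1's saved events
# as an ordinary transaction and commit them right away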
By "exist" I think you mean "exist an idea" - this is not implemented
in the
current code. In fact, this is exactly the point of my alternate
solution,
to make it possible to apply at XA COMMIT time. And once you have this,
there is no point to apply it at XA PREPARE, since for most
transactions,
the XA COMMIT comes shortly after the XA PREPARE.
There is a point, and its name is yours, aka ... optimistic (parallel)
execution :-).
Why should we defer a day-long transaction's execution when its events,
maybe not all of them, are around,
and we can always retreat to the savepoint of its BEGIN?!
Non-unique indexes remain vulnerable but only in ROW format,
What do you mean by "vulnerable only in ROW format"? There are many
cases of
statement-based replication with XA that will break the slave in the
current
implementation. The test cases from MDEV-5914 and MDEV-5941 for example
(from when CONSERVATIVE parallel replication was implemented) also
cause
current XA to break in statement/mixed replication mode.
The root of the issue is not XA. The latter may exacerbate what in the
normal transaction case might lead to "just" data inconsistency (the
quotes here hint that the current XA hang might still be the better
option for the user).
What data inconsistency?
(I don't mean how the notion of consistency can apply to a no-UK table
case, do you :-?)
For instance with the MINIMAL row image format. When master and slave are
led to execute a trx using different indexes,
on a non-UK table they can in the end modify different records.
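As an illustration of the class of divergence meant here (a made-up
case, shown in its simplest statement-format flavour rather than the
exact scenario above):

CREATE TABLE t (a INT, b INT, KEY(a)) ENGINE=InnoDB;  # no unique key
INSERT INTO t VALUES (1, 10), (1, 20);
# nothing fixes which of the two a=1 rows is hit, so master and slave
# may each update a different one and end up with inconsistent data:
UPDATE t SET b = 30 WHERE a = 1 LIMIT 1;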
> This for example means that a mysqldump backup can no longer be used to
> provision a slave, since any XA PREPAREd event at the time of the dump will
Notice this is regardless of how XA:s are binlogged/replicated.
Such a provisioned server won't be able to replace the original server
at failover.
In other words, rpl_xa_provision.test also relates to this general
issue.
The problem is not failover to the new server, that will be possible as
soon
as all transactions that were XA PREPAREd at the time of dump are
committed,
which is normally a fraction of a second.
Well, in your case there's a condition: 'will be possible as soon ...'.
I don't have it: just run that sql file and a clone of the master is
provisioned.
It surely works like that for normal trx:s, and does not for XA:s.
To me that's a problem to resolve.
The problem is that in the current implementation, a slave setup from
the
dump does not have a binlog position to start from, any position will
cause
replication to fail.
Of course. The prepared XA:s' gtid:s are treated as those of committed
trx:s.
I thought of copying the binlog events of all XA-prepared transactions,
like gtid 0-1-1, to `dump.sql`.
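Roughly, such a `dump.sql` would then end with the prepare parts of the
still-pending XA:s (contents, xid and gtid purely illustrative):

# ... ordinary mysqldump contents ...
# appended prepare part of a pending XA, e.g. gtid 0-1-1:
XA START 'x';
INSERT INTO db1.t VALUES (1);
XA END 'x';
XA PREPARE 'x';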
So then you will need to extend the binlog checkpoint mechanism to
preserve
binlogs for all pending XA PREPAREd transactions, just as I propose.
Right. Let's take this part.
And
once you do that, there's no longer a need to apply the XA PREPAREs
until
failover.
If the binlog is not available, then a list of gtids of XA-prepared:s
You will need the binlog, otherwise how will you preserve the list of
gtids of pending XA-prepared transactions across server restart?
I am currently merging the final piece of XA recovery, which is
XA_list_log_event. It's a part of part IV of bb-10.6-MDEV-31949
that I need to update (ETA tomorrow).
The list is needed in order to decide what to do with a prepared user XA
xid at restart, when the binlog has already purged the prepared part.
XA PREPARE 'x'; #=> OK to the user
*crash of master*
*failover to slave*
XA RECOVER; #=> 'x' is in the prepared list
This scenario does not work in your simpler design.
That's the whole point of the "optionally we can support the specific
usecase of being able to recover external XA PREPAREd transactions on a
slave after failover". Of course my proposal is not implemented yet,
but why
do you think it cannot work?
Take the MDEV-32020 description case. Crash the hanging slave. Restart it
as master, which
entails executing the two, right? And with the same effect as on the
hanging slave.
I thought we got on the same page on this back in our zulip
conversation.
> There are other problems; for example the update of mysql.gtid_slave_pos
> cannot be committed crash-safe together with the transaction for XA PREPARE
> (since the transaction is not committed).
For this part I mentioned MDEV-21117 many times. It's going to involve
But this is still not implemented, right? (I think you meant another bug
than
MDEV-21117).
MDEV-21777, indeed. Thanks.
Not implemented. We were four people and one tester at one point..
[ Let me reply to the more general subjects below later?
There's hot release stuff awaiting my attention..
And I can't help but underline the real virtue of the XA replication as
a
pioneer of "fragmented" replication that I tried to promote to
Kristian in
Fragmented replication should send events to the slave as soon as
possible
after starting on the master, so the slave has time to work on it in
parallel. And any conflicts with the commit order of other transactions
should be detected and the fragmented transaction rolled back and
retried.
But the current XA implementation does exactly the opposite: It sends
the
events to the slave only at the end of the transaction (XA PREPARE),
and it
makes it _impossible_ to rollback and retry the prepare in case of
conflict
(by assigning a GTID to the XA PREPARE that's updated in the
@@gtid_slave_pos).
The rationale is that the XA transaction is represented by more than
one GTID. Arguably it's a bit uncomfortable, but it is fair to call such
a generalization flexible, especially looking forward to implementing
fragmented
transaction replication, or long-running and not necessarily
transactional DML or DDL statements, including ALTER TABLE.
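To make the "more than one GTID" point concrete, this is roughly how a
single external XA transaction is laid out in the binlog by the current
implementation (layout simplified, seq_no values made up):

# GTID 0-1-100
XA START 'x'; /* row events */ XA END 'x'; XA PREPARE 'x';
# GTID 0-1-105  (possibly much later, other transactions in between)
XA COMMIT 'x';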
But I don't see any new code in the current XA implementation that
could be
used for more general "fragmented transaction replication" - what did I
miss?
Why do you want to assign more than one GTID to a fragmented
transaction,
wouldn't it be better to binlog the fragments without a GTID (as in my
proposed XA solution)?
... ^ ]
(I am backing up below specifically) slows things down.
Big prepared XA transactions would be idling around - and apparently
creating hot
spots for overall execution when their commits finally arrive.
There is no slowdown from this. Running a big transaction takes the
same
time whether you end it with an XA PREPARE or an XA COMMIT.
The slowdown is apparent at failover. There's nothing good in having some
operations delayed in general,
especially as we must have agreed (in the past at least) on the
dynamic, forced (by "circumstances") rollback idea.
(Personally I would add that to hear defenses like that from the author
of optimistic parallel replication
is as painful as blasphemy from the mouth of a local priest :-)!)
In fact, my proposal will speed things up, because only one 2-phase
commit
between binlog and engine is needed per transaction. While in current
implementation two are needed, one for XA PREPARE and one for XA
COMMIT.
And a simple sequence like this will be able to group commit together
(on
the slave):
XA PREPARE 't1';
XA COMMIT 't1';
XA PREPARE 't2';
XA COMMIT 't2';
XA PREPARE 't3';
XA COMMIT 't3';
I believe in the current code, it's impossible to group-commit the XA
PREPARE together with the XA COMMIT on a slave?
Of course they can be in one group. That's part II, ready for review
in bb-10.6-MDEV-31949.
- Kristian.
All the best,
Andrei
_______________________________________________
developers mailing list -- developers@lists.mariadb.org
To unsubscribe send an email to developers-le...@lists.mariadb.org