Hi Kristian!
Kristian Nielsen wrote on 2024-01-24 23:59:
andrei.el...@pp.inet.fi writes:
Back in Aug I wrote a patch that converts XA:s to normal transactions
at binlogging.
I don't think I saw that patch, but it sounds like exactly what I am
proposing as an alternate solution... ?
Mine is just simplistic: never log any prepare part in the binlog.
In the follow-up mail
Date: Wed, 24 Jan 2024 13:41:15 +0200
I explained it in more detail.
Yours is apparently better for the user as it at least provides
recovery on the XA transaction's original host.
Notice that the initialization part of failover concerns 'the few
transactions' that would now have to be "officially" prepared.
However, MDEV-32020 shows just two are enough for hanging.
Therefore this may not be a solution for the failover case.
But what is your point? If the XA PREPAREs hang when applied at
failover,
they will also hang in the current implementation. The user who wants
to use
failover of XA PREPAREd transactions will have to accept severe
restrictions, such as row-based primary-key only replication.
Kristian, this statement of
  row-based primary-key only replication
requires at least a confirmation with a test. OTOH, the ROW part may be
somewhat difficult to dismiss, and I am not going to do that now, but
requiring a PK is too much, as a UK guarantees correctness; please read on.
So far we only have MDEV-32020 about the non-UK ROW format vulnerability.
According to our and our users' testing, the current XA replication works
when there is at least one unique key. And there is a theoretical
background to validate that,
except of course for implementation bugs.
Let me narrow our context to READ COMMITTED isolation and ROW format.
After MDEV-30165/26682 removed GAP locks from prepared XA:s, the latter
cease to be potentially conflicting, unilaterally on the slave side, with
the transactions that follow them in binlog order.
That's because a prepared XA can only hold conflicting X locks on index
records, while its Insert-Intention locks are harmless to the GAP locks
of the normal trx:s that follow in binlog order.
In the presence of a Unique Key, therefore,
XAP_1 (XAP := XA-Prepare) can *not* stop any normal Trx_2
(think of 1, 2 as gtid seq_no:s).
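For illustration, a minimal hand-run sketch of that claim (table, values
and xid are made up; READ COMMITTED and post-MDEV-30165/26682 behaviour
assumed):

CREATE TABLE t (pk INT PRIMARY KEY, v INT) ENGINE=InnoDB;
INSERT INTO t VALUES (1, 0);
# XAP_1, once prepared, keeps an X record lock only on pk=1:
XA START 'xap_1';
UPDATE t SET v = v + 1 WHERE pk = 1;
XA END 'xap_1';
XA PREPARE 'xap_1';
# Trx_2 (from another connection), next in binlog order, touches a
# different unique key value and does not wait on XAP_1, since the
# prepared XA holds no GAP locks:
BEGIN;
INSERT INTO t VALUES (2, 0);
COMMIT;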
That's exactly
why it must be optional and off by default, so it doesn't affect _all_
XA
users (as it does currently).
We might have to resort to that, but first I'd take on analysis of
anything that gets in the way, and hopefully it could be tackled, like
MDEV-32020, with a menu of choices.
> Another architectural issue is that each XA PREPARE keeps row locks around
> on every slave until commit (or rollback). This means replication will break
Indeed, but there can be no non-GAP lock on any unique index.
Only by restricting replication to primary-key updates only. MariaDB is
an
SQL database, please don't try to turn it into a key-value store.
Let me state it this way:
with at least one UK *and* GAP locks out of the picture, two binlogged
transactions can have conflicts, if any, only through their index X locks.
A trx obviously X-locks all modified records.
For really disastrous cases (of which we're unaware as of yet) there
exists a safety measure: identify a prepared XA that got in the way of
the following transactions, roll it back, and
re-apply it like a normal transaction when XA COMMIT finally arrives.
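To make the ordering concrete, a sketch of that (not yet implemented)
safety measure; the xid and the sequencing are purely illustrative:

# the slave applier notices Trx_2 waiting on a lock held by prepared XAP_1
XA ROLLBACK 'xap_1';  # release XAP_1's locks so Trx_2 can proceed
# ... apply Trx_2 and the rest of the binlog stream normally ...
# when XA COMMIT 'xap_1' finally arrives, re-apply XAP_1's saved events
# as an ordinary transaction and commit them right away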
By "exist" I think you mean "exist an idea" - this is not implemented
in the
current code. In fact, this is exactly the point of my alternate
solution,
to make it possible to apply at XA COMMIT time. And once you have this,
there is no point to apply it at XA PREPARE, since for most
transactions,
the XA COMMIT comes shortly after the XA PREPARE.
There is a point, and its name is yours, aka ... optimistic (parallel)
execution :-).
Why should we defer a day-long transaction's execution when its events,
maybe not all of them, are around,
and we can always retreat to the savepoint of its BEGIN?!
Non-unique indexes remain vulnerable but only in ROW format,
What do you mean by "vulnerable only in ROW format"? There are many
cases of
statement-based replication with XA that will break the slave in the
current
implementation. The test cases from MDEV-5914 and MDEV-5941 for example
(from when CONSERVATIVE parallel replication was implemented) also
cause
current XA to break in statement/mixed replication mode.
The root of the issue is not XA. The latter may exacerbate what in the
normal transaction case might lead to "just" data inconsistency (the
quotes here hint that the current XA hang might still be the better
option for the user).
What data inconsistency?
(I don't mean how the notion of consistency can apply to a no-UK table
case, do you :-?)
For instance with the MINIMAL row image format. When master and slave are
led to execute a trx using different indexes,
on a non-UK table they can in the end modify different records.
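As an illustration of the class of divergence meant here (a made-up
case, shown in its simplest statement-format flavour rather than the
exact scenario above):

CREATE TABLE t (a INT, b INT, KEY(a)) ENGINE=InnoDB;  # no unique key
INSERT INTO t VALUES (1, 10), (1, 20);
# nothing fixes which of the two a=1 rows is hit, so master and slave
# may each update a different one and end up with inconsistent data:
UPDATE t SET b = 30 WHERE a = 1 LIMIT 1;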
> This for example means that a mysqldump backup can no longer be used to
> provision a slave, since any XA PREPAREd event at the time of the dump will
Notice this is regardless of how XA:s are binlogged/replicated.
Such a provisioned server won't be able to replace the original server
at failover.
In other words, rpl_xa_provision.test also relates to this general
issue.
The problem is not failover to the new server, that will be possible as
soon
as all transactions that were XA PREPAREd at the time of dump are
committed,
which is normally a fraction of a second.
Well, in your case there's a condition: 'will be possible as soon ...'.
I don't have it: just run that sql file and a clone of the master is
provisioned.
It surely works like that for normal trx:s, and does not for XA:s.
To me that's a problem to resolve.
The problem is that in the current implementation, a slave setup from
the
dump does not have a binlog position to start from, any position will
cause
replication to fail.
Of course. The prepared XA:s' gtid:s are treated as those of committed
trx:s.
I thought of copying the binlog events of all XA-prepared transactions,
like gtid 0-1-1, to `dump.sql`.
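Roughly, such a `dump.sql` would then end with the prepare parts of the
still-pending XA:s (contents, xid and gtid purely illustrative):

# ... ordinary mysqldump contents ...
# appended prepare part of a pending XA, e.g. gtid 0-1-1:
XA START 'x';
INSERT INTO db1.t VALUES (1);
XA END 'x';
XA PREPARE 'x';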
So then you will need to extend the binlog checkpoint mechanism to
preserve
binlogs for all pending XA PREPAREd transactions, just as I propose.
Right. Let's take this part.
And
once you do that, there's no longer a need to apply the XA PREPAREs
until
failover.
If the binlog is not available, then a list of gtids of XA-prepared:s
You will need the binlog, otherwise how will you preserve the list of
gtids of pending XA-prepared transactions across server restart?
I am currently merging the final piece of XA recovery, which is
XA_list_log_event. It's a part of part IV of bb-10.6-MDEV-31949
that I need to update (ETA tomorrow).
The list is needed in order to decide what to do with a prepared user XA
xid at restart, when the binlog has already purged the prepared part.
XA PREPARE 'x'; #=> OK to the user
*crash of master*
*failover to slave*
XA RECOVER; #=> 'x' is in the prepared list
This scenario does not work in your simpler design.
That's the whole point of the "optionally we can support the specific
usecase of being able to recover external XA PREPAREd transactions on a
slave after failover". Of course my proposal is not implemented yet,
but why
do you think it cannot work?
Take the MDEV-32020 description case. Crash the hanging slave. Restart it
as master, which
entails executing the two, right? And with the same effect as on the
hanging slave.
I thought we got on the same page on this back in our zulip
conversation.
> There are other problems; for example the update of mysql.gtid_slave_pos
> cannot be committed crash-safe together with the transaction for XA PREPARE
> (since the transaction is not committed).
For this part I mentioned MDEV-21117 many times. It's going to involve
But this is still not implemented, right? (I think you meant another bug
than
MDEV-21117).
MDEV-21777, indeed. Thanks.
Not implemented. We were four people and one tester at one point..
[ Let me reply to the more general subjects below later?
There's hot release stuff awaiting my attention..
And I can't help but underline the real virtue of the XA replication as
a
pioneer of "fragmented" replication that I tried to promote to
Kristian in
Fragmented replication should send events to the slave as soon as
possible
after starting on the master, so the slave has time to work on it in
parallel. And any conflicts with the commit order of other transactions
should be detected and the fragmented transaction rolled back and
retried.
But the current XA implementation does exactly the opposite: It sends
the
events to the slave only at the end of the transaction (XA PREPARE),
and it
makes it _impossible_ to rollback and retry the prepare in case of
conflict
(by assigning a GTID to the XA PREPARE that's updated in the
@@gtid_slave_pos).
The rationale is that the XA transaction is represented by more than
one GTID. Arguably it's a bit uncomfortable, but it is fair to call such
a generalization flexible, especially looking forward to implementing
fragmented
transaction replication, or long-running and not necessarily
transactional DML or DDL statements, including ALTER TABLE.
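To make the "more than one GTID" point concrete, this is roughly how a
single external XA transaction is laid out in the binlog by the current
implementation (layout simplified, seq_no values made up):

# GTID 0-1-100
XA START 'x'; /* row events */ XA END 'x'; XA PREPARE 'x';
# GTID 0-1-105  (possibly much later, other transactions in between)
XA COMMIT 'x';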
But I don't see any new code in the current XA implementation that
could be
used for more general "fragmented transaction replication" - what did I
miss?
Why do you want to assign more than one GTID to a fragmented
transaction,
wouldn't it be better to binlog the fragments without a GTID (as in my
proposed XA solution)?
... ^ ]
(I am backing up below specifically) slows things down.
Big prepared XA transactions would be idling around - and apparently
creating hot
spots for overall execution when their commits finally arrive.
There is no slowdown from this. Running a big transaction takes the
same
time whether you end it with an XA PREPARE or an XA COMMIT.
The slowdown is apparent at failover. There's nothing good in having some
operations delayed in general,
especially as we must have agreed (in the past at least) on the
dynamic, forced (by "circumstances") rollback idea.
(Personally I would add that to hear defenses like that from the author
of optimistic parallel replication
is as painful as blasphemy from the mouth of a local priest :-)!)
In fact, my proposal will speed things up, because only one 2-phase
commit
between binlog and engine is needed per transaction. While in current
implementation two are needed, one for XA PREPARE and one for XA
COMMIT.
And a simple sequence like this will be able to group commit together
(on
the slave):
XA PREPARE 't1';
XA COMMIT 't1';
XA PREPARE 't2';
XA COMMIT 't2';
XA PREPARE 't3';
XA COMMIT 't3';
I believe in the current code, it's impossible to group-commit the XA
PREPARE together with the XA COMMIT on a slave?
Of course they can be in one group. That's part II, ready for review
in bb-10.6-MDEV-31949.
- Kristian.
All the best,
Andrei
_______________________________________________
developers mailing list -- developers@lists.mariadb.org
To unsubscribe send an email to developers-le...@lists.mariadb.org