Markus Mäkelä via developers <developers@lists.mariadb.org> writes:

> On 12/4/24 13:19, Kristian Nielsen via developers wrote:

>> 5. A more controversial thought is to drop support for semi-sync
>> replication. I think many users use semi-sync believing it does something

> As a (kind of) user of semi-sync replication, I believe it has a

Hi Markus, thanks for taking the time to comment! Your input is very
valuable.

> valid, albeit limited, use-case and that it's a necessary component in
> setups where no transactions are allowed to be lost when the primary
> node in a replication cluster goes down.  Perhaps I'm wrong or the way

I would like to be explicit about what "no transactions are allowed to be
lost" means. I know you, Markus, fully understand this, of course.

Transactions can easily be lost if the server crashes before or during the
commit. What the guarantee really means is that the server sends the client a
notification at the point where a single point of failure can no longer cause
the transaction to be lost. With semi-sync, this notification comes in the
form of the "ok" result of the client's commit.

I want to understand whether there are other, possibly better, ways to get
this notification, if that notification is all the relevant applications need.

I was suggesting that the application could itself use MASTER_GTID_WAIT()
against a slave before accepting the commit as "ok" (or a proxy like
MaxScale could do it for the application). Does the current semi-sync
replication do anything more for the application than this, and if so, what?
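To make the suggestion concrete, here is a minimal sketch of that per-commit
wait as a pure-Python toy (the names `SlavePosition` and `wait_reached` are
hypothetical, and GTIDs are reduced to a single sequence number; this is not
MariaDB's actual client protocol, just the shape of the idea): commit on the
master, note the resulting position, and only report "ok" once a slave has
reached it, mirroring what MASTER_GTID_WAIT() does server-side.

```python
import threading

# Toy stand-in for a slave's applied-GTID position.  The names here
# (SlavePosition, wait_reached, advance) are hypothetical, not MariaDB API.
class SlavePosition:
    def __init__(self):
        self._seq = 0                      # last applied GTID sequence number
        self._cond = threading.Condition()

    def advance(self, seq):
        """Slave apply thread: record that everything up to `seq` is applied."""
        with self._cond:
            self._seq = max(self._seq, seq)
            self._cond.notify_all()

    def wait_reached(self, seq, timeout=None):
        """Like MASTER_GTID_WAIT(): block until the slave has applied `seq`.
        Returns True on success, False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._seq >= seq, timeout)

slave = SlavePosition()

def commit_and_confirm(gtid_seq):
    """Application/proxy side: after COMMIT on the master returned position
    `gtid_seq`, only accept the commit as "ok" once a slave has it."""
    return slave.wait_reached(gtid_seq, timeout=5.0)

# Simulated slave catching up to sequence 42 in the background.
threading.Thread(target=slave.advance, args=(42,)).start()
print(commit_and_confirm(42))  # True: "ok" only after the slave has the txn
```

The point of the sketch is that the wait is per-commit and entirely on the
application (or proxy) side; commits that do not need the guarantee simply
skip the wait.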

One benefit of this method is that each commit can decide whether it needs
to wait. A commit that "is not allowed to be lost" will not block other
transactions from committing. I think that with AFTER_SYNC, all following
transactions are blocked from committing until the current commit has been
acknowledged by a slave, while with AFTER_COMMIT they are not, but I'm not
100% sure.
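My reading of the two wait points, expressed as ordered step lists (a toy
model of the documented semantics as I understand them, hedged accordingly;
not server code):

```python
# Ordered commit steps under the two semi-sync wait points, as I
# understand them.  A sketch of the documented semantics, not server code.
AFTER_SYNC = [
    "write transaction to binlog",
    "fsync binlog",
    "wait for slave ACK",        # later transactions queue behind this wait
    "commit in InnoDB",          # becomes visible only after the ACK
    "send ok to client",
]

AFTER_COMMIT = [
    "write transaction to binlog",
    "fsync binlog",
    "commit in InnoDB",          # visible to others before the ACK arrives
    "wait for slave ACK",        # only this client's "ok" is delayed
    "send ok to client",
]

def acked_before_visible(steps):
    """Does the slave ACK happen before the transaction becomes visible?"""
    return steps.index("wait for slave ACK") < steps.index("commit in InnoDB")

print(acked_before_visible(AFTER_SYNC))    # True
print(acked_before_visible(AFTER_COMMIT))  # False
```

If this model is right, the difference the client sees is only where in the
pipeline its "ok" is held back, which is why I say the two look similar from
the client's narrow perspective.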

> misunderstanding comes from this. The default value of
> rpl_semi_sync_master_wait_point should be AFTER_SYNC (lossless
> failover) and rpl_semi_sync_master_timeout should be set to something

I would like to understand the reason(s) AFTER_SYNC is better than
AFTER_COMMIT.

From my understanding, from the client's narrow perspective on its own
commit there is little difference: either way, the "ok" is a notification
that the transaction is now robust against a single point of failure
(available on at least two servers).

I know of one use case: setups where, if the master crashes, failover to a
slave is _always_ done, and the crashed master is re-configured as a slave of
the new master (as opposed to letting the old master restart, run crash
recovery, and continue operating as the master).

With AFTER_COMMIT, the old master might have committed a transaction that
does not exist on the new master. This prevents the old master from working
as a slave, and its data will need to be discarded (possibly restored from a
backup).

With AFTER_SYNC, the old master may still (after restarting) have a
transaction committed to the binlog that is not on the slave / new master.
But the old master can be restarted with --rpl-semi-sync-slave-enabled, which
tries to truncate the binlog, discarding as many transactions from it as
necessary to make sure it only contains transactions that are also present on
the new master.
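The difference between the two failure modes can be made explicit by
enumerating every crash point. This is a toy model of my understanding (all
names invented, not server code): crash after each prefix of the commit
steps and see which (binlog, InnoDB, slave) states are reachable.

```python
# Which states can a crashed master be left in?  A toy model of my
# understanding of the two wait points, not server code.

def crash_states(steps):
    """Return the set of (in_binlog, in_innodb, slave_has_it) states
    reachable by crashing after each prefix of the commit steps."""
    states = set()
    for n in range(len(steps) + 1):
        done = steps[:n]
        states.add((
            "binlog write" in done,
            "innodb commit" in done,
            "slave ack" in done,
        ))
    return states

AFTER_SYNC   = ["binlog write", "slave ack", "innodb commit"]
AFTER_COMMIT = ["binlog write", "innodb commit", "slave ack"]

# With AFTER_COMMIT, a crash can leave a transaction committed in InnoDB
# that the slave never received -- the state that blocks re-slaving:
assert (True, True, False) in crash_states(AFTER_COMMIT)

# With AFTER_SYNC, that state is unreachable; the worst case is a
# binlog-only transaction, which the truncation on restart can discard:
assert (True, True, False) not in crash_states(AFTER_SYNC)
assert (True, False, False) in crash_states(AFTER_SYNC)
```

In this model, AFTER_SYNC's only advantage is that every "extra" transaction
on the crashed master is binlog-only and hence truncatable, which matches the
observation below that its purpose is to ensure transactions _are_ lost.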

(Interestingly, this means that the purpose of AFTER_SYNC is to ensure that
transactions _are_ lost, rather than ensure that they are _not_ lost).

Is this the (only) reason that AFTER_SYNC should be default? Or do you know
of other reasons to prefer it?


Now, with the new binlog implementation, there is no longer any AFTER_SYNC.
The whole point of the new implementation is to make the binlog commit and
the InnoDB commit atomic with each other as a whole; there is no point at
which a transaction is durably committed in the binlog but not committed in
InnoDB. So the truncation of the binlog at old-master restart with
--rpl-semi-sync-slave-enabled no longer applies.

But I would argue that this binlog truncation is a misfeature anyway. If we
want to ensure that the master never commits a transaction before it has
been received by a slave, then send the transaction to the slave and await
the slave's reply _before_ writing it to the binlog. Don't first write it to
the binlog and then add complex crash-recovery code to try to remove it from
the binlog again.

And doing the semi-sync handshake _before_ writing the transaction to the
binlog is something that could be implemented in the new binlog
implementation. It would be something like BEFORE_WRITE, instead of
AFTER_SYNC (which does not exist in the new binlog implementation).
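To spell out what I mean by BEFORE_WRITE, here is the ordering as a step
list (the step names are mine, and this is only a sketch of the idea, not a
design): the slave acknowledges the transaction before it ever reaches the
binlog, so no crash can leave an unacknowledged transaction in the binlog
and there is nothing to truncate.

```python
# The hypothetical BEFORE_WRITE ordering for the new binlog.  Step names
# are invented for illustration; a sketch of the idea, not a design.
BEFORE_WRITE = [
    "send to slave",
    "wait for slave ack",
    "binlog write (atomic with innodb commit in the new binlog)",
    "send ok to client",
]

def binlog_can_hold_unacked_txn(steps):
    """Can a crash leave a transaction in the binlog that the slave
    never acknowledged?  True iff the binlog write precedes the ACK."""
    write = next(i for i, s in enumerate(steps) if s.startswith("binlog write"))
    ack = steps.index("wait for slave ack")
    return write < ack

print(binlog_can_hold_unacked_txn(BEFORE_WRITE))  # False: nothing to truncate
```

Because the write and the InnoDB commit are atomic in the new binlog, this
single reordering would give the same "no extra transactions on the crashed
master" property that AFTER_SYNC plus truncation provides today.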

Thus, I really want to understand:

1. Is the --rpl-semi-sync-slave-enabled use case, where a crashing master is
always demoted to a slave, used widely enough in practice to warrant
implementing something like BEFORE_WRITE semi-sync for the new binlog format?

2. Is there another reason that AFTER_SYNC is useful that I should know, and
which needs to be designed into the new binlog format?

 - Kristian.
_______________________________________________
developers mailing list -- developers@lists.mariadb.org
To unsubscribe send an email to developers-le...@lists.mariadb.org