Hi,
On 12/5/24 18:02, Kristian Nielsen wrote:
What about the following idea?
1. Implement BEFORE_WRITE semi-sync mode. The master will not write
transactions to the binlog until at least one slave has acknowledged.
2. This means that if the master crashes, when it comes back up it will have
no transaction that does not exist on at least one running node
(assuming at most a single failure at a time).
3. When the master restarts, it will go into read-only mode and wait for
MaxScale (or other management system) to tell it what to do, similar to
MDEV-34878.
4. If MaxScale decides to keep it as the master, it will briefly set it up
as a slave and make sure it has replicated the latest GTID on any slave
in the replication topology. Then it will be set read-write and continue
as the master.
5. If MaxScale decides to promote another server as the new master, the old
master is kept in read-only mode and configured as a slave. The
BEFORE_WRITE ensures the old master will not be ahead of the new master.
This requires the ability in MaxScale to do (4).
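
For illustration only, steps 3 to 5 could be sketched roughly as below. This
is a toy model, not MaxScale or server code: the Server class, the
handle_master_restart function and the single integer "seq" standing in for
a GTID position are all hypothetical names made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    seq: int                # simplification: one integer instead of a real GTID position
    read_only: bool = True  # step 3: every server starts out read-only


def handle_master_restart(old_master, replicas, keep_master):
    """Hypothetical manager logic for steps 3-5; returns the server made read-write."""
    newest = max(replicas, key=lambda s: s.seq)
    # Assumed BEFORE_WRITE guarantee (step 2): the restarted master is never
    # ahead of the most advanced running replica.
    if old_master.seq > newest.seq:
        raise RuntimeError("old master is ahead; BEFORE_WRITE guarantee violated")
    if keep_master:
        # Step 4: briefly replicate as a slave until caught up, then go read-write.
        old_master.seq = newest.seq
        old_master.read_only = False
        return old_master
    # Step 5: promote the most advanced replica; the old master stays read-only.
    newest.read_only = False
    return newest
```

The explicit check mirrors the assumption in step 2; without BEFORE_WRITE it
could fail, which is the case where a crashed master would otherwise have to
remove transactions from its binlog.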
I think this will be much more robust than having a crashed server try to
remove transactions already written to the binlog, and having to configure
the server to have one or another role when it starts up.
Instead, all servers in the replication topology always wait at startup for
the manager to replicate any missing transactions from the appropriate
server, and then either set it read-write as a master or continue as a
slave.
What do you think? Of course, this is all for the future; it requires
implementing BEFORE_WRITE in the server first. But I think it sounds
promising.
I think that sounds like a good idea. In step 4, instead of briefly
replicating the lost changes and resuming writes on the same node, I
think MaxScale could just move all writes to the node with the newest
GTID and turn off read-only there, essentially performing a switchover
to another node. I think that it might actually already handle this
case as it can happen with AFTER_SYNC.
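
As a rough sketch of what I mean by picking the node with the newest GTID
(not actual MaxScale code; the function names are made up here, and it
assumes a single replication domain so that comparing the sequence number of
a domain-server-seq GTID is enough):

```python
def gtid_seq(gtid: str) -> int:
    """Return the sequence number of a 'domain-server-seq' GTID string."""
    _domain, _server_id, seq = gtid.split("-")
    return int(seq)


def pick_new_primary(positions: dict[str, str]) -> str:
    """Hypothetical helper: name of the server with the highest sequence number.

    positions maps a server name to its current GTID position. Only a single
    replication domain is assumed; multiple domains would need a per-domain
    comparison instead.
    """
    return max(positions, key=lambda name: gtid_seq(positions[name]))
```

With positions like {"a": "0-1-100", "b": "0-1-102"} this would select "b",
which would then have read-only turned off.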
However, I'd imagine that this BEFORE_WRITE mode might not be super
useful for manually managed replication. You'd have to always switch
over to another node when a server crashes. All in all, BEFORE_WRITE
sounds promising and we'd definitely appreciate it, but it also doesn't
seem super useful outside of this somewhat niche use case. However, I do
still think semi-sync is generally useful and thus this does seem like
something that, as you said, should be implemented eventually in the
binlog-in-engine mode.
I'm looking forward to seeing more progress updates on this; it all seems
very interesting.
Markus
--
Markus Mäkelä, Senior Software Engineer
MariaDB Corporation