Hi Otto,

On Wed, Jan 22, 2025 at 3:07 AM Otto Kekäläinen via developers
<developers@lists.mariadb.org> wrote:
> Has anyone attempted or seen any existing implementations "physical"
> replication between two
> MariaDB instances, where a secondary read-only instance would follow
> the primary by
> directly reading and applying page changes from the primary's redo logs?

That makes me think of Amazon Aurora, which to my understanding uses
physical replication, possibly by storing log snippets in a key-value
store, indexed by the tablespace identifier and page number. The
storage underneath the page cache would apparently support fetching
pages as of a specific LSN. Freezing the LSN is what would provide a
consistent view. (Alternatively, a multi-page read mini-transaction
could be retried if one of the pages had been modified during the
execution.)

> The secondary instance would start (and restart if necessary) using a
> full copy of the data from the primary instance, but while running
> keep itself current by reading redo logs from the primary instance,
> and apply pages based on the increasing LSN number. I call this
> "physical" replication to distinguish it from normal logical
> replication based on e.g. binlogs, and this would obviously only work
> for InnoDB.

Theoretically, this could be fairly easily implemented for a read-only
replica of a DML-only workload. As soon as any writes (including the
purge of committed transaction history, or any change buffer merge)
are enabled, the replica would start to produce log on its own and its
LSN would necessarily diverge from the source. Related to this, in
xtrabackup but not in mariadb-backup you will find the option
--apply-log-only. In mariadb-backup we have some debug assertions in
place to ensure that incremental backups will not be broken. The only
situation where mariadb-backup --prepare is writing log records or
modifying any data on its own is when it is being invoked with the
--export option and it needs to clean up the .ibd files that are being
exported.

When it comes to DDL, starting with MySQL 8.0, I would expect all
changes to be made durable via the InnoDB write-ahead log. The
crash-safe DDL starting with MariaDB Server 10.6 works differently,
using a separate file ddl_recovery.log that covers operations on .frm
files, which form part of the data dictionary. Not all DDL operations
are atomic; for example DROP DATABASE is being executed as a sequence
of DROP TABLE operations. I believe that any hot backup or replication
solution when multiple log files are being used will require some
additional locking. If everything was made durable via a single
write-ahead log file, it should be straightforward to replicate just
that one file, without any locking being necessary.

When it comes to binary logs and replication, an improvement is being
worked on. https://jira.mariadb.org/browse/MDEV-34705 aims to
implement an option to write the binlog through the InnoDB
ib_logfile0. That should also make the problematic setting
sync_binlog=0 (https://jira.mariadb.org/browse/MDEV-16589) obsolete.

Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB plc
_______________________________________________
developers mailing list -- developers@lists.mariadb.org
To unsubscribe send an email to developers-le...@lists.mariadb.org

Reply via email to