On Tue, Feb 3, 2026 at 7:47 PM Gordan Bobic via discuss
<[email protected]> wrote:

> But I guess if you could quickly block clone everything and
> mariabackup is aware of it, then that would minimize the backup window
> during which the redo log is at risk of overflowing.
The current circular InnoDB WAL (ib_logfile0) would make the block clone a little tricky. If we could block all writes to the file for a short time, then I think it could work.

In the new innodb_log_archive format, InnoDB would allocate a new log file each time the current one is filling up. When the first checkpoint is written to the new file, the old file will be made read-only, to signal to any tools that it is safe to hard-link that file. The last file is also safe to hard-link at any time, because its log records will never be overwritten. However, a hard link of the last (actively written) file would not be safe to use for starting up a new server, because we must not allow both the old and the new server to write to the same log file. That is why the last log file would have to be copied or block-cloned as a final step of the backup (without blocking the server). https://jira.mariadb.org/browse/MDEV-37949 mentions a possible new parameter innodb_log_recovery_target, which would allow any extra writes in the last log file to be ignored. Alternatively, we could invalidate the tail of the last log file by writing at least one NUL byte at the desired end position. The new server would then start writing from that LSN onwards.

> It has been a long time since I looked at btrfs, but I seem to vaguely
> recall that it's incrementals still involve reading the entire old and
> new files to compute the delta, which is very inefficient,
> particularly with databases where updating a single row means having
> to re-read the entire tablespace.
>
> ZFS is significantly more advanced than that and only has to read and
> send the blocks that have actually changed.

Thank you, this is very useful. Your description of incremental btrfs transfer resembles how mariadb-backup --backup --incremental currently works: it really reads all *.ibd files to find out which pages have been changed.
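The hard-link rule sketched above (read-only archived files may be hard-linked; a still-writable file must be copied or block-cloned) could look roughly like this in C. The function name and the "read-only means safe" convention are taken from the description in this mail; this is an illustrative sketch, not an existing mariadb-backup interface:

```c
/* Sketch: hard-link an archived log file into a backup directory,
   but only if the server has marked it read-only (the signal that
   no further log records will be written to it). */
#include <sys/stat.h>
#include <unistd.h>

/* Return 0 after hard-linking src to dst, or -1 if src is still
   writable and must therefore be copied or block-cloned instead. */
int backup_log_file(const char *src, const char *dst)
{
    struct stat st;
    if (stat(src, &st))
        return -1;
    if (st.st_mode & (S_IWUSR | S_IWGRP | S_IWOTH))
        return -1; /* may still receive writes; do not hard-link */
    return link(src, dst);
}
```

A hard link costs one metadata operation regardless of the file size, which is what would keep the backup window short.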
With the innodb_log_archive format, you would basically only copy the log that was written after the previous (full or incremental) backup finished, and it would cover all changes to InnoDB files. This would be analogous to an incremental ZFS snapshot transfer. However, the binlog, the .frm files and the files of any other storage engines would still have to be handled separately, until and unless an option is implemented to persist everything via a single log.

> > I have also been thinking of implementing a live streaming backup in
> > the tar format. Perhaps, for performance reasons, there should be an
> > option to create multiple streams in parallel. I am yet to experiment
> > with this.
>
> I don't think tar can do that, which is why there is no such thing as
> a parallel tar.

Above, I was thinking of an option to split the content into multiple tar streams, which could be processed in parallel.

> And tar can actually be a serious single-threaded bottleneck when you
> are using NVMe drives and 10G+ networking.

Can you think of anything that would allow efficient streaming over a single TCP/IP connection in this kind of environment?

> And the only real tunable only shifts it by about 33% (on x86-64 -
> other platforms may be different):
> https://shatteredsilicon.net/tuning-tar/
> And 33% doesn't really move the needle enough for large fast servers
> that run database 10s of terabytes in size.

As demonstrated in https://jira.mariadb.org/browse/MDEV-38362, some more performance could be squeezed out by using the Linux system calls sendfile(2) or splice(2). Unfortunately, both system calls are limited to copying 65536 bytes at a time. Such offloading is possible with the tar format because there is no CRC on the data payload, only on the metadata. I fear that we may need multiple streams, which would complicate the interface.
The simplest interface that I can come up with would be to specify the number of streams as well as the name of a script:

BACKUP SERVER WITH 8 CLIENT '/path/to/my_script';

The above would reuse existing reserved words. The specified script could make use of a unique parameter (the stream number), something like the following:

#!/bin/sh
zstd | ssh [email protected] "cat > $1.tar.zstd"

This kind of format would allow full flexibility for any further processing. For example, you could extract multiple streams in parallel if you have fast storage:

for i in *.tar.zstd; do tar xf "$i" --zstd -C /data & done
wait

Streaming backup is something that I plan to work on after implementing a BACKUP SERVER command that targets a mounted file system. For that, I plan to primarily leverage the Linux copy_file_range(2), which can copy (or block-clone) up to 2 gigabytes per call, falling back to sendfile(2) and ultimately pread(2) and write(2).

With best regards,

Marko

--
Marko Mäkelä, Lead Developer InnoDB
MariaDB plc
_______________________________________________
discuss mailing list -- [email protected]
To unsubscribe send an email to [email protected]
