Marko Mäkelä <marko.mak...@mariadb.com> writes:

> On Fri, Aug 25, 2023 at 10:16 AM Kristian Nielsen via commits
> <comm...@lists.mariadb.org> wrote:

>> Checksums cannot be pre-computed when binlog encryption is enabled, as

> Have you thought about a format change that would address these issues?

What I though about was how to pre-encrypt without a format change.

Currently, binlog encryption encrypts each event separately using a nonce
built from the binlog file (12 random bytes) and the event (4 bytes of
end_log_pos).

The end_log_pos is easy, we can just put an increasing counter in there, and
then fix the value after decryption.

The problem is the per-binlog-file nonce. If a binlog rotation happens
between transaction start and commit, we will have encrypted with the wrong
nonce. In this case we can decrypt and re-encrypt, assuming most
transactions will not span a binlog rotation. Or we could rotate this part
of the nonce less often (I don't think anything says it _has_ to be for each
binlog rotation).

Binlog encryption is not something that I find very useful or interesting.
It's one of those features that hurt more than it helps. If it was not
there, users would be happy. Now that it's there, they will think they need
to use it, and suffer complexity and limitations. Full-disk encryption seems
much simpler, more robust, and more complete.

But with this patch series, pre-encryption shouldn't be too hard to add. I
think it needs input from whoever thinks binlog encryption is useful, since
their requirements for the nonce will be needed to decide how to implement
it.

> In MDEV-14425 
> https://github.com/MariaDB/server/commit/685d958e38b825ad9829be311f26729cccf37c46
> I redesigned the InnoDB log file format in a way that allows checksums
> to be computed before the final position of the records (the log
> sequence number of LSN) is known. For encryption, an additional 64-bit
> nonce will be written; this currently is the LSN at the time the

In binlog we already have the end_log_pos, which is mostly an unused field.
It's 32 bit, not sure if that would be considered sufficient for a nonce.

> Thanks to MDEV-27774, not only InnoDB computes checksums while not
> holding any mutex, but multiple threads can copy their
> mini-transactions to the shared log_sys.buf while holding a shared

For the binlog, the stmt/trx caches are written directly to the binlog file
under LOCK_log, from a single thread serving multiple pending transactions.
I somehow doubt that this is a current scalability bottleneck, something
that would be interesting to investigate though. It also seems doubtful that
the kernel would scale with multiple threads appending to a file
concurrently. The binlog is a very naive implementation and not well
designed for a scalable transaction log currently. I think the InnoDB redo
log is somewhat different here.

>> +/* Convenience macros since ulong is 32 or 64 bit depending on platform. */
>> +#if SIZEOF_LONG == 4
>> +#define my_atomic_storeul(P, D) my_atomic_store32((int32 volatile *)(P), 
>> (D))
>
> In all currently supported MariaDB Server versions, C++11 is
> available. Is there a particular reason why you would not use
> std::atomic<size_t> here?

Yes. The binlog_checksum_options variable needs to be of ulong type to be
usable with the MYSQL_SYSVAR_ENUM() macro to define the --binlog-checksum
configuration option. As far as I could determine, std::atomic does not
provide any non-atomic access to the underlying storage location to obtain
the ulong * required.

The alternative would be to change the implementation of the
--binlog-checksum sysvar to somehow allow a std::atomic as the backing
store. I thought this was more complex and out of scope of this task, but
don't have a strong opinion one way or the other.

(The ulong type and sys_var was always a source of grief, with ulong being
different size on different platforms and so on).

 - Kristian.
_______________________________________________
developers mailing list -- developers@lists.mariadb.org
To unsubscribe send an email to developers-le...@lists.mariadb.org

Reply via email to