On Tue, Dec 3, 2024 at 12:40 PM Kazuhiko via discuss <[email protected]> wrote: > > Hello, > > Our application has a sustained rate of DML queries in parallel with > DQL. We can achieve good performance using > innodb-locks-unsafe-for-binlog on MariaDB 10.4, with careful > consideration of its 'unsafeness'. > But innodb-locks-unsafe-for-binlog is removed in MariaDB 10.5 or later > and our application runs much slower without it because of gap locking. > https://mariadb.com/docs/server/ref/mdb/cli/mariadbd/innodb-locks-unsafe-for-binlog/ > > To achieve a good performance as before, we now use READ COMMITTED > transaction isolation for DML and use REPEATABLE READ isolation for DQL > on MariaDB 11.4. This way guarantees the consistency for our > application.
Have you actually verified that your application needs repeatable-read? It's possible but in my experience it is extremely rare for applications to re-read the rows they already read. Not to say it never happens, but I have seen application implementations that actually needs this maybe twice out of thousands of applications I've worked on over the last 25 years. > Our database contains totally 13.7G rows and uses 4.5TB storage. Here is > the configuration : > > innodb_buffer_pool_size = 17179869184 (16GB) > innodb_buffer_pool_instances = 16 (no meaning in 11.4) > innodb_log_file_size = 134217728 (128MB) > innodb_flush_method = O_DIRECT > > Please see the attached graphs drawing 'InnoDB history length (right > axis)' and 'free redo log in % (left axis)' (i.e. 100 - Redo Log > Occupancy) while running our reference workload. > https://mariadb.com/kb/en/innodb-redo-log/#determining-the-redo-log-occupancy > These tests are done with the same workload, but only MariaDB version > and/or configuration is different. > (Please disregard the 'warehouse history length' and 'activities history > length' series in those plots, they are not relevant to the current > issue.) > > On MariaDB 10.4, the performance and the behaviour between 'using > innodb-locks-unsafe-for-binlog' and 'using READ COMMITTED and REPEATABLE > READ' are similar. But on MariaDB 11.4, I found two serious issues. > > Issue 1: InnoDB history length keeps growing while running lots of DML > and DQL queries in parallel. > > InnoDB history length is stable around 5k on 10.4, but it keeps growing > on 11.4 up to 100k until the end. In our experience, increases in > history list length are roughly > correlated with poor query performance (causing queries to "step over > tombstones" ?), and we do our best to avoid having long-running > transactions in order to allow older snapshots to be freed ASAP. We are > satisfied with the status on 10.4, where history list is constant. > What is the reason behind such increases on 11.4 and is there any > configuration to avoid such ? Have you been able to measure the performance impact of this? 100K isn't usually high enough to start causing a measurable performance degradation, that usually starts to happen north of 1M history length. > Issue 2: MariaDB stalls when free redo log is too small. > > When MariaDB stalls, even a quite simple query takes too long. > > (table definition) > CREATE TABLE `catalog` ( > `uid` bigint(20) unsigned NOT NULL, > `path` varchar(255) NOT NULL DEFAULT '', > ... > PRIMARY KEY (`uid`), > KEY `Path` (`path`), > ... > ) > > (from pt-query-digest output) > # Attribute pct total min max avg 95% stddev > median > # ============ === ======= ======= ======= ======= ======= ======= > ======= > # Count 2 91 > # Exec time 2 716s 1s 17s 8s 15s 5s > 7s > # Lock time 0 22ms 156us 491us 238us 366us 70us > 204us > # Rows sent 0 663 3 10 7.29 8.91 1.09 > 6.98 > # Rows examine 0 663 3 10 7.29 8.91 1.09 > 6.98 > # Rows affecte 0 0 0 0 0 0 0 > 0 > # Bytes sent 0 52.10k 330 749 586.24 685.39 68.46 > 563.87 > # Query size 0 470.83k 3.56k 5.26k 5.17k 5.20k 225.56 > 5.20k > SELECT uid, path FROM catalog WHERE path in ('...', /*... omitted 80 > items ...*/)\G > (here, these SELECT are issued in READ COMMITTED isolation level) > > Our understanding is that the stalls are caused by the undo log cleanup > job, and/or the innodb buffer pool writeback job. > Increasing the undo log size a lot seems to reduce the frequency of > those stalls. > For our reference workload here, we still have stalls with 1GB log file > but we don't have stalls with 2GB or more log file. > If we need 'big enough' log file size to avoid stalls, will the ideal > size be proportional to the amount of data or proportional to the buffer > pool size ? > We have never had such stall on MariaDB 10.4 even with small 128MB log > file. How this difference happen and is there any configuration to > improve the behaviour ? This looks like the furious flushing regression that was introduced in 10.5 and doesn't appear to have been fixed since. It is possible to tune it out to a large extent, though: https://shatteredsilicon.net/mariadb-furious-flushing-fix-checkpoint-age/ _______________________________________________ discuss mailing list -- [email protected] To unsubscribe send an email to [email protected]
