Kristian, > [email protected] writes: > >>> 2. To make START TRANSACTION WITH CONSISTENT SNAPSHOT actually correctly >>> synchronise snapshots between multiple storage engines (MySQL does not have >>> this, I think). >> >> (Offtopic, but anyway what it is? Multi-engine transaction with this >> specific ISOLATION level?) > > > https://mariadb.com/kb/en/the-mariadb-library/enhancements-for-start-transaction-with-consistent-snapshot/ > > So it is like a REPEATABLE READ across engines, applications can get a > consistent view of cross-engine transactions. > It also allows to do a non-blocking mysqldump without FLUSH TABLES WITH READ > LOCK. > >>> 3. To avoid having to do an extra fsync() for every commit, on top of the >>> one for prepare and the one for binlog write (MySQL has this). > >>> MySQL handles (3) by stopping all transactions around binlog rotate and >>> doing a flush_logs(). >> >> Maybe I am missing something, but why binlog rotation? It is not a >> common case. Indeed MySQL BGC reduces the number of fsync() to two by >> the (flush stage) leader. As to rotation, it's a specific branch of >> MYSQL_BIN_LOG::ordered_commit() >> where a rotator thread contends for the flush stage mutex, eventually gains >> it >> (which may lead to few more groups binlogged into the old being rotated >> file) and performs. > > It used to be that there was _three_ fsyncs for every commit. The _only_ > reason the fsync in commit was needed was to ensure that binlog crash > recovery would still work after a binlog rotation. Which was kind of > silly.
This one must be https://github.com/mysql/mysql-server/commit/35adf21bb63a336c76efdad6c4610161f3fd733d > > So _something_ needed to be done aroung binlog rotation. To ensure that all > transactions are durably committed in storage engines before they are no > longer available to binlog crash recovery. > > If I understand correctly, MySQL ensures this by temporarily stopping binlog > writes, calling flush_logs() in all (?) engines (with semantics that > flush_logs() must make all prior commit()'s durable), and only then allowing > new writes to the new binlog. I am not sure how MySQL ensures that all > commit() calls complete before the flush_logs() call, maybe it takes both > the LOCK_commit and LOCK_log mutexes around binlog rotation. Almost: LOCK_log and LOCK_xids - I've checked it out, 'cos really forgot it. MySQL employs a sort of unlogging xid former (before BGC) technique. The rotator first (phtread_cond-)waits for all flushed-to-binlog-xids got committed, and then having the two mutex grant ha_flush_logs() is issued right before the new log file is set. > The result is > that binlog crash recovery is always possible from only one binlog file. > > MariaDB instead extends binlog crash recovery to consider multiple binlog > files, if necessary. Then nothing special is needed during binlog rotation. > But some "garbage collection" is needed to eventually release old binlog > files. > >>> MariaDB avoids this stall >> >> You must be having a use case which I can't see.. > > I am not sure one is better than the other. MariaDB avoids flush_logs(), > though in current storage engines it may not matter much. The MySQL approach > is arguably simpler code, though it seems quite an abuse of flush_logs(). It must be only for good to have two well explored methods around. > The MySQL approach was not public when the MariaDB approach was > implemented. True. Thanks a lot for talking and explaining these fine bits! Andrei _______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : [email protected] Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp

