This was a 14-year-old safeguard/fallback mechanism, added when binlog group commit was implemented to avoid one fsync per commit. Binlog group commit is extremely mature now, and adding an extra fsync at the end of commit only makes things slower; besides, it is not clear that later InnoDB changes still implement the intended behavior for the value 3.

So let's make the value 3 work the same way as 1: full durability is guaranteed, either by fsync of the redo log during commit (when not using two-phase commit with the binlog), or by fsync during prepare (when using two-phase commit with the binlog). Also update and clarify the --help text for the option accordingly.

Signed-off-by: Kristian Nielsen <[email protected]>
---
 storage/innobase/handler/ha_innodb.cc | 21 +++++++++++++--------
 storage/innobase/trx/trx0trx.cc       |  1 +
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
index 4fda5951155..e8d287d8bfc 100644
--- a/storage/innobase/handler/ha_innodb.cc
+++ b/storage/innobase/handler/ha_innodb.cc
@@ -19026,7 +19026,7 @@ innobase_wsrep_set_checkpoint(
 
 	if (wsrep_is_wsrep_xid(xid)) {
 		trx_rseg_update_wsrep_checkpoint(xid);
-		log_buffer_flush_to_disk(srv_flush_log_at_trx_commit == 1);
+		log_buffer_flush_to_disk(srv_flush_log_at_trx_commit & 1);
 		return 0;
 	} else {
 		return 1;
@@ -19181,13 +19181,18 @@ static MYSQL_SYSVAR_ULONG(flush_log_at_trx_commit, srv_flush_log_at_trx_commit,
   "Controls the durability/speed trade-off for commits."
   " Set to 0 (write and flush redo log to disk only once per second),"
   " 1 (flush to disk at each commit),"
-  " 2 (write to log at commit but flush to disk only once per second)"
-  " or 3 (flush to disk at prepare and at commit, slower and usually redundant)."
-  " 1 and 3 guarantees that after a crash, committed transactions will"
-  " not be lost and will be consistent with the binlog and other transactional"
-  " engines. 2 can get inconsistent and lose transactions if there is a"
-  " power failure or kernel crash but not if mysqld crashes. 0 has no"
-  " guarantees in case of crash. 0 and 2 can be faster than 1 or 3",
+  " or 2 (write to log at commit but flush to disk only once per second)."
+  " 1 provides durability (once COMMIT succeeds, transactions will be"
+  " recovered even in case of a crash), but is slowest. 2 preserves commited"
+  " transactions if the mysqld process crashes, but not in case of power"
+  " failure or operating system crash. 0 does not provide durability."
+  " InnoDB table data will be recovered into a consistent state after a crash"
+  " in either case. If --binlog-storage-engine is used, the state will also be"
+  " recovered consistently with the binlog and with other transactional"
+  " storage engines in either case. If --binlog-storage-engine is not used,"
+  " only 1 provides consistency with binlog and other engines after a crash."
+  " 0 and 2 can be faster than 1. The value 3 is allowed for historical"
+  " reasons, but treated the same way as 1",
   NULL, NULL, 1, 0, 3, 0);
 
 static MYSQL_SYSVAR_ENUM(flush_method, innodb_flush_method,
diff --git a/storage/innobase/trx/trx0trx.cc b/storage/innobase/trx/trx0trx.cc
index 98b492fe2ff..4ec441c7086 100644
--- a/storage/innobase/trx/trx0trx.cc
+++ b/storage/innobase/trx/trx0trx.cc
@@ -1757,6 +1757,7 @@ void trx_commit_complete_for_mysql(trx_t *trx)
   case 0:
     return;
   case 1:
+  case 3:
     if (trx->active_commit_ordered && trx->active_prepare)
       return;
   }
-- 
2.39.5
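
Reading aid for the wsrep hunk, which changes the flush condition from `== 1` to `& 1`: the following is a minimal standalone sketch, not InnoDB code, showing that within the option's allowed range 0..3 the bitwise test is true for exactly the values 1 and 3. The helper name `flush_on_commit` is hypothetical and exists only for this illustration.

```cpp
// Hypothetical helper mirroring the patched expression
// `srv_flush_log_at_trx_commit & 1`; it is not part of InnoDB.
constexpr bool flush_on_commit(unsigned long flush_log_at_trx_commit)
{
  // Bit 0 is set for the odd values 1 and 3, clear for 0 and 2.
  return (flush_log_at_trx_commit & 1) != 0;
}

// Compile-time checks over the full allowed range of the option (0..3).
static_assert(!flush_on_commit(0), "0: no flush at commit");
static_assert(flush_on_commit(1), "1: flush at commit");
static_assert(!flush_on_commit(2), "2: write only, no flush at commit");
static_assert(flush_on_commit(3), "3: treated the same way as 1");
```

The old `== 1` comparison would have evaluated to false for the value 3, which is why the patch switches this call site to the bitwise form instead of adding a second comparison.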
