The value 3 was introduced 14 years ago as a safeguard/fallback mechanism,
when binlog group commit was implemented to avoid one fsync per commit.
Binlog group commit is extremely mature now, and adding an extra fsync at the
end of commit only makes things slower. Besides, it is not clear that later
InnoDB changes still implement the intended behavior for the value 3.

So let's make the value 3 work the same way as 1: full durability is 
guaranteed, either by fsync of the redo log during commit (when not using 
two-phase commit with the binlog), or by fsync during prepare (when using 
two-phase commit with the binlog).
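
As a rough illustration of why the wsrep checkpoint hunk below tests
`& 1` instead of `== 1`: the low bit is set for both 1 and 3, so both
values request an fsync. The sketch assumes nothing beyond that bit test;
flush_requested() and check_flush_mapping() are hypothetical names used
only for illustration and are not part of the patch or of InnoDB.

```cpp
#include <cassert>

// Hypothetical helper (not in the patch): mirrors the
// `srv_flush_log_at_trx_commit & 1` test from the diff.
// Returns true when the setting asks for an fsync of the redo log.
static bool flush_requested(unsigned long flush_log_at_trx_commit)
{
  return (flush_log_at_trx_commit & 1) != 0;
}

// Walks the four allowed values (0..3) and checks the mapping.
static void check_flush_mapping()
{
  assert(!flush_requested(0));  // 0: no flush at commit
  assert(flush_requested(1));   // 1: flush, durable
  assert(!flush_requested(2));  // 2: write only, flush once per second
  assert(flush_requested(3));   // 3: now treated the same as 1
}
```

With `== 1`, the value 3 would have been passed as false here, which is
the mismatch the one-line change avoids.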

Also update and clarify the --help text for the option accordingly.

Signed-off-by: Kristian Nielsen <[email protected]>
---
 storage/innobase/handler/ha_innodb.cc | 21 +++++++++++++--------
 storage/innobase/trx/trx0trx.cc       |  1 +
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
index 4fda5951155..e8d287d8bfc 100644
--- a/storage/innobase/handler/ha_innodb.cc
+++ b/storage/innobase/handler/ha_innodb.cc
@@ -19026,7 +19026,7 @@ innobase_wsrep_set_checkpoint(
        if (wsrep_is_wsrep_xid(xid)) {
 
                trx_rseg_update_wsrep_checkpoint(xid);
-               log_buffer_flush_to_disk(srv_flush_log_at_trx_commit == 1);
+               log_buffer_flush_to_disk(srv_flush_log_at_trx_commit & 1);
                return 0;
        } else {
                return 1;
@@ -19181,13 +19181,18 @@ static MYSQL_SYSVAR_ULONG(flush_log_at_trx_commit, srv_flush_log_at_trx_commit,
   "Controls the durability/speed trade-off for commits."
   " Set to 0 (write and flush redo log to disk only once per second),"
   " 1 (flush to disk at each commit),"
-  " 2 (write to log at commit but flush to disk only once per second)"
-  " or 3 (flush to disk at prepare and at commit, slower and usually redundant)."
-  " 1 and 3 guarantees that after a crash, committed transactions will"
-  " not be lost and will be consistent with the binlog and other transactional"
-  " engines. 2 can get inconsistent and lose transactions if there is a"
-  " power failure or kernel crash but not if mysqld crashes. 0 has no"
-  " guarantees in case of crash. 0 and 2 can be faster than 1 or 3",
+  " or 2 (write to log at commit but flush to disk only once per second)."
+  " 1 provides durability (once COMMIT succeeds, transactions will be"
+  " recovered even in case of a crash), but is slowest. 2 preserves committed"
+  " transactions if the mysqld process crashes, but not in case of power"
+  " failure or operating system crash. 0 does not provide durability."
+  " InnoDB table data will be recovered into a consistent state after a crash"
+  " in either case. If --binlog-storage-engine is used, the state will also be"
+  " recovered consistently with the binlog and with other transactional"
+  " storage engines in either case. If --binlog-storage-engine is not used,"
+  " only 1 provides consistency with binlog and other engines after a crash."
+  " 0 and 2 can be faster than 1. The value 3 is allowed for historical"
+  " reasons, but treated the same way as 1",
   NULL, NULL, 1, 0, 3, 0);
 
 static MYSQL_SYSVAR_ENUM(flush_method, innodb_flush_method,
diff --git a/storage/innobase/trx/trx0trx.cc b/storage/innobase/trx/trx0trx.cc
index 98b492fe2ff..4ec441c7086 100644
--- a/storage/innobase/trx/trx0trx.cc
+++ b/storage/innobase/trx/trx0trx.cc
@@ -1757,6 +1757,7 @@ void trx_commit_complete_for_mysql(trx_t *trx)
   case 0:
     return;
   case 1:
+  case 3:
     if (trx->active_commit_ordered && trx->active_prepare)
       return;
   }
-- 
2.39.5
