Hi, Kristian!

On Oct 26, Kristian Nielsen wrote:
> Currently, when an InnoDB/XtraDB transaction is committed with the
> binlog enabled, we do three fsync()'s:
>
> 1. Inside prepare() in InnoDB
> 2. When writing to the binlog
> 3. Inside commit() in InnoDB
...
> why do we need the fsync() in commit()?
>
> We do not need it to ensure durability or consistency. If we crash
> after commit() returns (or just binlog write finishes), but before the
> InnoDB commit reaches disk, the crash recovery at next server start
> will re-commit the transaction inside InnoDB.
>
> In fact, it seems to me the only reason for the third fsync() is that
> we call TC_LOG_BINLOG::unlog() after InnoDB commit() returns. And
> unlog() may decide to rotate the binlog once it has been called for
> all transactions written to the current log file. And during recovery,
> we only read the latest binlog, so transactions in older binlogs must
> have reached disk for recovery to work.
>
> Do you agree that this is the only reason the third fsync() is needed?

Yes, sounds logical.

> If so, it seems it would not be too hard to avoid that fsync(). Eg. we could
> recover from the last two binlog files instead of only one. We would need a
> mechanism for InnoDB to tell the binlog that transaction `Xid' reached the
> disk, in an asynchronous way (after returning from commit()).

Reading two, three, or any number of binlogs is not a solution: it only
increases the chance that recovery will work, but does not guarantee it.
For a correct solution we'll need a way to call unlog() asynchronously.

Regards,
Sergei
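
To illustrate the point about asynchronous unlog(), here is a minimal toy model in Python. It is only a sketch under stated assumptions, not the server's actual code: the class `BinlogSet` and its methods `rotate()`, `log_xid()`, `unlog()` and `files_needed_for_recovery()` are hypothetical names invented for this example. The idea it shows is that an old binlog file may only be dropped from the recovery set once every Xid logged in it has been unlogged, so the number of files recovery must scan is not a fixed constant.

```python
from collections import OrderedDict


class BinlogSet:
    """Toy model (hypothetical): tracks which binlog files recovery must scan."""

    def __init__(self):
        self._next_file = 1
        # file number -> set of xids logged there but not yet durable in the engine
        self._pending = OrderedDict()
        # xid -> number of the file it was logged in
        self._xid_file = {}
        self.rotate()

    def rotate(self):
        """Start a new binlog file; older files stay until they drain."""
        self.current = self._next_file
        self._next_file += 1
        self._pending[self.current] = set()
        self._purge()

    def log_xid(self, xid):
        """Record that transaction `xid` was written to the current binlog."""
        self._pending[self.current].add(xid)
        self._xid_file[xid] = self.current

    def unlog(self, xid):
        """Called asynchronously, once the engine reports `xid` is on disk."""
        file_no = self._xid_file.pop(xid)
        self._pending[file_no].discard(xid)
        self._purge()

    def _purge(self):
        # Drop non-current files that have no pending xids left.
        for file_no in list(self._pending):
            if file_no != self.current and not self._pending[file_no]:
                del self._pending[file_no]

    def files_needed_for_recovery(self):
        return list(self._pending)


binlogs = BinlogSet()
binlogs.log_xid(1)
binlogs.log_xid(2)
binlogs.rotate()
binlogs.log_xid(3)
binlogs.rotate()
# Xids 1 and 2 are still not durable, so file 1 is needed even though
# two newer files exist: scanning only "the last two" would miss it.
print(binlogs.files_needed_for_recovery())
```

In this model rotation never waits for the engine; it is the asynchronous unlog() call that eventually lets recovery forget an old file.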

