If the device and the filesystem guarantee atomic page writes, then yes: http://www.percona.com/doc/percona-server/5.5/performance/atomic_fio.html, but not in the general case.
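Why the doublewrite buffer is still needed in the general case can be shown with a toy Python model of the protocol Justin Swanhart describes further down the thread. A batch of pages is first copied to a doublewrite area, then written in place. On recovery, any page whose header LSN and trailer LSN copy disagree is treated as torn and restored from its doublewrite copy. The 64-byte page, the header/trailer offsets, and the in-memory "disk" are simplified assumptions, not InnoDB's on-disk format. Real InnoDB pages also carry a checksum, because an LSN mismatch cannot catch a tear that leaves both ends of the page intact. On storage that guarantees atomic page writes, the half-new, half-old state simulated by `torn_page_no` cannot occur, and that is the case in which the doublewrite can be skipped.

```python
import struct
from typing import Dict, Optional

PAGE_SIZE = 64                 # toy page size (InnoDB's default is 16 KiB)
HEADER = struct.Struct(">Q")   # assumed: full 8-byte LSN at the start of the page
TRAILER = struct.Struct(">I")  # assumed: low 32 bits of the LSN at the end


def make_page(lsn: int, fill: int) -> bytes:
    """Build a consistent page whose header and trailer both record `lsn`."""
    body = bytes([fill]) * (PAGE_SIZE - HEADER.size - TRAILER.size)
    return HEADER.pack(lsn) + body + TRAILER.pack(lsn & 0xFFFFFFFF)


def is_torn(page: bytes) -> bool:
    """Treat the page as torn if the header LSN and its trailer copy disagree."""
    (head_lsn,) = HEADER.unpack_from(page, 0)
    (tail_lsn,) = TRAILER.unpack_from(page, PAGE_SIZE - TRAILER.size)
    return (head_lsn & 0xFFFFFFFF) != tail_lsn


class ToyDisk:
    """In-memory 'disk' with in-place pages plus a doublewrite area."""

    def __init__(self, n_pages: int):
        self.data: Dict[int, bytes] = {i: make_page(0, 0) for i in range(n_pages)}
        self.dblwr: Dict[int, bytes] = {}

    def flush_batch(self, pages: Dict[int, bytes], torn_page_no: Optional[int] = None) -> None:
        # 1. Copy the whole batch into the doublewrite area (a real server
        #    writes it sequentially and fsyncs it before continuing).
        self.dblwr = dict(pages)
        # 2. Write the pages in place. If torn_page_no is given, simulate a
        #    crash halfway through that page: its first half is new, its
        #    second half is old, and later pages are never written.
        for page_no, new in pages.items():
            if page_no == torn_page_no:
                half = PAGE_SIZE // 2
                self.data[page_no] = new[:half] + self.data[page_no][half:]
                return
            self.data[page_no] = new
        # Toy simplification: forget the copies once the batch is fully written.
        self.dblwr = {}

    def recover(self) -> None:
        # Restore every torn in-place page from its intact doublewrite copy.
        for page_no, copy in self.dblwr.items():
            if is_torn(self.data[page_no]) and not is_torn(copy):
                self.data[page_no] = copy


disk = ToyDisk(n_pages=3)
batch = {n: make_page(100, 0xA0 + n) for n in range(3)}
disk.flush_batch(batch, torn_page_no=1)
print(is_torn(disk.data[1]))     # True: half-old, half-new page
disk.recover()
print(disk.data[1] == batch[1])  # True: restored from the doublewrite copy
```

Simply discarding page 1 after detecting the tear would leave nothing usable on disk, which is the point made in the thread: detection alone is not recovery.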
2015-05-10 9:12 GMT+03:00 Xiaofei Du:
> I came across some slides by the Percona CEO:
> https://www.percona.com/live/mysql-conference-2015/sites/default/files/slides/PLMCE2015-SSD-For-MySQL.pdf
> On page 45, it says "Flash can avoid this with little cost due to internal
> design". Does this mean we can safely disable the doublewrite buffer? Thanks.
>
> Xiaofei

On Sat, May 9, 2015 at 4:57 PM, Xiaofei Du wrote:
> Laurynas,
>
> We cannot recover from a torn page using only the redo log. But wouldn't the
> undo log record enough information for recovery in the case of a torn page?
> The undo log should have the old values of the affected rows. So shouldn't it
> be enough to recover a torn page using information from the undo log?
>
> Xiaofei

On Sat, May 9, 2015 at 12:07 AM, Laurynas Biveinis wrote:
> Xiaofei -
>
> We can indeed detect the torn page write without the doublewrite
> buffer (and WebScaleSQL has a patch utilising this observation). But
> we need not only to detect the torn page but also to recover it. And
> without the doublewrite, if we discard the page, we have nothing: a
> half-old, half-new page on the disk, and the redo log records for that
> page are not enough to recover it.

2015-05-09 8:44 GMT+03:00 Xiaofei Du:
> Justin,
>
> I think the fsync I was concerned about and the torn page problem are two
> different things. But now I have a question about the doublewrite buffer. If
> we can detect a torn page by checking the top and bottom of a page, why would
> we still need the doublewrite buffer? If the page is consistent, we use it;
> otherwise, we just discard it. Maybe this is a naive question, but please let
> me know. Thanks.
>
> Xiaofei

On Fri, May 8, 2015 at 9:24 PM, Justin Swanhart wrote:
> Hi,
>
> The log does not contain whole pages. Pages must not be torn for the
> recovery process to work. An fsync is required when a page is written to
> disk. During recovery, all changes since the last checkpoint are replayed,
> then transactions that do not have a commit marker are rolled back. This is
> called roll-forward/roll-back recovery.
>
> --Justin

On Fri, May 8, 2015 at 6:09 PM, Xiaofei Du wrote:
> Justin,
>
> I was wondering whether an fsync is needed after each write. The operations
> are already in the log, so recovery can always be done from the log. The
> difference is that during recovery we need to go further back in the log,
> and it will take longer. But done that way, I guess it would be hard to
> coordinate with the kernel flush thread.
>
> Xiaofei

On Fri, May 8, 2015 at 2:06 PM, Justin Swanhart wrote:
> Hi,
>
> InnoDB recovery cannot handle torn pages. An fsync is required to ensure
> that the page is fully written to disk. This is also why the doublewrite
> buffer is used. Before pages are written to disk, they are first written
> sequentially into the doublewrite buffer. This buffer is synced, and then
> asynchronous page writing can proceed. If the database crashes, the pages
> that were in flight are rewritten from the doublewrite buffer. Torn pages
> are detected using the LSN, which is written at both the top and the bottom
> of the page: if the LSN at the top and the LSN at the bottom do not match,
> the page is torn.
>
> Regards,
>
> --Justin

On Fri, May 8, 2015 at 12:43 PM, Xiaofei Du wrote:
> Laurynas,
>
> This is exactly what I was looking for. I had gone through these functions
> before, but I had disabled the doublewrite buffer, so I didn't pay attention
> to the code under buf_dblwr... I asked this question because I didn't know
> how the recovery process works, so I was wondering whether it's necessary to
> fsync after each write. It's a performance concern. Anyway, thank you very
> much!
>
> Jan -- Thank you for your answer too!
>
> Xiaofei

On Thu, May 7, 2015 at 9:59 PM, Laurynas Biveinis wrote:
> Xiaofei -
>
> fsync is performed for all the flush types (LRU, flush list, single page)
> if it is asked for (innodb_flush_method != O_DIRECT_NO_FSYNC). The apparent
> difference between sync and async is not caused by the sync difference
> itself but by the difference in flush type. The single page flush flushes
> one page and requests an fsync for its file. The other flush types flush in
> batches: they don't have to fsync after each written page, but sync once at
> the end. The doublewrite buffer complicates this further. If it is disabled,
> the fsync happens in buf_dblwr_sync_datafiles, called from
> buf_dblwr_flush_buffered_writes, called from buf_flush_common, which is
> called at the end of either an LRU or a flush list flush. If doublewrite is
> enabled, the fsync happens in buf_dblwr_update, called from
> buf_flush_write_complete.
>>> >>>>>> >>> >>>>>> >>> >>>>>> >>> >>>>>> >>> >>>>>> 2015-05-07 9:01 GMT+03:00 Xiaofei Du <[email protected]>: >>> >>>>>> > Hi Laurynas, >>> >>>>>> > >>> >>>>>> > On Wed, May 6, 2015 at 9:14 PM, Laurynas Biveinis >>> >>>>>> > <[email protected]> wrote: >>> >>>>>> >> >>> >>>>>> >> Xiaofei - >>> >>>>>> >> >>> >>>>>> >> > Does InnoDB maintain a dirty >>> >>>>>> >> > page table? >>> >>>>>> >> >>> >>>>>> >> You must be referring to the buffer pool flush_list. >>> >>>>>> > >>> >>>>>> > >>> >>>>>> > You are right. The flush_list is can be used for recovery and >>> >>>>>> > checkpoint. >>> >>>>>> > >>> >>>>>> >> >>> >>>>>> >> >>> >>>>>> >> > Is fsync called to guarantee the page to be on persistent >>> >>>>>> >> > storage so that the dirty page table can be updated? If this >>> >>>>>> >> > is >>> >>>>>> >> > the >>> >>>>>> >> > case, >>> >>>>>> >> > when is the dirty page table updated for asynchronous IOs? >>> >>>>>> >> >>> >>>>>> >> Check buf_flush_write_complete in buf0flu.cc. For async IO it >>> >>>>>> >> is >>> >>>>>> >> called from buf_page_io_complete in buf0buf.cc. >>> >>>>>> > >>> >>>>>> > >>> >>>>>> > You are right that this is the place it updates the dirty page >>> >>>>>> > information. >>> >>>>>> > But I still don't understand why the fsync is needed for >>> >>>>>> > synchronous >>> >>>>>> > IOs, >>> >>>>>> > but not for the AIOs. Jan Lindstrom said fsync is also called >>> >>>>>> > for >>> >>>>>> > other AIO >>> >>>>>> > operations. But I could only it true in one of many AIO >>> >>>>>> > operations. >>> >>>>>> > Or maybe >>> >>>>>> > I am missing something still? 
>>> >>>>>> > >>> >>>>>> >> >>> >>>>>> >> >>> >>>>>> >> -- >>> >>>>>> >> Laurynas >>> >>>>>> > >>> >>>>>> > >>> >>>>>> >>> >>>>>> >>> >>>>>> >>> >>>>>> -- >>> >>>>>> Laurynas >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> _______________________________________________ >>> >>>>> Mailing list: https://launchpad.net/~maria-discuss >>> >>>>> Post to : [email protected] >>> >>>>> Unsubscribe : https://launchpad.net/~maria-discuss >>> >>>>> More help : https://help.launchpad.net/ListHelp >>> >>>>> >>> >>>> >>> >>> >>> >> >>> > >>> >>> >>> >>> -- >>> Laurynas >> >> > -- Laurynas _______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : [email protected] Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp

