Xiaofei -

We can indeed detect a torn page write without the doublewrite buffer (and WebScaleSQL has a patch utilising this observation). But detection is not enough: we also need to recover the page. Without the doublewrite buffer, discarding the torn page leaves us with nothing to recover from: the disk holds only a half-old, half-new page, and the redo log records for that page describe changes to a consistent page rather than its full contents, so they are not enough to rebuild it.
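To make that concrete, here is a toy sketch in Python (not InnoDB code). The 16-byte page layout, the two counters and every function name are made up for illustration; the only ideas taken from this thread are the LSN stamp at the top and bottom of the page and the doublewrite copy. The redo record is modelled, as an assumption, as "add delta to both counters", i.e. a change that only makes sense when applied to a consistent page.

```python
import struct

# Toy page layout (an assumption, not InnoDB's real format):
# 4-byte LSN | counter a (4 bytes) | counter b (4 bytes) | 4-byte LSN copy


def make_page(lsn, a, b):
    """Build a 16-byte page with the same LSN stamped at top and bottom."""
    stamp = struct.pack(">I", lsn)
    return stamp + struct.pack(">II", a, b) + stamp


def read_counters(page):
    """Return the two counters stored in the page body."""
    return struct.unpack(">II", page[4:12])


def is_torn(page):
    """A page is torn if the LSN at the top differs from the LSN at the bottom."""
    return page[:4] != page[-4:]


def torn_write(old, new, cut):
    """Simulate a crash after only the first `cut` bytes of `new` hit the disk."""
    return new[:cut] + old[cut:]


def apply_redo(page, lsn, delta):
    """Hypothetical redo record: add `delta` to both counters, stamp `lsn`."""
    a, b = read_counters(page)
    return make_page(lsn, a + delta, b + delta)


def recover(on_disk, doublewrite_copy):
    """Return a consistent page image, or None if there is nothing to recover from."""
    if not is_torn(on_disk):
        return on_disk
    if doublewrite_copy is not None and not is_torn(doublewrite_copy):
        return doublewrite_copy
    return None


old = make_page(1, 10, 20)          # consistent page at LSN 1
new = apply_redo(old, 2, 1)         # consistent page at LSN 2: a=11, b=21
torn = torn_write(old, new, cut=8)  # crash halfway through writing `new`

# Detection works without any doublewrite copy...
print("torn detected:", is_torn(torn))
# ...but replaying the redo record on the torn page gives wrong counters,
print("redo on torn page:", read_counters(apply_redo(torn, 2, 1)))
# discarding the page leaves nothing to replay against,
print("no doublewrite copy:", recover(torn, None))
# while a doublewrite copy gives back a consistent page image.
print("with doublewrite copy:", read_counters(recover(torn, new)))
```

In this sketch the torn page carries the new LSN at the top and the old one at the bottom, so it is detected, but the only way to get a consistent image back is the intact copy that was written first.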
2015-05-09 8:44 GMT+03:00 Xiaofei Du <[email protected]>:
> Justin,
>
> I think the fsync I was concerned about and the torn page problem are two
> different things. But now I have a question about the doublewrite buffer.
> If we can detect a torn page by checking the top and bottom of a page, why
> would we still need the doublewrite buffer? If the page is consistent, we
> use it; otherwise, we just discard it. Maybe this is a naive question. But
> please let me know. Thanks.
>
> Xiaofei
>
> On Fri, May 8, 2015 at 9:24 PM, Justin Swanhart <[email protected]> wrote:
>>
>> Hi,
>>
>> The log does not have whole pages. Pages must not be torn for the
>> recovery process to work. An fsync is required when a page is written to
>> disk. During recovery all changes since the last checkpoint are replayed,
>> then transactions that do not have a commit marker are rolled back. This
>> is called roll forward/roll back recovery.
>>
>> --Justin
>>
>> On Fri, May 8, 2015 at 6:09 PM, Xiaofei Du <[email protected]>
>> wrote:
>>>
>>> Justin,
>>>
>>> I was wondering whether fsync is needed each time after a write. The
>>> operations are already in the log, so recovery can always be done from
>>> the log. The difference is that during recovery we need to go back
>>> further in the log, and it will take longer. But in that way, I guess it
>>> would be hard to coordinate with the kernel flush thread.
>>>
>>> Xiaofei
>>>
>>> On Fri, May 8, 2015 at 2:06 PM, Justin Swanhart <[email protected]>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> InnoDB recovery cannot handle torn pages. An fsync is required to
>>>> ensure that the page is fully written to disk. This is also why the
>>>> doublewrite buffer is used. Before pages are written down to disk, they
>>>> are first written sequentially into the doublewrite buffer. This buffer
>>>> is synced, then async page writing can proceed. If the database
>>>> crashes, the pages in flight will be rewritten by the doublewrite
>>>> buffer. The detection mechanism for torn pages comes from an LSN, which
>>>> is written into the top and the bottom of the page. If the LSNs at the
>>>> top and bottom do not match, the page is torn.
>>>>
>>>> Regards,
>>>>
>>>> --Justin
>>>>
>>>> On Fri, May 8, 2015 at 12:43 PM, Xiaofei Du <[email protected]>
>>>> wrote:
>>>>>
>>>>> Laurynas,
>>>>>
>>>>> This is exactly what I was looking for. I went through these functions
>>>>> before. I disabled the doublewrite buffer, so I didn't pay attention
>>>>> to the code under buf_dblwr... The reason I asked this question is
>>>>> that I didn't know how the recovery process works, so I was wondering
>>>>> if it's necessary to fsync after each write. It's a performance
>>>>> concern. Anyway, thank you very much!
>>>>>
>>>>> Jan -- Thank you for your answer too!
>>>>>
>>>>> Xiaofei
>>>>>
>>>>> On Thu, May 7, 2015 at 9:59 PM, Laurynas Biveinis
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> Xiaofei -
>>>>>>
>>>>>> fsync is performed for all the flush types (LRU, flush list, single
>>>>>> page) if it is asked for (innodb_flush_method != O_DIRECT_NO_FSYNC).
>>>>>> The apparent difference between sync and async is not because of the
>>>>>> sync difference itself, but because of the flush type difference.
>>>>>> The single page flush flushes one page, and requests an fsync for its
>>>>>> file. Other flushes flush in batches and don't have to fsync for each
>>>>>> written page individually, but rather sync once at the end. Then
>>>>>> doublewrite complicates this further. If it is disabled, fsync will
>>>>>> happen in buf_dblwr_sync_datafiles called from
>>>>>> buf_dblwr_flush_buffered_writes called from buf_flush_common called
>>>>>> at the end of either LRU or flush list flush. If doublewrite is
>>>>>> enabled, fsync will happen in buf_dblwr_update called from
>>>>>> buf_flush_write_complete.
>>>>>>
>>>>>> 2015-05-07 9:01 GMT+03:00 Xiaofei Du <[email protected]>:
>>>>>> > Hi Laurynas,
>>>>>> >
>>>>>> > On Wed, May 6, 2015 at 9:14 PM, Laurynas Biveinis
>>>>>> > <[email protected]> wrote:
>>>>>> >>
>>>>>> >> Xiaofei -
>>>>>> >>
>>>>>> >> > Does InnoDB maintain a dirty
>>>>>> >> > page table?
>>>>>> >>
>>>>>> >> You must be referring to the buffer pool flush_list.
>>>>>> >
>>>>>> > You are right. The flush_list can be used for recovery and
>>>>>> > checkpoint.
>>>>>> >
>>>>>> >> > Is fsync called to guarantee the page to be on persistent
>>>>>> >> > storage so that the dirty page table can be updated? If this is
>>>>>> >> > the case, when is the dirty page table updated for asynchronous
>>>>>> >> > IOs?
>>>>>> >>
>>>>>> >> Check buf_flush_write_complete in buf0flu.cc. For async IO it is
>>>>>> >> called from buf_page_io_complete in buf0buf.cc.
>>>>>> >
>>>>>> > You are right that this is the place where it updates the dirty
>>>>>> > page information. But I still don't understand why the fsync is
>>>>>> > needed for synchronous IOs, but not for the AIOs. Jan Lindstrom
>>>>>> > said fsync is also called for other AIO operations. But I could
>>>>>> > only find it true in one of many AIO operations. Or maybe I am
>>>>>> > missing something still?
>>>>>> >
>>>>>> >> --
>>>>>> >> Laurynas
>>>>>>
>>>>>> --
>>>>>> Laurynas
>>>>>
>>>>> _______________________________________________
>>>>> Mailing list: https://launchpad.net/~maria-discuss
>>>>> Post to : [email protected]
>>>>> Unsubscribe : https://launchpad.net/~maria-discuss
>>>>> More help : https://help.launchpad.net/ListHelp

--
Laurynas

