Undo logs cover only a subset of a database instance, so they cannot be relied on to rebuild an arbitrary torn page. And, since their purpose is different, the undo records might already be purged by the time of crash recovery.
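To make the detect-versus-recover distinction from the thread below concrete, here is a minimal Python sketch. It is a toy model and not InnoDB code: the page size, the 8-byte stamp layout, and the names make_page, is_torn, torn_write and recover are all made up for illustration, and the real on-disk format and checks differ.

```python
import struct
from typing import Optional

PAGE_SIZE = 64                # toy size, chosen only to keep the example small
STAMP = struct.Struct(">Q")   # hypothetical 8-byte LSN stamp


def make_page(lsn: int, payload: bytes) -> bytes:
    """Build a page with the same LSN stamp at the top and at the bottom."""
    body_len = PAGE_SIZE - 2 * STAMP.size
    body = payload.ljust(body_len, b"\0")[:body_len]
    stamp = STAMP.pack(lsn)
    return stamp + body + stamp


def is_torn(page: bytes) -> bool:
    """A page is considered torn when its top and bottom stamps differ."""
    return page[:STAMP.size] != page[-STAMP.size:]


def torn_write(old: bytes, new: bytes, cut: int) -> bytes:
    """Simulate a crash after only the first `cut` bytes of `new` hit disk."""
    return new[:cut] + old[cut:]


def recover(on_disk: bytes, doublewrite_copy: Optional[bytes]) -> Optional[bytes]:
    """Return a usable page image, or None if the page cannot be rebuilt."""
    if not is_torn(on_disk):
        return on_disk
    # Detection alone is not recovery: an intact full copy is needed.
    if doublewrite_copy is not None and not is_torn(doublewrite_copy):
        return doublewrite_copy
    return None


if __name__ == "__main__":
    old = make_page(100, b"old rows")
    new = make_page(200, b"new rows")
    disk = torn_write(old, new, PAGE_SIZE // 2)
    print("torn:", is_torn(disk))
    print("with doublewrite copy:", recover(disk, new) is not None)
    print("without doublewrite copy:", recover(disk, None) is not None)
```

With a doublewrite copy the torn page is simply replaced by an intact image. Without one, detection still succeeds but recover has nothing to return, which is the "we have nothing" case: redo records describe changes to a consistent page, not the page itself.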
2015-05-10 2:57 GMT+03:00 Xiaofei Du <[email protected]>:
> Laurynas,
>
> We cannot recover from a torn page only using redo log. But wouldn't undo
> log record enough information for recovery in the case of a torn page? Undo
> log should have old values of affected rows. So shouldn't it be enough to
> recover a torn page using information from undo log?
>
> Xiaofei
>
> On Sat, May 9, 2015 at 12:07 AM, Laurynas Biveinis
> <[email protected]> wrote:
>>
>> Xiaofei -
>>
>> We can indeed detect the torn page write without the doublewrite
>> buffer (and WebScaleSQL has a patch utilising this observation). But
>> we need not only to detect, but to recover the page as well. And
>> without the doublewrite, if we discard the page, we have nothing: a
>> half-old half-new page on the disk and the redo log records for that
>> page are not enough to recover it.
>>
>> 2015-05-09 8:44 GMT+03:00 Xiaofei Du <[email protected]>:
>> > Justin,
>> >
>> > I think the fsync I was concerned about and the torn page problem are
>> > two different things. But now I have a question about double write
>> > buffer. If we can detect a torn page by checking the top and bottom of
>> > a page, why would we still need double write buffer? If the page is
>> > consistent, then we use it, otherwise, we just discard it. Maybe this
>> > is a naive question. But please let me know. Thanks.
>> >
>> > Xiaofei
>> >
>> > On Fri, May 8, 2015 at 9:24 PM, Justin Swanhart <[email protected]>
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> The log does not have whole pages. Pages must not be torn for the
>> >> recovery process to work. A fsync is required when a page is written
>> >> to disk. During recovery all changes since the last checkpoint are
>> >> replayed, then transactions that do not have a commit marker are
>> >> rolled back. This is called roll forward/roll back recovery.
>> >>
>> >> --Justin
>> >>
>> >> On Fri, May 8, 2015 at 6:09 PM, Xiaofei Du <[email protected]>
>> >> wrote:
>> >>>
>> >>> Justin,
>> >>>
>> >>> I was thinking of if fsync is needed each time after a write. The
>> >>> operations are already in the log. So recovery can always be done
>> >>> from the log. The difference is that during recovery, we need to go
>> >>> back further in the log and it will take longer. But in that way, I
>> >>> guess it would be hard to coordinate with the kernel flush thread.
>> >>>
>> >>> Xiaofei
>> >>>
>> >>> On Fri, May 8, 2015 at 2:06 PM, Justin Swanhart <[email protected]>
>> >>> wrote:
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> InnoDB recovery can not handle torn pages. An fsync is required to
>> >>>> ensure that the page is fully written to disk. This is also why the
>> >>>> doublewrite buffer is used. Before pages are written down to disk,
>> >>>> they are first written sequentially into the doublewrite buffer.
>> >>>> This buffer is synced, then async page writing can proceed. If the
>> >>>> database crashes, the pages in flight will be rewritten by the
>> >>>> doublewrite buffer. The detection mechanism for torn pages comes
>> >>>> from an LSN, which is written into the top and the bottom of the
>> >>>> page. If the LSN at the top and bottom do not match the page is
>> >>>> torn.
>> >>>>
>> >>>> Regards,
>> >>>>
>> >>>> --Justin
>> >>>>
>> >>>> On Fri, May 8, 2015 at 12:43 PM, Xiaofei Du <[email protected]>
>> >>>> wrote:
>> >>>>>
>> >>>>> Laurynas,
>> >>>>>
>> >>>>> This is exactly what I was looking for. I went through these
>> >>>>> functions before. I disabled double write buffer, so I didn't pay
>> >>>>> attention to code under buf_dblwr... The reason I asked this
>> >>>>> question is because I didn't know how the recovery process works,
>> >>>>> so I was wondering if it's necessary to fsync after each write.
>> >>>>> It's a performance concern. Anyway, thank you very much!
>> >>>>>
>> >>>>> Jan -- Thank you for your answer too!
>> >>>>>
>> >>>>> Xiaofei
>> >>>>>
>> >>>>> On Thu, May 7, 2015 at 9:59 PM, Laurynas Biveinis
>> >>>>> <[email protected]> wrote:
>> >>>>>>
>> >>>>>> Xiaofei -
>> >>>>>>
>> >>>>>> fsync is performed for all the flush types (LRU, flush, single
>> >>>>>> page) if it is asked for (innodb_flush_method !=
>> >>>>>> O_DIRECT_NO_FSYNC). The apparent difference in sync and async is
>> >>>>>> not because of the sync difference itself, but because of the
>> >>>>>> flush type difference. The single page flush flushes one page,
>> >>>>>> and requests a fsync for its file. Other flushes flush in
>> >>>>>> batches, don't have to fsync for each written page individually
>> >>>>>> but rather sync once at the end. Then doublewrite complicates
>> >>>>>> this further. If it is disabled, fsync will happen in
>> >>>>>> buf_dblwr_sync_datafiles called from
>> >>>>>> buf_dblwr_flush_buffered_writes called from buf_flush_common
>> >>>>>> called at the end of either LRU or flush list flush. If
>> >>>>>> doublewrite is enabled, fsync will happen in buf_dblwr_update
>> >>>>>> called from buf_flush_write_complete.
>> >>>>>>
>> >>>>>>
>> >>>>>> 2015-05-07 9:01 GMT+03:00 Xiaofei Du <[email protected]>:
>> >>>>>> > Hi Laurynas,
>> >>>>>> >
>> >>>>>> > On Wed, May 6, 2015 at 9:14 PM, Laurynas Biveinis
>> >>>>>> > <[email protected]> wrote:
>> >>>>>> >>
>> >>>>>> >> Xiaofei -
>> >>>>>> >>
>> >>>>>> >> > Does InnoDB maintain a dirty
>> >>>>>> >> > page table?
>> >>>>>> >>
>> >>>>>> >> You must be referring to the buffer pool flush_list.
>> >>>>>> >
>> >>>>>> >
>> >>>>>> > You are right. The flush_list can be used for recovery and
>> >>>>>> > checkpoint.
>> >>>>>> >
>> >>>>>> >>
>> >>>>>> >> > Is fsync called to guarantee the page to be on persistent
>> >>>>>> >> > storage so that the dirty page table can be updated? If this
>> >>>>>> >> > is the case, when is the dirty page table updated for
>> >>>>>> >> > asynchronous IOs?
>> >>>>>> >>
>> >>>>>> >> Check buf_flush_write_complete in buf0flu.cc. For async IO it
>> >>>>>> >> is called from buf_page_io_complete in buf0buf.cc.
>> >>>>>> >
>> >>>>>> >
>> >>>>>> > You are right that this is the place it updates the dirty page
>> >>>>>> > information. But I still don't understand why the fsync is
>> >>>>>> > needed for synchronous IOs, but not for the AIOs. Jan Lindstrom
>> >>>>>> > said fsync is also called for other AIO operations. But I could
>> >>>>>> > only find it true in one of many AIO operations. Or maybe I am
>> >>>>>> > missing something still?
>> >>>>>> >
>> >>>>>> >>
>> >>>>>> >> --
>> >>>>>> >> Laurynas
>> >>>>>> >
>> >>>>>>
>> >>>>>> --
>> >>>>>> Laurynas
>> >>>>>
>> >>>>> _______________________________________________
>> >>>>> Mailing list: https://launchpad.net/~maria-discuss
>> >>>>> Post to : [email protected]
>> >>>>> Unsubscribe : https://launchpad.net/~maria-discuss
>> >>>>> More help : https://help.launchpad.net/ListHelp
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >
>>
>> --
>> Laurynas
>
>
--
Laurynas

