On Tue, Oct 6, 2009 at 5:09 PM, Jeremy Orlow <jor...@chromium.org> wrote:
> On Tue, Oct 6, 2009 at 4:59 PM, John Abd-El-Malek <j...@chromium.org> wrote:
>> On Tue, Oct 6, 2009 at 4:30 PM, Carlos Pizano <c...@google.com> wrote:
>>> On Tue, Oct 6, 2009 at 4:14 PM, John Abd-El-Malek <j...@chromium.org> wrote:
>>>> I'm not sure how Carlos is doing it? Will we know if something is
>>>> corrupt just on load/save?
>>>
>>> Many sqlite calls can return SQLITE_CORRUPT, for example a query or an
>>> insert. We just check for error codes 1 to 26, with 5 or 6 of them being
>>> serious errors such as SQLITE_CORRUPT.
>>>
>>> I am sure that random bit flips in memory and on disk are the cause of
>>> some crashes; this is probably the limiting factor on how low the crash
>>> rate of a perfect program deployed on millions of computers can go.
>>
>> The point I was trying to make is that this limiting factor is
>> proportional to memory usage. Given our large memory consumption in the
>> browser process, the numbers from the paper imply dozens of corruptions
>> just in sqlite memory per user. Even if only a small fraction of these
>> are harmful, spread over millions of users that's a lot of corruption.
>
> For what it's worth: this makes sense to me. It seems like pulling SQLite
> into its own process would be helpful for the reasons you laid out. I
> wonder if the only reason no one else has chimed in on this thread is that
> no one wants to have to implement it. :-)

Chase is going to start investigating it (i.e. figure out what the cost of
doing it is, how much change it requires, and ways of measuring the benefit).

>>> But I am unsure how to calculate whether, for example, a random bit flip
>>> in the backing stores, which add up to at least 10MB on most machines,
>>> does not hurt, or one in the middle of a cache entry, or in the data
>>> part of some structure.
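For anyone unfamiliar with the error-code check Carlos describes: each SQLite C API call returns a numeric primary result code (SQLITE_CORRUPT is 11). The actual Chromium code is C++, but a rough sketch of the same idea using Python's standard sqlite3 module (an illustration, not the code Carlos added) might look like:

```python
import sqlite3

def is_corrupt(path):
    """Best-effort corruption probe for a SQLite database file.

    In Python's sqlite3 wrapper, non-OK result codes such as
    SQLITE_CORRUPT surface as DatabaseError exceptions rather
    than numeric codes; PRAGMA integrity_check does a deeper scan.
    """
    try:
        con = sqlite3.connect(path)
        # Returns a single row ("ok",) on a healthy database.
        row = con.execute("PRAGMA integrity_check").fetchone()
        con.close()
        return row[0] != "ok"
    except sqlite3.DatabaseError:
        # Covers "database disk image is malformed",
        # "file is not a database", etc.
        return True
```

The in-process code presumably checks codes on every call rather than running a full integrity scan, since `PRAGMA integrity_check` reads the whole file.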
>>>> I imagine there's no way we can know when corruption happens in
>>>> steady-state and the next query leads to some other browser memory
>>>> (or another database) getting corrupted?
>>>>
>>>> On Tue, Oct 6, 2009 at 3:58 PM, Huan Ren <hu...@google.com> wrote:
>>>>> It will be helpful to get our own measurement of database failures.
>>>>> Carlos just added something like that.
>>>>>
>>>>> Huan
>>>>>
>>>>> On Tue, Oct 6, 2009 at 3:49 PM, John Abd-El-Malek <j...@chromium.org> wrote:
>>>>>> Saw this on slashdot:
>>>>>> http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf
>>>>>> The conclusion is "an average of 25,000–75,000 FIT (failures in time
>>>>>> per billion hours of operation) per Mbit".
>>>>>> On my machine the browser process is usually > 100MB, so that
>>>>>> averages out to 176 to 493 errors per year, with those numbers having
>>>>>> big variance depending on the machine. Most users don't have ECC,
>>>>>> which means this will lead to corruption. Sqlite is a heavy user of
>>>>>> memory, so even if it's 1/4 of the 100MB, that means we'll see an
>>>>>> average of 40-120 errors per year naturally because of faulty DIMMs.
>>>>>> Given that sqlite corruption means (repeated) crashing of the browser
>>>>>> process, this data heavily suggests we should separate the sqlite
>>>>>> code into a separate process. The IPC overhead is negligible compared
>>>>>> to disk access. My hunch is that the complexity is also not that
>>>>>> high, since the code that deals with it is already asynchronous,
>>>>>> because we don't use sqlite on the UI/IO threads.
>>>>>> What do others think?
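As a sanity check on the arithmetic above, here is a minimal back-of-envelope sketch. It assumes 100 MB ≈ 800 Mbit and 8,760 hours per year; the thread's 176-493 figure presumably used slightly different constants, but the result lands in the same ballpark:

```python
def errors_per_year(megabytes, fit_per_mbit, hours_per_year=24 * 365):
    """Expected memory errors per year for a process of the given size.

    FIT = failures per billion (1e9) device-hours, quoted per Mbit.
    Uses the decimal approximation 1 MB ~= 8 Mbit, which is close
    enough for an order-of-magnitude estimate.
    """
    mbits = megabytes * 8
    return mbits * fit_per_mbit / 1e9 * hours_per_year

low = errors_per_year(100, 25_000)    # lower bound from the paper
high = errors_per_year(100, 75_000)   # upper bound from the paper
print(f"roughly {low:.0f}-{high:.0f} errors/year for a 100MB process")
```

Scaling by 1/4 for sqlite's share of the 100MB gives the 40-120 range quoted in the mail.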
--
Chromium Developers mailing list: chromium-dev@googlegroups.com
View archives, change email options, or unsubscribe: http://groups.google.com/group/chromium-dev