Hi, I was wondering what changes SQLite3 would need in light of the fsync problems discovered by the PostgreSQL community (see "How is it possible that PostgreSQL used fsync incorrectly for 20 years, and what we'll do about it" talk [1]). [2] lists that MySQL and MongoDB did some changes, has SQLite done any?
Luckily SQLite has a comprehensive failure injection test suite, could this testsuite be updated to simulate the kind of issues discovered by the PostgreSQL community in [2], i.e. IIUC: * some kernels only report errors once per inode, even if multiple processes access that inode (not guaranteed you get error reported in same process as the one initiating the write) * retrying fsync after a failure always reports success * if there is a writeback error you can read back pages older than what you most recently wrote * some kernels/FS may report errors on close but not on fsync A very conservative interpretation of various fsync bugs in various OS kernels [2][5] would suggest that: #1. the list of known OS issues [3] should be updated with an entry similar to: "Linux kernels <4.13 do not guarantee to report errors encountered during writeback on next fsync, therefore data corruption can occur without SQLite knowing about it.", see [4]: " If something goes wrong during writeback, it can be hard to report that error back to user space since the operation that caused that writeback in the first place will have long since completed. The kernel makes an attempt to save the error and report it on a subsequent system call, but it is easy for that information to be lost with the result that the application is unaware that it has lost data. " There is not much SQLite3 can do about this: if the OS never reports the error to the application it won't know about it, as [1] said Linux <4.13 is doomed. #2. errors not reported on fsync, but on close According to [5] "A writeback error may not actually be reported by fsync(), however; other calls, such as close(), could return it. " AFAICT sqlite3 only logs errors from close, and doesn't surface them to the caller. It is unclear the exact kernel versions that are affected by this, IIUC from [2] then >= 4.16 would not be affect Postgresql beacuse it always does an fsync before close. Does SQLite follow WWPD here, or is it possible that SQLite3 in one process calls close without fsync, gets the writeback error reported there (which it only logs), and another process calls fsync and gets success, wrongly concluding that the data has safely reached the disk? (because you only get an error from an inode reported at most once) #3 how does this affect multiple processes accessing same sqlite database? If inode errors are reported at most once, could it be that a writeback error from changes made by process A actually get reported to process B, and process A never learns about it? If process A would always call fsync and close before relinquishing the lock on the journal/WAL I think this wouldn't happen, but does SQLite guarantee that? #4 PostgreSQL's long term plans seem to be to use direct I/O It can take a very long time until all kernels used in production eventually get fixed to handle writeback/fsync properly, the talk in [1] focused on Linux, the situation with other OS is less known. Given that SQLite is portable across so many different OS, would it make sense to follow WWPD here and in the long term have SQLite perform direct I/O on its journals, and possibly data files as well to ensure it gets proper errors reported? [Although I would rather see the kernels fixed, instead of each application having to implement its own page cache, but then updated applications are easier to deploy than updated kernels. Perhaps database applications could collaborate on a portable library on top of direct I/O that has well defined error handling semantics to avoid the unknowns/bugs in OS kernels?] [1] https://fosdem.org/2019/schedule/event/postgresql_fsync/ [2] https://wiki.postgresql.org/wiki/Fsync_Errors [3] https://www.sqlite.org/howtocorrupt.html [4] https://lwn.net/Articles/724307/ [5] https://lwn.net/Articles/752613/ https://www.postgresql.org/message-id/flat/CAMsr%2BYHh%2B5Oq4xziwwoEfhoTZgr07vdGG%2Bhu%3D1adXx59aTeaoQ%40mail.gmail.com Best regards, --Edwin _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users