Hi,

I was wondering what changes SQLite3 would need in light of the fsync
problems discovered by the PostgreSQL community (see "How is it
possible that PostgreSQL used fsync incorrectly for 20 years, and what
we'll do about it" talk [1]).
[2] lists that MySQL and MongoDB did some changes, has SQLite done any?

Luckily SQLite has a comprehensive failure injection test suite, could
this testsuite be updated to simulate the kind of issues discovered by
the PostgreSQL community in [2], i.e. IIUC:
 * some kernels only report errors once per inode, even if multiple
processes access that inode (not guaranteed you get error reported in
same process as the one initiating the write)
 * retrying fsync after a failure always reports success
 * if there is a writeback error you can read back pages older than
what you most recently wrote
 * some kernels/FS may report errors on close but not on fsync

A very conservative interpretation of various fsync bugs in various OS
kernels [2][5] would suggest that:

#1. the list of known OS issues [3] should be updated with an entry
similar to: "Linux kernels <4.13 do not guarantee to report errors
encountered during writeback on next fsync, therefore data corruption
can occur without SQLite knowing about it.", see [4]:
 " If something goes wrong during writeback, it can be hard to report
that error back to user space since the operation that caused that
writeback in the first place will have long since completed. The kernel
makes an attempt to save the error and report it on a subsequent system
call, but it is easy for that information to be lost with the result
that the application is unaware that it has lost data. "

There is not much SQLite3 can do about this: if the OS never reports
the error to the application it won't know about it, as [1] said Linux
<4.13 is doomed.

#2. errors not reported on fsync, but on close

According to [5] "A writeback error may not actually be reported by
fsync(), however; other calls, such as close(), could return it. "
AFAICT sqlite3 only logs errors from close, and doesn't surface them to
the caller.
It is unclear the exact kernel versions that are affected by this, IIUC
from [2] then >= 4.16 would not be affect Postgresql beacuse it always
does an fsync before close.
Does SQLite follow WWPD here, or is it possible that SQLite3 in one
process calls close without fsync, gets the writeback error reported
there (which it only logs), and another process calls fsync and gets
success, wrongly concluding that the data has safely reached the disk?
(because you only get an error from an inode reported at most once)

#3 how does this affect multiple processes accessing same sqlite
database?
If inode errors are reported at most once, could it be that a writeback
error from changes made by process A actually get reported to process
B, and process A never learns about it?
If process A would always call fsync and close before relinquishing the
lock on the journal/WAL I think this wouldn't happen, but does SQLite
guarantee that?

#4 PostgreSQL's long term plans seem to be to use direct I/O

It can take a very long time until all kernels used in production  
eventually get fixed to handle writeback/fsync properly, the talk in
[1] focused on Linux, the situation with other OS is less known.
Given that SQLite is portable across so many different OS, would it
make sense to follow WWPD here and in the long term have SQLite perform
direct I/O on its journals, and possibly data files as well to ensure
it gets proper errors reported?

[Although I would rather see the kernels fixed, instead of each
application having to implement its own page cache, but then updated
applications are easier to deploy than updated kernels.
Perhaps database applications could collaborate on a portable library
on top of direct I/O that has well defined error handling semantics to
avoid the unknowns/bugs in OS kernels?]

[1] https://fosdem.org/2019/schedule/event/postgresql_fsync/
[2] https://wiki.postgresql.org/wiki/Fsync_Errors
[3] https://www.sqlite.org/howtocorrupt.html
[4] https://lwn.net/Articles/724307/
[5] https://lwn.net/Articles/752613/
https://www.postgresql.org/message-id/flat/CAMsr%2BYHh%2B5Oq4xziwwoEfhoTZgr07vdGG%2Bhu%3D1adXx59aTeaoQ%40mail.gmail.com

Best regards,
--Edwin

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to