On Sat, Sep 5, 2015 at 8:02 PM, Richard Hipp <drh at sqlite.org> wrote:

> On 9/5/15, Darin Adler <darin at apple.com> wrote:
> > Hi folks.
> >
> > I?m sending this on behalf of Michael Catanzaro, a contributor to the
> WebKit
> > open source project, who is working on a WebKit bug report, "Crash when
> > WebCore::SQLiteFileSystem::openDatabase is called from multiple threads"
> > <https://bugs.webkit.org/show_bug.cgi?id=143245>, which seems to be
> caused
> > by an issue in SQLite. In short, we've noticed many applications that use
> > WebKit crash when sqlite3_initialize is called simultaneously in multiple
> > threads in the Fedora platform
> > <https://bugzilla.redhat.com/show_bug.cgi?id=1201823>
> > <https://bugzilla.redhat.com/show_bug.cgi?id=1217952>
> > <https://bugzilla.redhat.com/show_bug.cgi?id=1228391>
> > <https://bugzilla.redhat.com/show_bug.cgi?id=1207221>  despite the fact
> that
> > sqlite3_initialize is documented to be thread-safe and called
> automatically
> > by the library when needed < https://sqlite.org/c3ref/initialize.html>.
> >
> > Michael is planning a workaround in WebKit that will call
> sqlite3_initialize
> > manually exactly once before WebKit uses sqlite, using std::once to deal
> > with the thread safety issue.
> >
> > We?d like to file an SQLite bug report about this, and as I understand
> it,
> > the proper way for a newcomer to do that is to send mail here.
> >
> > In the process of trying to report this, Michael discovered that the page
> > explaining how to report bugs against SQLite
> > <https://www.sqlite.org/src/wiki?name=Bug+Reports> lists an incorrect
> email
> > address, <sqlite-users at sqlite.org>. Mail to that address is rejected.
> >
>
> Thanks for the bug report.  Code which might fix this has been checked
> into trunk.  I have also corrected the email address on the bug-report
> procedures page.
>

Chromium had a similar issue awhile back at http://crbug.com/248101 , with
the solution of putting sqlite3_initialize() at an appropriate place.
Looks like filing a bug upstream dropped through the cracks, sorry about
that.

If the fix was http://www.sqlite.org/src/info/11a9a786ec06403a , I suspect
it may not be sufficient.  I have built a local Chromium against current
SQLite checked out of fossil (it includes your fix), and when I run the
tests in the above Chromium bug under tsan (*), I still get tsan alerts
about races in sqlite3_initialize() vs itself and sqlite3MallocInit() vs
sqlite3Malloc() (**).  It's possible that tsan is flagging something
distinct from Michael's report, but it seems reasonable to suspect it's all
related.

As best I can tell (and I admit to being rusty on this), the memory barrier
you added prevents re-ordering across that line of code, so that the other
items are all written before xMutexAlloc is written ... for the current
thread on the current CPU core.  But since there is no memory barrier
before the code checking xMutexAlloc, a thread running on a different core
can see xMutexAlloc as being set even though the setup implied by the
memory barrier has not yet been done from the viewpoint of that core.

The same basic logic applies to sqlite3_initialize()'s testing and setting
of sqlite3GlobalConfig.isInit , in a different thread+core the test can see
"true" before that core sees the setup implied by isInit being set to
"true".

[I _think_ xMutexAlloc being 64 bits is probably safe on most current
processors and compilers, but that did have me nervous for a bit.]

I don't think these can be resolved by adding memory barriers, because the
barriers can race each other.  I was able to satisfy at least tsan with
pthread_mutex locks covering sqlite3_initialize() and sqlite3MutexInit()
(plausibly the latter was overkill given the former), but that's obviously
a non-starter given the goal of lazy initialization for
sqlite3_initialize().  Having a per-thread flag in thread-local storage
might be reasonable, assuming your TLS isn't slower than just using a mutex.

Apologies if I'm analyzing this incorrectly.  This kind of thing is often
subtle, and as I mentioned it's possible we're seeing different problems.

-scott

(*) https://www.chromium.org/developers/testing/threadsanitizer-tsan-v2

(**) Apologies for being vague with that - it's a release-mode compile
because otherwise it can take too long to run, and there's some inlining
making it harder to interpret the results.  I might try to repro in a
reduced context where I can relax the optimizations.

Reply via email to