Before any roll out that involves changes to the database schema, we switch Launchpad into read only mode and point it to a read-only replica of our DB while we do the roll out. Switching to read only should be just a matter of placing a file named read-only.txt under the root of our tree and waiting a couple minutes for all threads to finish processing their requests and notice the mode switch, but that never worked as expected[1].
This works by having the DB URI as a config variable that returns the appropriate value according to the presence of the read-only.txt file, together with a publication hook that forces the storm stores to reconnect whenever there's a mode switch. However, storm (through ZStorm) shares storm.database.Database instances across threads (without saying so in the docs) and our custom version of that class (LaunchpadDatabase) may change its DB URI (when there's a config change) after instantiation[2]. This, in itself, is a problem, but since we don't do runtime config changes except when running the test suite (which is single-threaded), it's never affected us. My changes for the read-only switch, though, introduced a way of changing config values at runtime, thus exposing the problem. This means there's a race condition when switching to read-only mode, when a thread kicks in after another thread has noticed the config change and reconnected its stores, thus causing the shared instance of LaunchpadDatabase to change its DB URI. When other threads kick in, they'll think they're connected to the correct DB (remember we use the DB URI stored in LaunchpadDatabase for that) and won't tell their stores to reconnect, leaving open connections to DBs that should have no connections. I think we should try and find out why storm shares these instances across threads and if it's not really intentional/needed, we should fix it. If it's needed, they should at least note in the class' docs that it's shared across threads, as that class is expected to be customized in applications. In the meantime, we can easily fix this by not using the DB URI stored in LaunchpadDatabase to detect mode switches, using the class of the connection's instance instead -- when in read-only mode we use a ReadOnlyModeConnection whereas in read-write we use the regular PostgresConnection (provided by storm). [1] https://bugs.edge.launchpad.net/launchpad-foundations/+bug/531834 [2] IIRC, this was done to support LaunchpadZopelessLayer.switchDBUser() -- Guilherme Salgado <https://launchpad.net/~salgado>
signature.asc
Description: This is a digitally signed message part
_______________________________________________ Mailing list: https://launchpad.net/~launchpad-dev Post to : [email protected] Unsubscribe : https://launchpad.net/~launchpad-dev More help : https://help.launchpad.net/ListHelp

