Re: [Launchpad-dev] Using a signal to switch to read-only mode

Guilherme Salgado Wed, 06 Jan 2010 05:21:03 -0800

On Tue, 2010-01-05 at 21:12 +0000, Tom Haddon wrote:
> On Tue, 2010-01-05 at 18:54 -0200, Guilherme Salgado wrote:
> > (CCing launchpad-dev as others might have ideas/suggestions)
> > 
> > On Tue, 2010-01-05 at 08:13 +0000, Tom Haddon wrote:
> > > On Mon, 2010-01-04 at 18:16 -0200, Guilherme Salgado wrote:
> > > > On Mon, 2009-12-21 at 09:21 +0000, Tom Haddon wrote:
> > > > > On Fri, 2009-12-18 at 10:37 -0500, Gary Poster wrote:
> > > > > > I like the suggestions I've read.  Thanks to all three of you.  I'll
> > > > > > summarize the proposals so far.
> > > > > > 
> > > > > > - We will switch logrotation to use SIGHUP.
> > > > > > 
> > > > > > - We will use SIGUSR2 as a flag for checking for the presence of a
> > > > > > "read-only.txt" at the top of the tree.
> > > > > > 
> > > > > > - At application start, or when SIGUSR2 fires, if "read-only.txt" is
> > > > > > > found at the top of the tree, the application will switch to (or stay
> > > > > > in) read-only mode.  If it is not found, the application will switch
> > > > > > to (or stay in) normal read-write mode.
> > > > > > 
> > > > > > - We will provide a key-value page to verify the read-only status of
> > > > > > (each) application.
> > > > > > 
> > > > > > Here are my thoughts:
> > > > > > 
> > > > > > > - I think the key-value page would be very valuable for LOSA peace of
> > > > > > mind, so I like the idea.  However, it is only pertinent for a given
> > > > > > application instance.  Going to this page through the load-balancer
> > > > > > would not be valuable.    LOSAs, would you immediately use this page
> > > > > > if we offered it, going to each instance in the cluster? 
> > > > > 
> > > > > It'd be nice, but I don't want to block on it.
> > > > > 
> > > > > > >  If not, I'd like to push it out of the scope of this effort, until we
> > > > > > can think about offering an aggregated view of information like this
> > > > > > in a dashboard like the one Maris will hopefully be working on this
> > > > > > cycle.
> > > > > > 
> > > > > > - I think we should definitely log mode switches.  Then LOSAs can at
> > > > > > least trail the logs for a given instance to verify that the app
> > > > > > noticed the signal and the presence or absence of the file.
> > > > > 
> > > > > +1
> > > > > > 
> > > > > > - If the LOSAs don't want to rock the boat with changing logrotation
> > > > > > to SIGHUP, we do have a swath of signals from SIGRTMIN to SIGRTMAX
> > > > > > that we could use.  I'm in favor of the SIGHUP switch if the LOSAs
> > > > > > don't mind, though.
> > > > > > 
> > > > > This switch is okay.
> > > > > 
> > > > 
> > > > Today I started working on this, and following is my initial plan:
> > > > 
> > > >         Currently, the way we switch to read-only is by changing the
> > > >         read_only config to True *and* changing the main_master and
> > > >         main_slave configs to point to standalone databases. What we
> > > >         want is to get rid of the read_only config and collapse the
> > > >         extra config files we have for read-only mode (lpnet1-db-update)
> > > >         into the lpnet1 config.
> > > >         
> > > >         In order to do this we will use the presence of a file
> > > >         (read-only.txt) on the root of the tree to identify (upon
> > > >         startup or SIGUSR2) whether or not we're in read-only mode, and
> > > >         set the main_master and main_slave configs appropriately.  As
> > > >         we'll be overwriting these config variables, we'll need to store
> > > >         all different values we might use for them in new variables
> > > >         (e.g.  rw_main_master, rw_main_slave, ro_main_master and
> > > >         ro_main_slave).  (we might even get rid of the main_master and
> > > >         main_slave config variables as they will be computed values,
> > > >         which can be moved somewhere else.  although I'm not sure this
> > > >         is a good idea because all other db names live in config
> > > >         variables). 
> > > >         
> > > >         The plan:
> > > >         
> > > >         • Change all places that use config.launchpad.read_only to use
> > > >           another helper, which tells whether or not we're in read-only
> > > >           mode by looking for a read-only.txt file.
> > > >         • switch logrotation to use SIGHUP.
> > > >         • Rename main_master and main_slave to rw_main_master and
> > > >           rw_main_slave, adding new (and empty) main_master and main_slave
> > > >           config variables, which get set upon startup/SIGUSR2 (with the
> > > >           values of rw_*).
> > > >         • log read-only/read-write switches 
> > > > 
> > > > However, after I started implementing it I realized that having two
> > > > switches (the read-only.txt file and the SIGUSR2) to turn on read-only
> > > > doesn't sound like a very good idea (as we may accidentally leave an app
> > > > server in an inconsistent state), so we may want to use SIGUSR2 to
> > > > create a read-only.txt file *and* trigger the code that sets the configs
> > > > with the appropriate values. 
> > > 
> > > You don't need to worry about creating/deleting the read-only.txt file -
> > > we'll manage that through external means (initscripts or other helper
> > > scripts). I'd envisage you only need one signal which means "check again
> > > whether we're in read-only or read-write mode". 
> > > 
> > 
> > As we discussed on IRC, my concern was that having a read-only.txt file
> > did not mean we were in read-only mode -- the SIGUSR2 is needed, and if
> > forgotten the server would be in an inconsistent state.  In that state,
> > the python code thinks we're running in read only (because it relies on
> > read-only.txt for that) but we're still connecting to the rw db (because
> > we rely on SIGUSR2 to change to the ro dbs).
> > 
> > Anyway, that didn't seem to be a big deal as this is going to be handled
> > by scripts, so I went ahead and tried to implement that.  As usual, I've
> > encountered some problems, and they seem to boil down to the way our
> > config works -- the config variables are immutable so to make changes we
> > need to push/pop overlays on top of the existing config.
> > 
> > Since config.pop(name) removes the overlay with the given name and any
> > others that were on top of it, we can't rely on config.push/pop to
> > update the config values because we might end up inadvertently reverting
> > others' changes and others might do the same to ours.  I think this
> > push/pop mechanism was meant only for testing purposes.
> > 
> > After realizing that I came up with another approach, which relies only
> > on the presence/absence of the read-only.txt file to figure out the mode
> > we're in.  In this approach, config.database.main_master/slave are gone
> > and we use dbconfig.main_master/slave instead, which are properties in
> > DatabaseConfig that return the appropriate value according to the mode
> > we're in.
> 
> Does this mean we're checking for the presence of this text file before
> every database operation? That sounds quite IO intensive.


ISTM that the presence of the file would be checked only a couple times
(once for each of the properties in DatabaseConfig that look for that
file) for each handler thread, as a consequence of storm creating the DB
connections when they're first used.

If that's correct, then we'll have to find a way to reset the stores in
all threads when we switch modes -- something I didn't realize before.
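
To make the property-based approach concrete, here is a minimal sketch.
It is not the actual Launchpad code: this DatabaseConfig is a
hypothetical stand-in, the settings dict replaces the real config
object, and the rw_*/ro_* keys are the variable names proposed earlier
in the thread.

```python
import os


class DatabaseConfig:
    """Hypothetical stand-in, not Launchpad's real DatabaseConfig."""

    def __init__(self, root, settings):
        # Assumption: read-only.txt lives at the top of the tree (root).
        self._marker = os.path.join(root, "read-only.txt")
        # Assumption: a plain dict holding the rw_*/ro_* values.
        self._settings = settings

    def is_read_only(self):
        # One filesystem check per call; nothing is cached here.
        return os.path.isfile(self._marker)

    def _pick(self, name):
        # Choose the ro_* or rw_* variant according to the current mode.
        prefix = "ro_" if self.is_read_only() else "rw_"
        return self._settings[prefix + name]

    @property
    def main_master(self):
        return self._pick("main_master")

    @property
    def main_slave(self):
        return self._pick("main_slave")
```

Since each property is only consulted when a connection is created, a
store that was opened before the file appeared keeps pointing at the old
database, which is why the stores would need to be reset on a switch.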

> 
> > Although that simplifies things for us and for LOSAs, it also means we
> > can't easily log mode switches (because we don't have the signal
> > anymore). 
> 
> Surely the server knows a current state, and then if that changes you
> could log it?

Not in the current implementation, as it relies on a @property which
checks the presence of read-only.txt, but it's easy to change that. Not
sure what I had in mind when I wrote the above.
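
Roughly, the change could look like the sketch below: remember the mode
seen on the previous check and log when it differs.  ModeTracker, the
logger name and the message wording are all made up for illustration;
none of them exist in the tree.

```python
import logging
import os
import threading

# Assumption: a dedicated logger; the name is arbitrary.
log = logging.getLogger("read_only_mode")


class ModeTracker:
    """Hypothetical helper that logs read-only/read-write switches."""

    def __init__(self, marker_path):
        self._marker_path = marker_path
        self._lock = threading.Lock()
        self._last_mode = None  # None until the first check.

    def is_read_only(self):
        current = os.path.isfile(self._marker_path)
        # Swap the remembered mode under a lock so that concurrent
        # handler threads report a given switch only once.
        with self._lock:
            previous = self._last_mode
            self._last_mode = current
        if previous is not None and previous != current:
            log.info(
                "Switched from %s to %s mode",
                "read-only" if previous else "read-write",
                "read-only" if current else "read-write",
            )
        return current
```

Note that a switch is only noticed (and logged) the next time something
asks for the mode, not at the moment the file is created or removed.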

> 
> >  We could easily work around that by pushing config changes,
> > but I'd be very uncomfortable doing that, for the reasons I explained
> > above.
> > 
> > So, I'd like to know if this would be an acceptable solution, and
> > whether or not we can live without logs of the mode switches?
> > 
> > > That make sense?
> > > 
> > > >  Similarly, when starting up we'd check for
> > > > the presence of read-only.txt and set the config variables with the
> > > > appropriate values.  That means we can't use SIGUSR2 to switch back to
> > > > read-write mode, though.
> > > > 
> > > > An alternative that would not have any of the problems described above
> > > > would be to keep the existing code using config.launchpad.read_only and
> > > > have the helper function (which looks for read-only.txt) just update
> > > > that config variable upon startup/SIGUSR2.  That way it'd be much harder
> > > > to have an appserver in read-only mode using the wrong DB, and we'd be
> > > > able to use SIGUSR2 to switch back to read-write mode.
> > > > 
> > > > Any preferences/suggestions?
> > > > 
> > > 
> > > 
> > 
> > 
> 
> 


-- 
Guilherme Salgado <[email protected]>
