As part of our quarterly emphasis on Labs stability and resilience, I've been setting up backup or hot spare systems for many labs services.

Of course, a backup system is only useful if you can switch to it. Tomorrow I will be attempting a switchover from our primary OpenStack controller, virt1000, to a new system, labcontrol1001. Most likely it will go poorly and I will switch back and forth several times. So, during my workday tomorrow (beginning at approximately 14:00 UTC), expect occasional interruptions in some labs services. I'll keep each interruption as short as I can.

What might break:

- Instance creation/deletion [1]
- Various wikitech queries [1]
- Wikitech logins [2]
- Puppet runs on labs instances [3]

What definitely won't break:

- Anything that a toollabs user would notice or care about
- Anything internal to a labs instance (apart from noisy puppet runs)
- Existing wikitech sessions
- Instance network connectivity

I am operating under the assumption that the items in the 'might break' list are pretty much never time-critical for anyone. If that's mistaken then please correct me.

-Andrew



[1] Due to nova services, e.g. the nova-scheduler or nova-conductor
[2] Due to the OpenStack identity service, Keystone
[3] The puppetmaster runs on virt1000 /and/ I'm adding a new service name for puppet, 'labs-puppetmaster-eqiad', so that we're no longer coupled to a specific host.

_______________________________________________
Labs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/labs-l
