Service(s) affected:

All VMs hosted on the OpenPOWER OpenStack cluster will be affected. Each VM
will be offline for approximately 5 minutes to 2 hours while it is migrated
to Ceph. Outages will only occur when we take a VM down for its migration;
all running VMs should remain online without any issue until then.

Outage Window(s):

Start:   Mon, Apr 9, 10:00AM PDT (Mon Apr 9 1700 UTC)
End:     Mon, Apr 9, 5:00PM PDT (Tue Apr 10 0000 UTC)

I doubt we'll be able to finish the migration in one day, so the following
additional windows will be used as needed:

Start:   Tue, Apr 10, 9:00AM PDT (Tue Apr 10 1600 UTC)
End:     Tue, Apr 10, 5:00PM PDT (Wed Apr 11 0000 UTC)

Start:   Wed, Apr 11, 9:00AM PDT (Wed Apr 11 1600 UTC)
End:     Wed, Apr 11, 5:00PM PDT (Thu Apr 12 0000 UTC)

Start:   Thu, Apr 12, 9:00AM PDT (Thu Apr 12 1600 UTC)
End:     Thu, Apr 12, 5:00PM PDT (Fri Apr 13 0000 UTC)

Start:   Fri, Apr 13, 9:00AM PDT (Fri Apr 13 1600 UTC)
End:     Fri, Apr 13, 5:00PM PDT (Sat Apr 14 0000 UTC)
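The UTC times in these windows follow from PDT being UTC-7. As a cross-check, here is a minimal Python sketch that recomputes them. It assumes the year is 2018 (the notice omits the year; Apr 9 falls on a Monday in 2018) and hard-codes PDT as a fixed UTC-7 offset instead of using a time zone database.

```python
from datetime import datetime, timedelta, timezone

# PDT (Pacific Daylight Time) is a fixed UTC-7 offset. Hard-coding it avoids
# depending on a time zone database; it is only valid during daylight time.
PDT = timezone(timedelta(hours=-7), "PDT")
YEAR = 2018  # assumption: the notice does not state the year


def window(day: int, start_hour: int) -> tuple[datetime, datetime]:
    """One outage window on April `day`, ending at 5:00PM PDT."""
    return (datetime(YEAR, 4, day, start_hour, tzinfo=PDT),
            datetime(YEAR, 4, day, 17, tzinfo=PDT))


# Mon Apr 9 starts at 10:00AM; the additional windows (Apr 10-13) start at 9:00AM.
WINDOWS_PDT = [window(9, 10)] + [window(day, 9) for day in range(10, 14)]


def to_utc_label(dt: datetime) -> str:
    """Format a time the way the notice does, e.g. 'Mon Apr 9 1700 UTC'."""
    utc = dt.astimezone(timezone.utc)
    return f"{utc:%a %b} {utc.day} {utc:%H%M} UTC"


for start, end in WINDOWS_PDT:
    print(f"{to_utc_label(start)} -> {to_utc_label(end)}")
```

Each 5:00PM PDT end time lands at 0000 UTC on the following day, which is why the UTC dates in the end lines roll over.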

Reason for outage:

We are in the process of migrating the storage backend of the cluster from
local storage to Ceph. The migration to Ceph should improve I/O bandwidth
and capacity, and also provide more flexibility for server maintenance,
since we will be able to live-migrate VMs. Thanks to a donation from IBM, we
have a new five-node Ceph cluster with 292TB of capacity, including SSDs for
journal caching.

We completed the first phase of this migration back in mid-March, and now
we're ready for the next phase. In this phase, we're going to switch the
OpenStack cluster over to using the new Ceph cluster for storage. The switch
itself should not cause any outages, since running VMs will remain running
on local storage. However, any VM that is rebooted from the OpenStack API
will fail to start, since it will be expecting a Ceph volume for its disk.
Any VM that is created after the switch will automatically be deployed on
Ceph.

The conversion requires migrating the data of the following OpenStack
services to Ceph:

- VM disks
- Volumes (cinder)
- Images (glance)

For the vast majority of VMs, the process should be very simple: we will
shut down the VM, copy its disk image over to Ceph using qemu-img, and start
the VM back up. For the very few VMs that use a cinder volume as a boot
volume, the process is a little more complicated and may take more time,
but it works the same way. If your VM has a cinder volume attached to it,
we will migrate the VM disk and the volume at the same time.
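For the common case (no cinder boot volume), the stop, copy, start sequence can be sketched as three commands. The Python below only builds and prints them; it runs nothing and is not necessarily what the gist [1] does. The instance path, the qcow2 source format, the Ceph pool name "vms", and the "<uuid>_disk" image name are assumptions based on common Nova defaults, and the server ID is hypothetical.

```python
import shlex

# Assumptions, not taken from the notice: Nova's usual local instance path,
# a qcow2 source disk, a Ceph pool named "vms", and the "<uuid>_disk" naming
# that Nova uses for RBD-backed instance disks.
INSTANCES_DIR = "/var/lib/nova/instances"
CEPH_POOL = "vms"


def migration_plan(server_id: str) -> list[list[str]]:
    """Build (but do not run) the stop -> copy -> start steps for one VM."""
    src = f"{INSTANCES_DIR}/{server_id}/disk"
    dst = f"rbd:{CEPH_POOL}/{server_id}_disk"  # qemu-img's rbd:pool/image syntax
    return [
        # 1. Shut the VM down so its disk is no longer being written to.
        ["openstack", "server", "stop", server_id],
        # 2. Copy the local disk into Ceph, converting it to raw format.
        ["qemu-img", "convert", "-f", "qcow2", "-O", "raw", src, dst],
        # 3. Boot the VM again, this time from its disk on Ceph.
        ["openstack", "server", "start", server_id],
    ]


# Hypothetical server ID, for illustration only.
for argv in migration_plan("11111111-2222-3333-4444-555555555555"):
    print(shlex.join(argv))
```

Building argument lists instead of shell strings keeps each step reviewable before it is handed to subprocess.run(argv, check=True), one step at a time, so a failed copy stops the sequence before the VM is started.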

Here is the order in which I'll be doing the migrations:

- VMs with no cinder volumes
- VMs with cinder volumes attached
- VMs with a cinder boot volume
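The ordering above amounts to sorting VMs into three groups. Here is a minimal Python sketch using a hypothetical VM record; in practice the volume details would come from the OpenStack API, which this sketch does not query.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    """Hypothetical record of the details that decide a VM's migration group."""
    name: str
    attached_volumes: list[str] = field(default_factory=list)
    boots_from_volume: bool = False


def migration_group(vm: VM) -> int:
    """0 = no cinder volumes, 1 = cinder volumes attached, 2 = cinder boot volume."""
    if vm.boots_from_volume:
        return 2
    if vm.attached_volumes:
        return 1
    return 0


def migration_order(vms: list[VM]) -> list[VM]:
    # sorted() is stable, so VMs keep their original relative order within a group.
    return sorted(vms, key=migration_group)


vms = [VM("db", boots_from_volume=True), VM("web"), VM("build", ["vol-1"]), VM("ci")]
print([vm.name for vm in migration_order(vms)])  # ['web', 'ci', 'build', 'db']
```

A VM with a cinder boot volume goes last even if it also has other volumes attached, since the boot-volume case is the most involved.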

I expect most VM migrations to last only 5-20 minutes; however, VMs with a
lot of storage may have longer downtimes. If you wish to schedule a specific
time for your migration, please let me know ASAP. Closer to the migration, I
will provide a spreadsheet showing the planned order of moves, and I will
update it in real time as the moves are completed.

If you're interested in the specifics of how I'm doing this migration, feel
free to look at this gist [1] that I made for myself to keep track of all
the commands. If you have any questions or concerns, please let me know.

Thanks!

[1] https://gist.github.com/ramereth/5e11018570f8cd8aa7e707643a4bbf4b

-- 
Lance Albertson
Director
Oregon State University | Open Source Lab
_______________________________________________
openpower mailing list
openpo...@osuosl.org
https://lists.osuosl.org/mailman/listinfo/openpower
