Service(s) affected: All VMs hosted on the OpenPOWER OpenStack cluster will be offline for approximately 5 minutes to 2 hours while each VM is migrated to Ceph. An outage occurs only when we take a VM down for its migration; all running VMs should remain online without issue until we proceed with their migration.
Outage Window(s):

Start: Mon, Apr 9, 10:00AM PDT (Mon Apr 9 1700 UTC)
End: Mon, Apr 9, 5:00PM PDT (Tue Apr 10 0000 UTC)

I doubt we'll be able to finish the migration in one day, so the following windows are additional as needed:

Start: Tue, Apr 10, 9:00AM PDT (Tue Apr 10 1600 UTC)
End: Tue, Apr 10, 5:00PM PDT (Wed Apr 11 0000 UTC)

Start: Wed, Apr 11, 9:00AM PDT (Wed Apr 11 1600 UTC)
End: Wed, Apr 11, 5:00PM PDT (Thu Apr 12 0000 UTC)

Start: Thu, Apr 12, 9:00AM PDT (Thu Apr 12 1600 UTC)
End: Thu, Apr 12, 5:00PM PDT (Fri Apr 13 0000 UTC)

Start: Fri, Apr 13, 9:00AM PDT (Fri Apr 13 1600 UTC)
End: Fri, Apr 13, 5:00PM PDT (Sat Apr 14 0000 UTC)

Reason for outage:

We are in the process of migrating the storage backend of the cluster from local storage to Ceph. The migration to Ceph should improve I/O bandwidth and capacity, and it also gives us more flexibility for server maintenance since we will be able to live-migrate VMs. Thanks to a donation from IBM, we have a new five-node Ceph cluster with 292TB of capacity, including SSDs for journal caching.

We completed the first phase of this migration back in mid-March, and now we're ready for the next phase: switching the OpenStack cluster over to the new Ceph cluster for storage. The switch itself should not cause any outages, as running VMs will keep running on local storage. However, any VM that is rebooted from the OpenStack API will fail to start, since it will expect a Ceph volume for its disk. Any VM created after the switch will automatically be deployed on Ceph.

The conversion requires moving the following OpenStack storage into Ceph:

- VM disks
- Volumes (cinder)
- Image (glance) files

For the vast majority of VMs, the process should be very simple: we shut down the VM, copy the disk image over to Ceph using qemu-img, and start the VM back up.
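For the simple case, the per-VM move looks roughly like the following. This is only a sketch, not the exact procedure: the Ceph pool name (vms) and the nova instance path are assumptions about our layout, and the actual commands I'm using are in the gist linked below.

```shell
#!/bin/sh
# Sketch of the simple per-VM migration. The pool name "vms" and the
# nova instance path are assumptions, not confirmed details of our setup.
set -e
UUID="$1"

# Shut the VM down cleanly via the OpenStack API.
openstack server stop "$UUID"

# Copy the local disk image into the Ceph RBD pool, converting to raw.
qemu-img convert -O raw \
  "/var/lib/nova/instances/$UUID/disk" \
  "rbd:vms/${UUID}_disk"

# Start the VM back up once the copy is complete.
openstack server start "$UUID"
```

The qemu-img copy is the step that dominates the downtime, which is why VMs with a lot of storage will see longer outages.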
For the very few VMs that use a cinder volume as a boot volume, the process is a little more complicated and may take more time, but it works the same way. If your VM has a cinder volume attached to it, we will migrate both at the same time.

Here is the order in which I'll be doing the migrations:

- VMs with no cinder volumes
- VMs with cinder volumes attached
- VMs with a cinder boot volume

I expect most VM migrations to last only 5-20 minutes, though VMs with a lot of storage may have longer downtimes. If you wish to schedule a specific time for your migration, please let me know ASAP. Closer to the migration, I will provide a spreadsheet showing the planned order of moves and update it in real time as the moves are completed.

If you're at all interested in the specifics of how I'm doing this migration, you're free to look at this gist [1] I made for myself to keep track of all the commands.

If you have any questions or concerns, please let me know. Thanks!

[1] https://gist.github.com/ramereth/5e11018570f8cd8aa7e707643a4bbf4b

--
Lance Albertson
Director
Oregon State University | Open Source Lab
_______________________________________________
openpower mailing list
openpo...@osuosl.org
https://lists.osuosl.org/mailman/listinfo/openpower