I had a few questions regarding this outage that I wanted to clarify for everyone.

1. There should be no outage during the 5.5 hour outage window for
   anything pointed to ftp.osuosl.org (unless your DNS is pointing
   directly at ftp-osl.osuosl.org).
2. During the 18-24 hour sync from Ceph to local storage, ftp-osl should
   have normal read/write operations. There might be a small I/O
   performance hit during that window, but it's hard to tell. There will,
   however, be a short (likely 5 minute) outage for reads/writes on
   ftp-osl when I do the final switch back to local storage.

On Thu, Jun 14, 2018 at 10:00 AM, Lance Albertson <la...@osuosl.org> wrote:

> Service(s) affected: ftp.osuosl.org
>
> During the outage, the master syncing node for our FTP cluster (ftp-osl)
> will be offline, which means any updates to our software mirrors will be
> delayed.
>
> Outage Window:
> Start: Mon, Jun 18 9:30AM PDT (Mon Jun 18 1630 UTC)
> End: Mon, Jun 18 3:00PM PDT (Mon Jun 18 2200 UTC)
>
> Reason for outage:
>
> Our FTP cluster is starting to run low on disk space and we will be adding
> additional hard drives to the system. Our system currently has 9.375T of
> disk space and we're planning on upgrading it to 18.75T (this takes into
> account the RAID6 configuration).
>
> Unfortunately, due to the nature of how the disk arrays are configured,
> we will not be able to grow the RAID array without a complete rebuild.
> This means we're going to have to re-copy all 8.8TB of data off of the
> machine and back onto it. Since this task is rather large and time
> consuming, we've come up with a better alternative so that we don't have
> our master FTP server offline for very long.
>
> We have just recently built a new Ceph cluster for some new storage needs
> at the OSL and we are going to temporarily use this cluster to serve the
> ftp-osl content. I've already copied the content onto a new volume and have
> tested it enough to feel it can handle the load. This should make the
> transition plan much easier and quicker than initially planned. This
> server is already out of DNS rotation and we are planning on keeping it
> out of rotation until this process is complete to reduce the I/O load.
>
> So here's the plan thus far, starting on Monday:
>
> 1. Stop all services on the system and do one final rsync to the
>    Ceph volume
> 2. Reboot the machine, destroy the current RAID, and create a new
>    one with the new disks
> 3. Reinstall the OS
> 4. Bootstrap the machine without FTP components initially, set up the
>    Ceph volume
> 5. Deploy FTP components after the Ceph volume is set up and ready to go
> 6. Ensure inter-FTP-node syncing is working using the Ceph volume
> 7. Sync data from the Ceph volume back over to local disks (I'm guessing
>    this will take 18-24 hours)
> 8. Once the sync is complete, shut down all services and switch the mount
>    point over to the local disks
> 9. Profit!
>
> I would like to thank IBM for donating the hard drives needed for this
> upgrade.
>
> We will plan on doing the storage upgrades on our two other nodes (ftp-nyc
> & ftp-chi) soon; however, we won't be using the Ceph cluster for this since
> they are remote. The current plan is to take one machine out for several
> days and sync the data back between the nodes. I will send another outage
> announcement for those two nodes once we're ready for that. We still need
> to ship the drives to the locations and work with the local data centers to
> get them installed.
>
> Projects affected: Any project using our FTP cluster as a master syncing
> point
>
> --
> Lance Albertson
> Director
> Oregon State University | Open Source Lab
>

--
Lance Albertson
Director
Oregon State University | Open Source Lab
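
For anyone who wants to sanity-check the numbers above, here is a rough sketch in Python. It recomputes the length of the outage window from the stated PDT times and works out the average throughput that the 18-24 hour sync estimate would imply for 8.8TB of data. The throughput figures are implied by the estimate, not measured, and the sketch assumes "8.8TB" means 8.8 * 10**12 bytes (the announcement does not say whether decimal or binary units are meant). The helper name `required_mb_per_s` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Outage window as stated in the announcement (PDT is UTC-7).
PDT = timezone(timedelta(hours=-7), "PDT")
start = datetime(2018, 6, 18, 9, 30, tzinfo=PDT)
end = datetime(2018, 6, 18, 15, 0, tzinfo=PDT)

# Length of the window in hours, and the same instants in UTC.
window_hours = (end - start).total_seconds() / 3600
start_utc = start.astimezone(timezone.utc)
end_utc = end.astimezone(timezone.utc)

# Assumption: "8.8TB" is decimal terabytes (10**12 bytes each).
DATA_BYTES = 8.8e12


def required_mb_per_s(hours: float) -> float:
    """Average MB/s (10**6 bytes/s) needed to copy DATA_BYTES in `hours`."""
    return DATA_BYTES / (hours * 3600) / 1e6


if __name__ == "__main__":
    print(f"Outage window: {window_hours} hours")
    print(f"UTC: {start_utc:%H%M} to {end_utc:%H%M}")
    for hours in (18, 24):
        print(f"{hours}h sync implies ~{required_mb_per_s(hours):.0f} MB/s average")
```

In other words, the 18-24 hour guess corresponds to a sustained average somewhere around 100-136 MB/s under the decimal-units assumption; the actual rate will depend on the Ceph cluster, the new array, and whatever else ftp-osl is serving at the time.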
_______________________________________________
Hosting mailing list
host...@osuosl.org
https://lists.osuosl.org/mailman/listinfo/hosting