Hello Labs,

Many of you may recall that, until some point in late 2013, one of the features of the labs file server was that it provided time-travel snapshots (you could see a consistent view of the filesystem as it existed 1h, 2h, 3h, 1d, 2d, 3d and 1 week ago).

This was disabled at that time - despite being generally considered valuable - because it was suspected to be (part of) the cause of the stability problems the NFS server suffered at the time. This turned out not to have been the case, and we could turn it back on now.

Indeed, doing so is a prerequisite to the planned replication of the filesystem to the new datacenter, where a redundant Labs installation is slated to be deployed[1].

The issue is that turning that feature back on requires changing the way the disk space is currently allocated at a low level[2], and necessitates a fairly long period of partial downtime during which data is copied from one part of the disk subsystem to the other. In practice, this would require the primary partitions (/home and /data/project) to be set readonly for a period on the order of a day (24-30 hours).

That downtime is pretty much unavoidable eventually, as it is a requirement for expanding labs and improving data resilience and reliability, but the /timing/ of it is flexible. I wanted to "poll" labs users as to when the possibility of disruption is minimized, and give everyone plenty of time to make contingency plans and/or notify their end users of the expected period of reduced availability.

Provided there is a good consensus that a weekday is a better time than the weekend (I am guessing here that volunteer coders and users are more active during the weekend), I would suggest starting the operation on Tuesday, January 13 at 18:00 UTC. The downtime is expected to last until January 14, 18:00 UTC, but may extend a few hours beyond that.

The expected impacts are:

* Starting at the beginning of the window, /home and /data/project will switch to readonly mode; any attempt to write to files in those trees will result in EROFS errors being thrown. Reading from those filesystems will still work as expected, as will writing to other filesystems;
* Read performance may degrade noticeably, as the disk subsystem will be loaded to capacity;
* It will not be possible to manipulate the gridengine queue - specifically, starting or stopping jobs will not work; and
* At the end of the window, when the operation is complete, the "old" filesystem will go away and be replaced by the new one - this will cause any access to files or directories that were previously opened (including working directories) on the affected filesystems to error out with ESTALE. Reopening files by name will access the new copy, identical to the old one as of the time the filesystems became readonly (see the sketch after this list).
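
To illustrate that last point (the tool name and path below are made up, not anything specific to your setup), a long-running process that keeps a file handle open across the switch-over would see something like the following, and can recover by reopening the file by name:

    import errno

    # Hypothetical path on one of the affected filesystems.
    PATH = "/data/project/mytool/state.txt"

    fh = open(PATH)          # handle opened before the maintenance window
    # ... the filesystem is replaced underneath us during the window ...
    try:
        data = fh.read()     # the old handle now points at the stale filesystem
    except OSError as exc:
        if exc.errno != errno.ESTALE:
            raise
        fh.close()
        fh = open(PATH)      # reopening by name reaches the new copy
        data = fh.read()

Most tools will find it simpler to restart after the window than to handle ESTALE explicitly.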

In practice, this latter impact means that most running programs will be unable to continue unless they have special handling for this situation, and most gridengine jobs will no longer be able to log output. It may be a good idea to restart any continuously running tool at that point. All webservices that were running at the start of the maintenance window will be restarted at that time.

If you have tools or other processes running that do not rely on being able to write to /data/project, they may be able to continue running during the downtime without interruption. Jobs that only access the network (for instance, the MediaWiki API) or the databases are not likely to be affected. Because of this, no automatic or forcible restart of running (non-webservice) jobs will be performed.

In particular, if you have a tool whose continued operation is important, temporarily modifying it so that it works from /data/scratch may be a good workaround.
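
For instance (a purely illustrative sketch; the tool name and environment variable are invented), a tool that writes its working files under a configurable base directory can be pointed at /data/scratch for the duration of the window and switched back afterwards:

    import os

    # Hypothetical override: default to the tool's project directory,
    # but allow it to be redirected to scratch during the maintenance.
    DATA_DIR = os.environ.get("MYTOOL_DATA_DIR", "/data/project/mytool")

    with open(os.path.join(DATA_DIR, "results.txt"), "a") as out:
        out.write("still running\n")

Running the job with MYTOOL_DATA_DIR=/data/scratch/mytool set in its environment would then keep it writable throughout.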

Finally, in order to avoid the risk of the filesystem move taking longer than expected and increasing downtime significantly, LOG FILES OVER 1G WILL NOT BE COPIED. If you have critical files that are not simple log files but whose names end in .log, .err or .out, then you MUST compress those files if you absolutely require them to survive the transition. Alternatively, truncating them to some size comfortably smaller than 1G will work if the file must remain uncompressed.
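
As a rough sketch of how one might do that (the tool directory below is a placeholder, and the threshold deliberately leaves some headroom under 1G), compressing or truncating oversized .log/.err/.out files could look like this:

    import gzip
    import os
    import shutil

    # Hypothetical tool directory; point this at your own tree.
    TOOL_DIR = "/data/project/mytool"
    LIMIT = 900 * 1024 ** 2  # stay comfortably under 1G

    for root, _dirs, files in os.walk(TOOL_DIR):
        for name in files:
            if not name.endswith((".log", ".err", ".out")):
                continue
            path = os.path.join(root, name)
            if os.path.getsize(path) <= LIMIT:
                continue
            # Compress the file so that its contents survive the copy...
            with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
                shutil.copyfileobj(src, dst)
            os.remove(path)
            # ...or, if the file must remain uncompressed, truncate it
            # instead of compressing:  os.truncate(path, LIMIT)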

The speed and reliability of the maintenance process depend on the total amount of data to copy. If you can clean extraneous files out of both your home and project directories, you'll help the process greatly. :-)

Thanks all,

-- Marc
