Thank you.
I (and probably many others) would like someone from the Ops team to
elaborate on the uptime and general reliability that Labs (especially
Tools) is supposed to have, and on what kinds of services it is suitable
for, to prevent future misunderstandings about the loss of important
work, etc.
On 19/02/2015 22:00, Andrew Bogott wrote:
It is with a heavy heart that I must share the news of an upcoming
Labs maintenance window.
The labs NFS store (which you probably know as /data/project) is
filling up rapidly and we need to add more drives. By weird
coincidence the actual physical space for that server in the
datacenter is ALSO filling up, so Chris Johnson has graciously agreed
to spend his day re-shuffling servers in order to make space for the
new diskshelf. This involves lots of unplugging and replugging, which
means that the NFS server will need to be turned off for several
hours.
During this window Chris will take care of another long-deferred
maintenance task -- he's putting more RAM into the labs puppet master,
virt1000.
What will break:
- Shared storage for all labs and tools instances. That includes
volumes like /data/project, /public/dumps, /data/scratch, /home
- Logins to all instances running Ubuntu Precise. (Trusty hosts will
/probably/ still support logins.)
- Login to wikitech and manipulation of instances.
What won't break:
- Labs instances will continue to run
- Tasks running on instances will continue to run; those that don't
rely on shared storage should be fine.
- Web proxies should keep working, if the services they support aren't
relying on shared storage.
What will get better:
- More storage space!
- Fewer problems with dumps filling up NFS (which is basically the
same as 'more storage space').
- More reliable puppet runs and fewer outages with miscellaneous
OpenStack services (which also run on virt1000)
I apologize in advance for this downtime. Don't hesitate to contact
me or Coren either here or on IRC with advice about how to harden your
tool against this upcoming outage. We will also be available on IRC
during and after the outage to help revive things that are angry about
the timeouts.
-Andrew
_______________________________________________
Labs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/labs-l