On 10/7/14 6:50 PM, John wrote:
Any details on what parts of toolslab went down? I.e., which services
were running on that virt?
I can tell you which tools instances were on virt1005:
| 120cc401-ed7a-44c5-b905-2d0eae23b6af | tools-exec-03
| 30b98f1d-1c5a-49c1-b800-f4c535addc12 | tools-exec-07
| 5cd684db-d0a6-4241-a11f-daf4c1b2f717 | tools-exec-09
| 523df61c-07f0-41ba-924d-e2b8e474b4d7 | tools-exec-cyberbot
| 96c37c36-970b-4cc7-a7ba-d1ee90a225b5 | tools-submit
| cdce426b-ef6f-47e7-96e4-bcb3647f4709 | tools-webgrid-04
| 79aeb31c-a1c1-41af-9e00-df2c7e248924 | tools-webgrid-tomcat
| 8d92c507-d253-425d-b7f4-2af3678a39ae | tools-webproxy
| 22d32e6e-608c-48a8-8423-2a1ff69fad4d | toolsbeta-exec-01
| 31e8206d-fa5c-4e62-a805-8cfb7def1f46 | toolsbeta-puppetmaster3
| 4f223286-49e0-4526-8a4e-8b64c132422a | toolsbeta-webnode-01
As for which jobs died -- that's a question for someone with better grid
skills than me :)
-A
On Tuesday, October 7, 2014, Andrew Bogott <[email protected]> wrote:
On 10/7/14 5:54 PM, Andrew Bogott wrote:
One of the labs servers (virt1005) has just died. Marc and I
are investigating, but for the moment roughly 10% of labs
instances are in a SHUTOFF state. Please do not
restart these instances until I send an 'all clear' message to
the list.
Virt1005 is back up and seems to be OK. I'm now booting all
instances on that box -- they should be up and running in a few
minutes, but will show signs of an unceremonious reboot so you'll
want to make sure your services are all still running properly.
This crash may be related to overprovisioning on virt1005... we're
in the process of purchasing new hardware to expand capacity and
avoid such issues in the future.
Thank you again for your patience!
-Andrew
_______________________________________________
Labs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/labs-l