Hello! This has been resolved now. A more detailed incident report will be published soon.
In the meantime:

- We rebooted one of the exec hosts (tools-exec-1217) because it was stuck under excess load. All non-continuous jobs running there were lost. Continuous jobs that were running there will have been rescheduled automatically.
- Some queues were in an error state, and I've cleared it (so everything should be OK now).
- Some jobs were stuck in an error state. I've cleared them, so they have been rescheduled and are running now.

The grid is healthy as of now, so let us know if anything seems amiss.

Thanks
_______________________________________________
Labs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/labs-l
