Reminder: this will start in an hour.

On 26 January 2016 at 11:00, Yuvi Panda <[email protected]> wrote:
> Impact summary:
>
> The Gridengine queue requires maintenance that may invalidate
> currently running jobs. We will perform this maintenance on 1/27/2016
> from 1800 to 0200 UTC.
>
> Over the course of the last few weeks we have experienced periodic
> crashes of the Grid Engine master. We have resolved issues
> surrounding multiple master processes accessing the same queue file.
> Unfortunately, this has not resolved the underlying corruption.
> We will attempt to dump and rebuild the queue as-is to minimize user
> impact. If this process is unsuccessful, we will have to start a fresh
> queue. Once the queue has been rebuilt, we will do a rolling restart
> of the exec/webgrid nodes to refresh job associations with the master
> process.
>
> This is part of our ongoing work to stabilize the Gridengine setup.
>
> Thanks for your patience,
>
> Labs Team
_______________________________________________ Labs-l mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/labs-l
