Re: [Labs-l] [Labs-announce] [Tools] GridEngine maintenance - 27 Jan 2016, 1800-0200 UTC

Merlijn van Deen (valhallasw) Wed, 27 Jan 2016 09:07:58 -0800

Reminder: this will start in an hour.

On 26 January 2016 at 11:00, Yuvi Panda <[email protected]> wrote:


> Impact summary:
>
> The Gridengine queue requires maintenance that may invalidate
> currently running jobs.  We will perform this maintenance on
> 27 Jan 2016, 1800-0200 UTC.
>
> Over the course of the last few weeks we have experienced periodic
> crashes of the Grid Engine master.  We have resolved issues
> surrounding multiple master processes accessing the same queue file.
> Unfortunately, this has not resolved the underlying corruption.
> We will attempt to dump and rebuild the queue as-is to minimize user
> impact.  If this process is unsuccessful, we will have to start a fresh
> queue.  Once the queue has been rebuilt, we will do a rolling restart
> of exec/webgrid nodes to refresh job associations with the master
> process.
>
> This is part of our ongoing work to stabilize the Gridengine setup.
>
> Thanks for your patience,
>
> Labs Team
>
> _______________________________________________
> Labs-announce mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/labs-announce
> _______________________________________________
> Labs-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/labs-l
>

