Re: [Labs-l] [Labs-announce] [Tools] GridEngine maintenance - 27 Jan 2016, 1800-0200 UTC

Merlijn van Deen (valhallasw) Wed, 27 Jan 2016 10:11:22 -0800

And we are now starting :-)

On 27 January 2016 at 18:07, Merlijn van Deen (valhallasw) <[email protected]> wrote:


> Reminder: this will start in an hour.
>
> On 26 January 2016 at 11:00, Yuvi Panda <[email protected]> wrote:
>
>> Impact summary:
>>
>>     The Grid Engine queue requires maintenance that may invalidate
>> currently running jobs.  We will perform this maintenance on
>> 27 January 2016, 1800-0200 UTC.
>>
>> Over the course of the last few weeks we have experienced periodic
>> crashes of the Grid Engine master.  We have resolved issues
>> surrounding multiple master processes accessing the same queue file.
>> Unfortunately, this has not resolved the underlying corruption.
>> We will attempt to dump and rebuild the queue as-is to minimize user
>> impact.  If this process is unsuccessful we will have to start a fresh
>> queue.  Once the queue has been rebuilt we will be doing a rolling
>> restart of exec/webgrid nodes to refresh job associations with the
>> master process.
>>
>> This is part of our ongoing work to stabilize the Grid Engine setup.
>>
>> Thanks for your patience,
>>
>> Labs Team
>>
>> _______________________________________________
>> Labs-announce mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/labs-announce
>> _______________________________________________
>> Labs-l mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>
>
>

