Re: ALARM - ACS reboots host servers!!!

Andrei Mikhailovsky Mon, 03 Mar 2014 04:35:28 -0800

Pretty poor, I agree.


IMHO the ACS agent should not be allowed to reboot the host server. This is not 
the type of thing you want to automate, as you will eventually end up 
with broken volumes and data loss. 


And you are right, of course; that is exactly what happened in my case. I 
currently have two VMs that use that NFS server for volumes, and the remaining 
50+ VMs use Ceph. As a result of the NFS server reboot, all host servers 
rebooted, causing 50+ VMs to reset without being properly shut down. 
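
To make the mismatch concrete, here is a minimal Python sketch of the situation 
described above. It is not ACS code: the names Vm, vms_hit_by_host_reboot and 
vms_hit_by_storage_outage are hypothetical, and the count of 50 Ceph-backed VMs 
is an assumption standing in for "50+". 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vm:
    name: str
    pool: str  # primary storage pool backing the VM's volumes (hypothetical field)


def vms_hit_by_host_reboot(vms):
    """Rebooting every host resets every VM, whatever storage it uses."""
    return list(vms)


def vms_hit_by_storage_outage(vms, failed_pool):
    """Only VMs whose volumes live on the failed pool actually lost storage."""
    return [vm for vm in vms if vm.pool == failed_pool]


# Illustrative numbers only: 2 VMs on NFS, 50 on Ceph (the thread says "50+").
vms = [Vm(f"nfs-vm-{i}", "nfs") for i in range(2)]
vms += [Vm(f"ceph-vm-{i}", "ceph") for i in range(50)]

print(len(vms_hit_by_host_reboot(vms)))            # 52
print(len(vms_hit_by_storage_outage(vms, "nfs")))  # 2
```

Under these assumptions, only 2 VMs lost their storage, yet the host-wide 
reboot reset all 52. 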


I am using ACS 4.2.1 with KVM, so this issue seems to be present on both KVM 
and XenServer. 


Andrei 
----- Original Message -----

From: "France" <[email protected]> 
To: [email protected] 
Cc: [email protected] 
Sent: Monday, 3 March, 2014 8:49:28 AM 
Subject: Re: ALARM - ACS reboots host servers!!! 

I believe this is a bug too, because VMs not running on the affected storage 
get destroyed too: 

This issue has been around for a long time, like all the others I reported. 
They do not get fixed: 
https://issues.apache.org/jira/browse/CLOUDSTACK-3367 

We even lost the assignee today. 

Regards, 
F. 

On 3/3/14 6:55 AM, Koushik Das wrote: 
> The primary storage needs to be put in maintenance before doing any 
> upgrade/reboot as mentioned in the previous mails. 
> 
> -Koushik 
> 
> On 03-Mar-2014, at 6:07 AM, Marcus <[email protected]> wrote: 
> 
>> Also, please note that the bug you referenced doesn't have a problem 
>> with the reboot being triggered, but with the fact that the reboot 
>> never completes due to a hanging NFS mount (the inaccessible primary 
>> storage is why the reboot occurs in the first place). 
>> 
>> On Sun, Mar 2, 2014 at 5:26 PM, Marcus <[email protected]> wrote: 
>>> Or do you mean you have multiple primary storages and this one was not 
>>> in use and put into maintenance? 
>>> 
>>> On Sun, Mar 2, 2014 at 5:25 PM, Marcus <[email protected]> wrote: 
>>>> I'm not sure I understand. How do you expect to reboot your primary 
>>>> storage while vms are running? It sounds like the host is being 
>>>> fenced since it cannot contact the resources it depends on. 
>>>> 
>>>> On Sun, Mar 2, 2014 at 3:24 PM, Nux! <[email protected]> wrote: 
>>>>> On 02.03.2014 21:17, Andrei Mikhailovsky wrote: 
>>>>>> Hello guys, 
>>>>>> 
>>>>>> 
>>>>>> I recently came across the bug CLOUDSTACK-5429, which rebooted 
>>>>>> all of my host servers without properly shutting down the guest VMs. 
>>>>>> I simply upgraded and rebooted one of the NFS primary storage 
>>>>>> servers and a few minutes later, to my horror, I found that all 
>>>>>> of my host servers had been rebooted. Is it just me, or should this 
>>>>>> bug be fixed ASAP and be a blocker for any new ACS release? I mean, 
>>>>>> not only does it cause downtime, but also possible data loss and 
>>>>>> server corruption. 
>>>>> 
>>>>> Hi Andrei, 
>>>>> 
>>>>> Do you have HA enabled and did you put that primary storage in 
>>>>> maintenance mode before rebooting it? 
>>>>> It's my understanding that ACS relies on the shared storage to 
>>>>> perform HA, so if the storage goes it's expected to go berserk. I've 
>>>>> noticed similar behaviour in XenServer pools without ACS. 
>>>>> I'd imagine a "cure" for this would be to use network distributed 
>>>>> "filesystems" like GlusterFS or Ceph. 
>>>>> 
>>>>> Lucian 
>>>>> 
>>>>> -- 
>>>>> Sent from the Delta quadrant using Borg technology! 
>>>>> 
>>>>> Nux! 
>>>>> www.nux.ro
