Hi Andrija,

I think Ceph has a versioning convention where x.2.z means a stable release
for production. http://docs.ceph.com/docs/master/releases/schedule/


x.0.z - development releases (for early testers and the brave at heart)
x.1.z - release candidates (for test clusters, brave users)
x.2.z - stable/bugfix releases (for users)


And if we look at the release notes for Mimic, they state: 'This is the
fifth bugfix release of the Mimic v13.2.x long term stable release series.
We recommend all Mimic users upgrade.'
http://docs.ceph.com/docs/master/releases/mimic/#v13-2-4-mimic


:-)


At 2019-05-28 14:40:45, "Andrija Panic" <andrija.pa...@gmail.com> wrote:
>Hi Li,
>
>You might want to take a look at this PR from Wido -
>https://github.com/apache/cloudstack/pull/2985 - this is 4.12 only.
>
>In other words, you are using Mimic, a non-LTS release of Ceph - and I have a
>hard time believing that anyone is using it in production with CloudStack
>(since it's a fairly recent Ceph release).
>
>Test ACS 4.12 and see if your problem goes away.
>
>@Wido den Hollander <w...@42on.com> , any thoughts?
>
>Regards,
>Andrija
>
>On Tue, 28 May 2019 at 06:24, li jerry <div...@hotmail.com> wrote:
>
>> Hello guys
>>
>> we’ve deployed an environment with CloudStack 4.11.2 and KVM (CentOS 7.6),
>> and Ceph 13.2.5 is deployed as the primary storage.
>> We found some issues with the HA solution, and we are here to ask for your
>> suggestions.
>>
>> We’ve enabled both the VM HA and Host HA features in CloudStack, and the
>> compute offering is tagged as ha.
>> When we perform a power failure test (unplugging 1 node of 4), the running
>> VMs on the failed node are automatically rescheduled to the surviving nodes
>> after 5 minutes, but none of them can boot into the OS. We found the boot
>> procedure gets stuck on I/O read/write failures.
>>
>>
>>
>> The following is displayed after the VM starts:
>>
>> Generating "/run/initramfs/rdsosreport.txt"
>>
>> Entering emergency mode. Exit the shell to continue.
>> Type "journalctl" to view system logs.
>> You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or
>> /boot
>> after mounting them and attach it to a bug report
>>
>> :/#
>>
>>
>>
>> We found this is caused by an exclusive lock on the image:
>> [root@cn01-nodea ~]# rbd lock list a93010b0-2be2-49bd-b25e-ec89b3a98b4b
>> There is 1 exclusive lock on this image.
>> Locker         ID                  Address
>> client.1164351 auto 94464726847232 10.226.16.128:0/3002249644
>>
>> If we remove the lock from the image and restart the VM in CloudStack,
>> the VM boots successfully.
>>
>> We know that disabling the Exclusive Lock feature in Ceph (by setting
>> rbd_default_features = 3, as sketched below) would solve this problem. But
>> we don’t think it’s the best solution for HA, so could you please give us
>> some ideas about how you are handling this, and what the best practice for
>> this feature is?
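>>
>> For clarity, this is a sketch of the change we mean in /etc/ceph/ceph.conf
>> (feature value 3 = layering + striping, so newly created images would not
>> get the exclusive-lock feature; existing images are not changed):
>>
>> [global]
>> rbd_default_features = 3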
>>
>> Thanks.
>>
>>
>
>-- 
>
>Andrija Panić
