Thx Wido!

After running the following command on the Ceph admin node, my problem was solved:

[root@cn01-nodea ~]# ceph auth caps client.cloudstack mon 'allow profile rbd' osd 'allow profile rbd pool=rbd'
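For anyone hitting the same issue, a quick sanity check is to dump the caps again after the update; the output should now look roughly like this (key and pool name are the ones already posted in this thread):

[root@cn01-nodea ~]# ceph auth get client.cloudstack
exported keyring for client.cloudstack
[client.cloudstack]
        key = AQDTh7pcIJjNIhAAwk8jtxilJWXQR7osJRFMLw==
        caps mon = "allow profile rbd"
        caps osd = "allow profile rbd pool=rbd"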
________________________________
From: Wido den Hollander <w...@widodh.nl>
Sent: Tuesday, May 28, 2019 7:50:43 PM
To: li jerry; dev@cloudstack.apache.org; us...@cloudstack.apache.org
Subject: Re: Re: RBD primary storage VM encounters Exclusive Lock after triggering HA

On 5/28/19 1:48 PM, li jerry wrote:
> Hi Wido
>
> The key I configured in CloudStack is the following:
>
> [root@cn01-nodeb ~]# ceph auth get client.cloudstack
> exported keyring for client.cloudstack
> [client.cloudstack]
>         key = AQDTh7pcIJjNIhAAwk8jtxilJWXQR7osJRFMLw==
>         caps mon = "allow r"
>         caps osd = "allow rwx pool=rbd"
>

That's the problem :-) Your user needs to be updated. The caps should be:

[client.cloudstack]
        key = AQDTh7pcIJjNIhAAwk8jtxilJWXQR7osJRFMLw==
        caps mon = "profile rbd"
        caps osd = "profile rbd pool=rbd"

See: http://docs.ceph.com/docs/master/rbd/rbd-cloudstack/

This will allow the client to blacklist the other and take over the
exclusive-lock.

Wido
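For a brand-new deployment, the documentation linked above creates the user with these profile caps in one step; a minimal sketch, using the pool name 'rbd' from this thread:

ceph auth get-or-create client.cloudstack mon 'profile rbd' osd 'profile rbd pool=rbd'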
> From: Wido den Hollander <mailto:w...@widodh.nl>
> Sent: May 28, 2019 19:42
> To: dev@cloudstack.apache.org <mailto:dev@cloudstack.apache.org>; li jerry <mailto:div...@hotmail.com>; us...@cloudstack.apache.org <mailto:us...@cloudstack.apache.org>
> Subject: Re: RBD primary storage VM encounters Exclusive Lock after triggering HA
>
> On 5/28/19 6:16 AM, li jerry wrote:
>> Hello guys,
>>
>> We've deployed an environment with CloudStack 4.11.2 and KVM (CentOS 7.6), and Ceph 13.2.5 is deployed as the primary storage.
>> We found some issues with the HA solution, and we are here to ask for your suggestions.
>>
>> We've enabled both the VM HA and Host HA features in CloudStack, and the compute offering is tagged as ha.
>> When we perform a power-failure test (unplugging 1 node of 4), the VMs running on the removed node are automatically rescheduled to the other live nodes after 5 minutes, but none of them can boot into the OS. We found the boot procedure is stuck on I/O read/write failures.
>>
>> The following information is prompted after a VM starts:
>>
>> Generating "/run/initramfs/rdsosreport.txt"
>>
>> Entering emergency mode. Exit the shell to continue.
>> Type "journalctl" to view system logs.
>> You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
>> after mounting them and attach it to a bug report.
>>
>> :/#
>>
>> We found this is caused by a lock on the image:
>>
>> [root@cn01-nodea ~]# rbd lock list a93010b0-2be2-49bd-b25e-ec89b3a98b4b
>> There is 1 exclusive lock on this image.
>> Locker          ID                    Address
>> client.1164351  auto 94464726847232   10.226.16.128:0/3002249644
>>
>> If we remove the lock from the image and restart the VM in CloudStack, the VM boots successfully.
>>
>> We know that disabling the exclusive-lock feature in Ceph (by setting rbd_default_features = 3) would work around this problem, but we don't think it is the best solution for HA. Could you please give us some ideas about how you handle this, and what the best practice is for this feature?
>>
>
> exclusive-lock is something to prevent a split-brain and having two
> clients write to it at the same time.
>
> The lock should be released to the other client if this is requested,
> but I have the feeling that you might have a cephx problem there.
>
> Can you post the output of:
>
> $ ceph auth get client.X
>
> where you replace X with the user you are using for CloudStack? Also
> remove the 'key', I don't need that.
>
> I want to look at the caps of the user.
>
> Wido
>
>> Thanks.
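For reference, the manual workaround mentioned above (clearing the stale lock so the restarted VM can acquire it) looks roughly like this, reusing the locker and lock ID from the rbd lock list output in this thread:

# list current locks on the image
rbd lock list a93010b0-2be2-49bd-b25e-ec89b3a98b4b
# remove the stale lock: rbd lock remove <image> <lock-id> <locker>
rbd lock remove a93010b0-2be2-49bd-b25e-ec89b3a98b4b "auto 94464726847232" client.1164351

With the 'profile rbd' caps in place this manual step should no longer be needed, since the new client can blacklist the dead one and take over the exclusive-lock on its own, as described above.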