Volume cleanup happens separately from pod cleanup.

Kubelet reconciles the volumes it wants (the sum of all volumes in all
pods) with the volumes it has on disk.  Orphaned volumes are removed.
Volume cleanup failure won't be reported on the pod.  You'll only find that
in Kubelet's logs.

There should be plenty of logs on this, as Kubelet will attempt to remove
the volume each time it synchronizes itself with what it should have.  By
default, this is every minute.
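The reconciliation idea can be sketched roughly as follows (hypothetical names and simplified data structures; the real kubelet implementation differs):

```python
# Rough sketch of the kubelet's volume reconciliation idea. All names here
# are illustrative, not the actual kubelet API.

def desired_volumes(pods):
    """The set of volumes the kubelet wants: the union of all volumes
    referenced by all pods assigned to this node."""
    return {vol for pod in pods for vol in pod["volumes"]}

def reconcile(pods, volumes_on_disk, unmount):
    """Remove orphaned volumes (on disk but referenced by no pod).
    Returns the set of volumes that were cleaned up."""
    orphaned = set(volumes_on_disk) - desired_volumes(pods)
    for vol in orphaned:
        # A cleanup failure here is only visible in the kubelet's own logs;
        # the attempt is simply repeated on the next sync (every minute by
        # default), as described above.
        unmount(vol)
    return orphaned

# Example: one pod still references "cassandra-data"; "old-data" is orphaned.
pods = [{"volumes": ["cassandra-data"]}]
removed = reconcile(pods, ["cassandra-data", "old-data"], lambda v: None)
print(sorted(removed))  # ['old-data']
```

Because failures surface only in this loop, the pod object never carries a volume-cleanup event; the kubelet log on the node is the place to look.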


On Wed, Jan 20, 2016 at 8:56 AM, Huamin Chen <[email protected]> wrote:

> That sounds right. rbd unlock happens during pod and volume cleanup,
> which doesn't happen immediately after you delete the Pod.
>
> Clayton, do you know if there is a deterministic way to tell if the Pod
> volume is cleaned up?
>
>
>
> On Wed, Jan 20, 2016 at 6:12 AM, Diego Spinola Castro <
> [email protected]> wrote:
>
>> Hello Huamin, I'm aware of the mounting constraints of block storage.
>> What happens is that when I remove the pod and wait for a new one,
>> sometimes it doesn't start because of the lock issue.
>>
>> Here's my pod:
>>
>> Name: hawkular-cassandra-1-rw5ii
>> Namespace: openshift-infra
>> Image(s): openshift/origin-metrics-cassandra:latest
>> Node:
>> nodebr0.xnc3qg4rvmuenbiin5bq5kisfe.nx.internal.cloudapp.net/10.0.2.5
>> Start Time: Tue, 19 Jan 2016 21:27:23 +0000
>> Labels:
>> metrics-infra=hawkular-cassandra,name=hawkular-cassandra-1,type=hawkular-cassandra
>> Status: Running
>> Reason:
>> Message:
>> IP: 10.1.2.104
>> Replication Controllers: hawkular-cassandra-1 (1/1 replicas created)
>> Containers:
>>   hawkular-cassandra-1:
>>     Container ID:
>> docker://4210b08b808a5c2c684ddfc1b734c0a76cd61b23989c83dbff7d7c175e45505f
>>     Image: openshift/origin-metrics-cassandra:latest
>>     Image ID:
>> docker://9f440f6ca921872a9b06d34da808a3a82cb071c16d8089676bc823e309b17724
>>     QoS Tier:
>>       cpu: BestEffort
>>       memory: BestEffort
>>     State: Running
>>       Started: Tue, 19 Jan 2016 21:28:23 +0000
>>     Ready: True
>>     Restart Count: 0
>>     Environment Variables:
>>       CASSANDRA_MASTER: true
>>       POD_NAMESPACE: openshift-infra (v1:metadata.namespace)
>> Conditions:
>>   Type Status
>>   Ready True
>> Volumes:
>>   cassandra-data:
>>     Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim
>> in the same namespace)
>>     ClaimName: metrics-cassandra-1
>>     ReadOnly: false
>>   hawkular-cassandra-secrets:
>>     Type: Secret (a secret that should populate this volume)
>>     SecretName: hawkular-cassandra-secrets
>>   cassandra-token-5l0qq:
>>     Type: Secret (a secret that should populate this volume)
>>     SecretName: cassandra-token-5l0qq
>> No events.
>>
>>
>>
>> 2016-01-19 22:44 GMT-03:00 Huamin Chen <[email protected]>:
>>
>>> Diego,
>>>
>>> An rbd volume is expected to have just one writer: once in use, the rbd
>>> volume is exclusively owned by that writer. What does your cassandra
>>> pod look like?
>>>
>>> Huamin
>>>
>>> On Tue, Jan 19, 2016 at 7:20 PM, Clayton Coleman <[email protected]>
>>> wrote:
>>>
>>>> Hrm - so it sounds like the pod didn't get the volume torn down
>>>> correctly to release the volume lock.  Copying some folks who might
>>>> know what part of the logs to look for.
>>>>
>>>> On Tue, Jan 19, 2016 at 7:15 PM, Diego Spinola Castro
>>>> <[email protected]> wrote:
>>>> > Hi, this is origin 1.1.0.1-0.git.7334.2c6ff4b and a Ceph cluster for
>>>> > block storage.
>>>> >
>>>> > It happens a lot with my cassandra pod, but I'd like to check whether
>>>> > it is an issue or something that I'm doing wrong.
>>>> >
>>>> > When I delete the pod with a Ceph PV, it sometimes fails to start
>>>> > again; looking at the pod events I found:
>>>> >
>>>> > FailedSync Error syncing pod, skipping: rbd: image cassandra is
>>>> > locked by other nodes
>>>> >
>>>> >
>>>> > I looked it up and found the lock in the rbd system; indeed it was
>>>> > owned by a different node, so as soon as I deleted the lock, the pod
>>>> > was able to start.
>>>> >
>>>> > $ rbd lock list cassandra
>>>> >
>>>> > There is 1 exclusive lock on this image.
>>>> > Locker       ID                         Address
>>>> > client.10197 kubelet_lock_magic_nodebr0 10.0.2.5:0/1005447
>>>> >
>>>> > $ rbd lock remove cassandra kubelet_lock_magic_nodebr0 client.10197
>>>> >
>>>> >
>>>> > Does anybody else have this issue?
>>>> >
>>>> > _______________________________________________
>>>> > dev mailing list
>>>> > [email protected]
>>>> > http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
>>>> >
>>>>
>>>
>>>
>>
>