Hello, I'm not sure whether this is related to the lock, but I've deleted
and recreated a PVC and it binds to the same PV.

[azureuser@masterbr0 ~]$ oc get pvc -n casadoanel
NAME         LABELS                               STATUS    VOLUME     CAPACITY   ACCESSMODES   AGE
mysql-data   template=mysql-persistent-template   Bound     ceph-pv2   1Gi        RWO           10m

[azureuser@masterbr0 ~]$ oc get pv
NAME       LABELS    CAPACITY   ACCESSMODES   STATUS     CLAIM                   REASON   AGE
ceph-pv2   <none>    1Gi        RWO           Bound      casadoanel/mysql-data            14d
ceph-pv9   <none>    1Gi        RWO           Released   casadoanel/mysql                 14d


My question is: once the PVC is deleted and recreated, shouldn't it get a
fresh PV?
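For what it's worth, one possible explanation (worth verifying on your cluster) is that the PV keeps a claimRef pointing at the old claim's namespace and name, so a recreated PVC with the same name can match it again. You can check with `oc get pv ceph-pv2 -o yaml`; the relevant fields look roughly like this (the values below are an illustrative sketch, not your actual output):

```yaml
# Hypothetical sketch of the relevant PV fields; compare against your
# actual `oc get pv ceph-pv2 -o yaml` output.
spec:
  persistentVolumeReclaimPolicy: Retain   # Retain keeps the claimRef after release
  claimRef:
    namespace: casadoanel
    name: mysql-data    # a recreated PVC with this namespace/name can rebind here
```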

2016-01-20 14:03 GMT-03:00 Mark Turansky <[email protected]>:

> Agreed.  Having a separate cleanup process adds time to the cycle and
> prevents the next pod from launching more quickly.
>
> Attach/detach controller issue is
> https://github.com/kubernetes/kubernetes/issues/15524
>
> I just added a comment to that issue describing the benefit of reduced
> latency when detaching happens in a controller that's watching pods.
>
> On Wed, Jan 20, 2016 at 11:46 AM, Clayton Coleman <[email protected]>
> wrote:
>
>> The "bug" / gap here is that we want persistent volumes to be released
>> with low latency because they reduce the availability of an application
>> during a failure (the background release adds X seconds of latency before
>> the next pod can be started). Fortunately the attach controller will solve
>> this problem just by its nature (it can release as soon as deletion is
>> observed).  We should make sure that it reduces this latency.
>>
>> On Jan 20, 2016, at 9:15 AM, Mark Turansky <[email protected]> wrote:
>>
>> Volume cleanup happens separately from pod.
>>
>> Kubelet reconciles the volumes it wants (the sum of all volumes in all
>> pods) with the volumes it has on disk.  Orphaned volumes are removed.
>> Volume cleanup failure won't be reported on the pod.  You'll only find that
>> in Kubelet's logs.
>>
>> There should be plenty of logs on this, as Kubelet will attempt to remove
>> the volume each time it synchronizes itself with what it should have.  By
>> default, this is every minute.
>>
>>
>> On Wed, Jan 20, 2016 at 8:56 AM, Huamin Chen <[email protected]> wrote:
>>
>>> That sounds right. rbd unlock happens during Pod and volume cleanup,
>>> which doesn't happen immediately after you delete the Pod.
>>>
>>> Clayton, do you know if there is a deterministic way to tell if the Pod
>>> volume is cleaned up?
>>>
>>>
>>>
>>> On Wed, Jan 20, 2016 at 6:12 AM, Diego Spinola Castro <
>>> [email protected]> wrote:
>>>
>>>> Hello Huamin, I'm aware of the mounting constraints of block storage.
>>>> What happens is that when I remove the pod and wait for a new one,
>>>> sometimes it doesn't start because of the lock issue.
>>>>
>>>> Here's my pod:
>>>>
>>>> Name: hawkular-cassandra-1-rw5ii
>>>> Namespace: openshift-infra
>>>> Image(s): openshift/origin-metrics-cassandra:latest
>>>> Node:
>>>> nodebr0.xnc3qg4rvmuenbiin5bq5kisfe.nx.internal.cloudapp.net/10.0.2.5
>>>> Start Time: Tue, 19 Jan 2016 21:27:23 +0000
>>>> Labels:
>>>> metrics-infra=hawkular-cassandra,name=hawkular-cassandra-1,type=hawkular-cassandra
>>>> Status: Running
>>>> Reason:
>>>> Message:
>>>> IP: 10.1.2.104
>>>> Replication Controllers: hawkular-cassandra-1 (1/1 replicas created)
>>>> Containers:
>>>>   hawkular-cassandra-1:
>>>>     Container ID:
>>>> docker://4210b08b808a5c2c684ddfc1b734c0a76cd61b23989c83dbff7d7c175e45505f
>>>>     Image: openshift/origin-metrics-cassandra:latest
>>>>     Image ID:
>>>> docker://9f440f6ca921872a9b06d34da808a3a82cb071c16d8089676bc823e309b17724
>>>>     QoS Tier:
>>>>       cpu: BestEffort
>>>>       memory: BestEffort
>>>>     State: Running
>>>>       Started: Tue, 19 Jan 2016 21:28:23 +0000
>>>>     Ready: True
>>>>     Restart Count: 0
>>>>     Environment Variables:
>>>>       CASSANDRA_MASTER: true
>>>>       POD_NAMESPACE: openshift-infra (v1:metadata.namespace)
>>>> Conditions:
>>>>   Type Status
>>>>   Ready True
>>>> Volumes:
>>>>   cassandra-data:
>>>>     Type: PersistentVolumeClaim (a reference to a
>>>> PersistentVolumeClaim in the same namespace)
>>>>     ClaimName: metrics-cassandra-1
>>>>     ReadOnly: false
>>>>   hawkular-cassandra-secrets:
>>>>     Type: Secret (a secret that should populate this volume)
>>>>     SecretName: hawkular-cassandra-secrets
>>>>   cassandra-token-5l0qq:
>>>>     Type: Secret (a secret that should populate this volume)
>>>>     SecretName: cassandra-token-5l0qq
>>>> No events.
>>>>
>>>>
>>>>
>>>> 2016-01-19 22:44 GMT-03:00 Huamin Chen <[email protected]>:
>>>>
>>>>> Diego,
>>>>>
>>>>> An rbd volume is expected to be written by just one container:
>>>>> once used, the rbd volume is exclusively owned by that writer. What
>>>>> does your cassandra pod look like?
>>>>>
>>>>> Huamin
>>>>>
>>>>> On Tue, Jan 19, 2016 at 7:20 PM, Clayton Coleman <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hrm - so it sounds like the pod didn't get the volume torn down
>>>>>> correctly to release the volume lock.  Copying some folks who might
>>>>>> know what part of the logs to look for.
>>>>>>
>>>>>> On Tue, Jan 19, 2016 at 7:15 PM, Diego Spinola Castro
>>>>>> <[email protected]> wrote:
>>>>>> > Hi, this is origin 1.1.0.1-0.git.7334.2c6ff4b and a ceph cluster
>>>>>> > for block storage.
>>>>>> >
>>>>>> > It happens a lot with my cassandra pod, but I'd like to check
>>>>>> > whether it is an issue or something that I'm doing wrong.
>>>>>> >
>>>>>> > When I delete the pod with a ceph PV, it sometimes doesn't start
>>>>>> > again; looking at the pod events I found:
>>>>>> >
>>>>>> > FailedSync Error syncing pod, skipping: rbd: image cassandra is
>>>>>> > locked by other nodes
>>>>>> >
>>>>>> >
>>>>>> > I looked it up and found the lock in the rbd system; it was indeed
>>>>>> > owned by a different node, and as soon as I removed the lock the
>>>>>> > pod was able to start.
>>>>>> >
>>>>>> > $ rbd lock list cassandra
>>>>>> >
>>>>>> > There is 1 exclusive lock on this image.
>>>>>> > Locker       ID                         Address
>>>>>> > client.10197 kubelet_lock_magic_nodebr0 10.0.2.5:0/1005447
>>>>>> >
>>>>>> > $ rbd lock remove cassandra kubelet_lock_magic_nodebr0 client.10197
>>>>>> >
>>>>>> >
>>>>>> > Does anybody else have this issue?
>>>>>> >
>>>>>> > _______________________________________________
>>>>>> > dev mailing list
>>>>>> > [email protected]
>>>>>> > http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
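In case it helps anyone scripting around this, the manual cleanup above can be derived mechanically from the `rbd lock list` output. A minimal sketch, parsing the sample line quoted earlier in this thread (the image name and the decision to remove the lock at all are assumptions about your setup):

```shell
#!/bin/sh
# Sketch: turn one data line of `rbd lock list <image>` output into the
# matching `rbd lock remove` command. The sample line is the one quoted
# earlier in this thread; substitute your own output.
image=cassandra
line='client.10197 kubelet_lock_magic_nodebr0 10.0.2.5:0/1005447'

locker=$(echo "$line" | awk '{print $1}')   # e.g. client.10197
lock_id=$(echo "$line" | awk '{print $2}')  # e.g. kubelet_lock_magic_nodebr0

# `rbd lock remove` expects: <image> <lock-id> <locker>
echo rbd lock remove "$image" "$lock_id" "$locker"
# prints: rbd lock remove cassandra kubelet_lock_magic_nodebr0 client.10197
```

Only remove a lock after confirming no node still has the image mapped, since the lock is what protects against a second writer.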