Agreed. Having a separate cleanup process adds time to the cycle and delays the launch of the next pod.
The attach/detach controller issue is
https://github.com/kubernetes/kubernetes/issues/15524. I just added a
comment to that issue describing the benefit of reduced latency when
detaching happens in a controller that's watching pods.

On Wed, Jan 20, 2016 at 11:46 AM, Clayton Coleman <[email protected]> wrote:

> The "bug" / gap here is that we want persistent volumes to be released
> with low latency, because they reduce the availability of an application
> during a failure (the background release adds X seconds of latency before
> the next pod can be started). Fortunately the attach controller will solve
> this problem just by its nature (it can release as soon as deletion is
> observed). We should make sure that it reduces this latency.
>
> On Jan 20, 2016, at 9:15 AM, Mark Turansky <[email protected]> wrote:
>
> Volume cleanup happens separately from the pod.
>
> Kubelet reconciles the volumes it wants (the sum of all volumes in all
> pods) with the volumes it has on disk. Orphaned volumes are removed.
> Volume cleanup failure won't be reported on the pod; you'll only find
> that in Kubelet's logs.
>
> There should be plenty of logs on this, as Kubelet will attempt to remove
> the volume each time it synchronizes itself with what it should have. By
> default, this is every minute.
>
> On Wed, Jan 20, 2016 at 8:56 AM, Huamin Chen <[email protected]> wrote:
>
>> That sounds right. rbd unlock happens during pod and volume cleanup,
>> which doesn't happen immediately after you delete the pod.
>>
>> Clayton, do you know if there is a deterministic way to tell if the
>> pod's volume is cleaned up?
>>
>> On Wed, Jan 20, 2016 at 6:12 AM, Diego Spinola Castro <[email protected]> wrote:
>>
>>> Hello Huamin, I'm aware of the mounting constraints of block storage.
>>> What happens is that when I remove the pod and wait for a new one,
>>> sometimes it doesn't start because of the lock issue.
>>>
>>> Here's my pod:
>>>
>>> Name:			hawkular-cassandra-1-rw5ii
>>> Namespace:		openshift-infra
>>> Image(s):		openshift/origin-metrics-cassandra:latest
>>> Node:			nodebr0.xnc3qg4rvmuenbiin5bq5kisfe.nx.internal.cloudapp.net/10.0.2.5
>>> Start Time:		Tue, 19 Jan 2016 21:27:23 +0000
>>> Labels:			metrics-infra=hawkular-cassandra,name=hawkular-cassandra-1,type=hawkular-cassandra
>>> Status:			Running
>>> Reason:
>>> Message:
>>> IP:			10.1.2.104
>>> Replication Controllers:	hawkular-cassandra-1 (1/1 replicas created)
>>> Containers:
>>>   hawkular-cassandra-1:
>>>     Container ID:	docker://4210b08b808a5c2c684ddfc1b734c0a76cd61b23989c83dbff7d7c175e45505f
>>>     Image:		openshift/origin-metrics-cassandra:latest
>>>     Image ID:		docker://9f440f6ca921872a9b06d34da808a3a82cb071c16d8089676bc823e309b17724
>>>     QoS Tier:
>>>       cpu:		BestEffort
>>>       memory:		BestEffort
>>>     State:		Running
>>>       Started:	Tue, 19 Jan 2016 21:28:23 +0000
>>>     Ready:		True
>>>     Restart Count:	0
>>>     Environment Variables:
>>>       CASSANDRA_MASTER:	true
>>>       POD_NAMESPACE:	openshift-infra (v1:metadata.namespace)
>>> Conditions:
>>>   Type		Status
>>>   Ready	True
>>> Volumes:
>>>   cassandra-data:
>>>     Type:	PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
>>>     ClaimName:	metrics-cassandra-1
>>>     ReadOnly:	false
>>>   hawkular-cassandra-secrets:
>>>     Type:	Secret (a secret that should populate this volume)
>>>     SecretName:	hawkular-cassandra-secrets
>>>   cassandra-token-5l0qq:
>>>     Type:	Secret (a secret that should populate this volume)
>>>     SecretName:	cassandra-token-5l0qq
>>> No events.
>>>
>>> 2016-01-19 22:44 GMT-03:00 Huamin Chen <[email protected]>:
>>>
>>>> Diego,
>>>>
>>>> An rbd volume is expected to have just one writer: once used, the rbd
>>>> volume is exclusively owned by that writer. What does your cassandra
>>>> pod look like?
>>>>
>>>> Huamin
>>>>
>>>> On Tue, Jan 19, 2016 at 7:20 PM, Clayton Coleman <[email protected]> wrote:
>>>>
>>>>> Hrm - so it sounds like the pod didn't get the volume torn down
>>>>> correctly to release the volume lock. Copying some folks who might
>>>>> know what part of the logs to look for.
>>>>>
>>>>> On Tue, Jan 19, 2016 at 7:15 PM, Diego Spinola Castro <[email protected]> wrote:
>>>>> > Hi, this is origin 1.1.0.1-0.git.7334.2c6ff4b and a Ceph cluster
>>>>> > for block storage.
>>>>> >
>>>>> > It happens a lot with my cassandra pod, but I'd like to check
>>>>> > whether it is an issue or something I'm doing wrong.
>>>>> >
>>>>> > When I delete a pod with a Ceph PV, it sometimes doesn't start
>>>>> > again. Looking at the pod events I found:
>>>>> >
>>>>> > FailedSync Error syncing pod, skipping: rbd: image cassandra is
>>>>> > locked by other nodes
>>>>> >
>>>>> > I looked it up and found the lock in the rbd system; indeed it was
>>>>> > owned by a different node, and as soon as I deleted the lock the
>>>>> > pod was able to start.
>>>>> >
>>>>> > $ rbd lock list cassandra
>>>>> >
>>>>> > There is 1 exclusive lock on this image.
>>>>> > Locker        ID                          Address
>>>>> > client.10197  kubelet_lock_magic_nodebr0  10.0.2.5:0/1005447
>>>>> >
>>>>> > $ rbd lock remove cassandra kubelet_lock_magic_nodebr0 client.10197
>>>>> >
>>>>> > Does anybody else have this issue?
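On Huamin's question about a deterministic way to tell whether a pod's volume is cleaned up: one heuristic (an assumption based on the kubelet's on-disk layout, not an official API) is to check whether the pod's per-UID volume directory under the kubelet data dir, /var/lib/kubelet by default, has disappeared after the periodic sync. The `pod_volumes_gone` helper name is mine; verify the path layout on your own nodes before relying on it.

```shell
#!/bin/sh
# Heuristic sketch (an assumption, not an official API): kubelet keeps a
# pod's volumes under <data-dir>/pods/<pod-uid>/volumes, where <data-dir>
# defaults to /var/lib/kubelet. Once the periodic sync (every minute by
# default) tears the volumes down, that directory disappears.
KUBELET_DIR="${KUBELET_DIR:-/var/lib/kubelet}"

pod_volumes_gone() {
    # $1 is the pod UID (metadata.uid); succeeds once cleanup finished.
    [ ! -d "$KUBELET_DIR/pods/$1/volumes" ]
}

if pod_volumes_gone "$1"; then
    echo "volumes cleaned up for pod $1"
else
    echo "volumes still present for pod $1"
fi
```

You would poll this on the node that ran the pod, after fetching the UID with `kubectl get pod <name> -o jsonpath='{.metadata.uid}'`.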
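Diego's manual fix (find the stale lock, then `rbd lock remove`) can be scripted. A minimal sketch, assuming the `rbd lock list` output format shown in this thread (two header lines, then one lock per row with locker id first); the `unlock_image` helper name is mine, and it removes every lock on the image, so only use it once you're sure no live node still owns the volume.

```shell
#!/bin/sh
# Sketch: list the locks on an rbd image and remove each one. Assumes
# the "Locker  ID  Address" format shown in this thread; check the
# output of `rbd lock list` on your Ceph version before relying on it.
unlock_image() {
    image="$1"
    # Skip the two header lines; awk emits "<lock-id> <locker>" pairs,
    # matching the argument order `rbd lock remove` expects.
    rbd lock list "$image" | awk 'NR > 2 { print $2, $1 }' |
    while read -r id locker; do
        echo "removing lock $id held by $locker on $image"
        rbd lock remove "$image" "$id" "$locker"
    done
}

# Only run when the rbd CLI is actually present.
if command -v rbd >/dev/null 2>&1; then
    unlock_image "${1:-cassandra}"
fi
```

With the output from this thread, the loop would run `rbd lock remove cassandra kubelet_lock_magic_nodebr0 client.10197`, exactly the manual command above.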
_______________________________________________
dev mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
