It turns out the cause was excessive disk reads across all nodes. There were
too many container start errors.

Thanks Derek for the tip and Solly for your time.
I guess the logs won't be necessary anymore.
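
For anyone who hits the same symptom and does need to go through Heapster
logs, the "No metrics for pod" messages quoted further down in this thread can
be tallied per pod to see which pods are skipped most often. The following is
only a minimal sketch: the function name and the sample lines are illustrative
assumptions, and the pattern is based solely on the log lines shown in this
thread, not on any documented Heapster log format.

```python
import re
from collections import Counter

# Matches the message quoted in this thread, e.g.
# "... handlers.go:178] No metrics for pod kondzilla/portal-107-rg2ia"
# (assumption: the pod is always written as namespace/name without spaces).
NO_METRICS_RE = re.compile(r"No metrics for pod (\S+)")


def count_missing_metrics(lines):
    """Return a Counter mapping 'namespace/pod' to the number of misses."""
    misses = Counter()
    for line in lines:
        match = NO_METRICS_RE.search(line)
        if match:
            misses[match.group(1)] += 1
    return misses


# Sample input built from the log lines quoted in this thread.
sample = [
    "I0322 22:10:29.104727       1 handlers.go:242] No metrics for container "
    "wordpress in pod kondzilla/portal-107-rg2ia",
    "I0322 22:10:29.104746       1 handlers.go:178] No metrics for pod "
    "kondzilla/portal-107-rg2ia",
]

for pod, n in count_missing_metrics(sample).most_common():
    print(n, pod)
```

To run it against a real log, pass any iterable of lines, for example an open
file object for a saved log (the file name is up to you). Per-container
messages are deliberately not counted, so each skipped pod is counted once per
scrape.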

Regards,

--
Mateus Caruccio / Master of Puppets
GetupCloud.com
We make the infrastructure invisible

2017-03-23 15:15 GMT+00:00 Solly Ross <[email protected]>:

> Also, would it be possible to see the full set of Heapster logs?
> Sometimes there can be useful signals in there as to what's going
> on that aren't immediately obvious.
>
> Best Regards,
> Solly Ross
>
> ----- Original Message -----
> > From: "Derek Carr" <[email protected]>
> > To: "Mateus Caruccio" <[email protected]>
> > Cc: "Solly Ross" <[email protected]>, [email protected]
> > Sent: Wednesday, March 22, 2017 11:27:11 PM
> > Subject: Re: Heapster failing for some pods
> >
> > Are you seeing high iops on impacted nodes?
> >
> > If so, it could be related to the following:
> > https://github.com/openshift/origin/pull/12822
> >
> > If so, you can try to remove thin_ls from your host so it will not be used
> > to do per container devicemapper usage stats in cAdvisor which has been
> > shown to cause issues similar to this.
> >
> > Thanks,
> >
> > On Wed, Mar 22, 2017 at 9:42 PM Mateus Caruccio <
> > [email protected]> wrote:
> >
> > > At
> > > https://paste.fedoraproject.org/paste/FYFahXSMMQOVUWHkXcrer15M1UNdIGYhyRLivL9gydE=
> > > you can find a log grep from heapster with --sink=log set.
> > >
> > > Looking for pod "portal-107-rg2ia" one can see it's not being sinked every
> > > scraping period (only 3/9 during this snippet).
> > >
> > >
> > >
> > > --
> > > Mateus Caruccio / Master of Puppets
> > > GetupCloud.com
> > > We make the infrastructure invisible
> > >
> > > 2017-03-22 19:43 GMT-03:00 Derek Carr <[email protected]>:
> > >
> > > +Solly
> > >
> > > Anything you can assist with here?
> > >
> > > Thanks,
> > >
> > > On Wed, Mar 22, 2017 at 6:27 PM Mateus Caruccio <
> > > [email protected]> wrote:
> > >
> > > Hi.
> > >
> > > > Heapster is experiencing failures for some pods of the cluster, which in
> > > > turn causes HPA to malfunction.
> > >
> > > From project events I can see:
> > >
> > > > 2017-03-22T22:13:29Z   2017-03-22T21:32:59Z   32   portal   HorizontalPodAutoscaler   Warning   FailedGetMetrics   {horizontal-pod-autoscaler }   failed to get CPU consumption and request: metrics obtained for 2/4 of pods
> > > > 2017-03-22T22:13:29Z   2017-03-22T21:32:59Z   32   portal   HorizontalPodAutoscaler   Warning   FailedComputeReplicas   {horizontal-pod-autoscaler }   failed to get CPU utilization: failed to get CPU consumption and request: metrics obtained for 2/4 of pods
> > >
> > >
> > > > Heapster logs say some pods have no metrics, while other pods from the
> > > > same project do:
> > >
> > > > I0322 22:10:29.104727       1 handlers.go:242] No metrics for container wordpress in pod kondzilla/portal-107-rg2ia
> > > > I0322 22:10:29.104746       1 handlers.go:178] No metrics for pod kondzilla/portal-107-rg2ia
> > > > ...
> > > > I0322 22:12:21.780763       1 pod_based_enricher.go:141] Container namespace:kondzilla/pod:portal-107-rg2ia/container:wordpress not found, creating a stub
> > >
> > >
> > > > Hitting the kubelet's /stats/container/ endpoint does return valid
> > > > stats, as expected.
> > >
> > >
> > > I'm running:
> > >
> > > openshift v1.3.1
> > > kubernetes v1.3.0+52492b4
> > > etcd 2.3.0+git
> > >
> > > openshift/origin-metrics-cassandra:v1.3.1
> > > openshift/origin-metrics-hawkular-metrics:v1.3.1
> > > openshift/origin-metrics-heapster:v1.3.2 (v1.3.1 has the same effect)
> > >
> > >
> > > Thanks,
> > >
> > > --
> > > Mateus Caruccio / Master of Puppets
> > > GetupCloud.com
> > > We make the infrastructure invisible
> > > _______________________________________________
> > > dev mailing list
> > > [email protected]
> > > http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
> > >
> > >
> > >
> >
>
