Re: [Canonical-ci-engineering] proposal for next sprint

Thomi Richards Mon, 04 May 2015 13:50:10 -0700

Hi,

Thanks for the reply, Francis.

On Tue, May 5, 2015 at 7:37 AM, Francis Ginther <
[email protected]> wrote:

>
> I think there are a number of statistical metrics we should be monitoring,
> and this should really be a part of sprint planning. Like all criteria, we
> need to have an idea of what metrics would be useful for the given
> solution. Attempting to come up with all possible metrics up front would
> lead to many that would only add noise. If we had had some basic
> metrics in place from the beginning (and monitored them) we would have had
> better insight into the impacts of the cloud-config additions and ideally
> had some better performance comparisons with the existing VM solution.
>

Indeed. There's a card for that:

https://trello.com/c/cWbIbcDa/171-we-didn-t-consider-performance-metrics-as-we-developed-the-system

:D

I think we all understood the importance of logging data, but kind of
dropped the ball on stats data. In the future, we should treat
"logging & metrics" as integral parts of developing a new system.
Lesson learned!

>
> Celso has mentioned using ELK plugins for reporting metrics; this could be
> another alternative. I have not looked at this myself.
>

I'd love to get some more information on ELK plugins. I don't have much
experience with Elasticsearch, and the little bit I tried to do (backing up
and restoring Elasticsearch when we migrated the ELK deployment to
production) proved to be tricky.

>
>
<snip>

> I've only had a chance to skim the resources so far. From past experience,
> push metrics worked for everything, but then again, when that's all that is
> available (thinking statsd/graphite), that's all you think about :-).
>
>
Right - there's a good FAQ answer here:
http://prometheus.io/docs/introduction/faq/#why-do-you-pull-rather-than-push?

But it's important to note that Prometheus supports both. I imagine we'd want
'pull' for our core services (rabbit, logstash, kibana, and anything behind
a floating IP like adt-cloud-service), and 'push' for all our ephemeral
services, and anything where we scale out to multiple nodes.
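
To make the pull/push distinction concrete, here is a rough sketch using only
the Python standard library. It is not how Prometheus or statsd clients are
actually implemented, and every name in it (COUNTERS, jobs_processed_total,
MetricsHandler, push_counter) is made up for illustration. The 'pull' half
exposes a /metrics page in the plain "name value" text form that a scraper
could fetch from a long-lived service; the 'push' half sends a statsd-style
UDP datagram, which is my stand-in for whatever push mechanism we would
really pick for ephemeral workers.

```python
# Sketch only: stdlib, no metrics client library. All names are hypothetical.
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer

COUNTERS = {"jobs_processed_total": 0}  # hypothetical metric name


def render_metrics(counters):
    """Render counters as 'name value' lines, one metric per line."""
    return "".join(f"{name} {value}\n" for name, value in sorted(counters.items()))


class MetricsHandler(BaseHTTPRequestHandler):
    """Pull model: a long-lived service exposes /metrics and waits to be scraped."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(COUNTERS).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep the sketch quiet; a real service would log scrapes somewhere.
        pass


def statsd_counter_line(name, value=1):
    """Push model: format a statsd counter datagram ('name:value|c')."""
    return f"{name}:{value}|c".encode()


def push_counter(name, value=1, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send, suited to a short-lived worker about to exit."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(statsd_counter_line(name, value), (host, port))
```

A core service would run something like
HTTPServer(("", 8000), MetricsHandler).serve_forever() alongside its real
work, while an ephemeral worker would just call push_counter(...) before it
goes away. The port numbers here are assumptions too.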

>> So I'm curious - does anyone else see this need? What's the correct way to
>> propose work for the next sprint? I think this would be a nice piece of
>> work for someone to work on for the next few weeks. If no one else wants
>> to, I'll certainly volunteer myself...
>>
>
> I really like the utility we've established with logging to ELK. It's
> become quite painless to add logging content with rich metadata around it.
> If there is a metrics equivalent, I'm all for it.
>
>
Awesome.

I'm really keen to hear from the rest of the team as well. Anyone have any
insights here?

-- 
Thomi Richards
[email protected]

-- 
Mailing list: https://launchpad.net/~canonical-ci-engineering
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~canonical-ci-engineering
More help   : https://help.launchpad.net/ListHelp
