Re: [lopsa-discuss] Monitoring systems for cloud nodes

Ash Palmer Mon, 25 Mar 2013 06:21:54 -0700

Hello,

In a way, I would like to bump this post, as I am also curious how others have deployed monitoring in dynamic cloud environments. Thank you, Morgan, for outlining a common problem for many.


- Ash

On 24.03.2013 03:08, Morgan Blackthorne wrote:

This is a spin-off question related to the other monitoring system
thread we have going, taking it from the general direction towards a
specific use-case scenario.

I've used several systems throughout the years, the last two notably
being Nagios and Zabbix. Nagios seems better suited for monitoring,
while Zabbix is clearly superior in terms of graphing. Configuring
NagiosGraph is... more difficult than it should be, IMO. The Zabbix
agent seemed to be less reliable than NRPE, however, and last I worked
with Zabbix it seemed to default to not alerting unless explicitly
configured to do so. (It's been a while since we moved away from it,
so my memory is a bit foggy. Near as I can recall, a configured alarm
via a Zabbix agent check would not fire if the agent itself was not
reachable, and the system did not natively support the concept of a
"host down" alert in that situation, either. You had to manually
configure a check of the network interfaces and the agent itself,
which seemed very counter-intuitive, and led to many situations where
we hadn't properly thought through all failure scenarios to configure
the alarms explicitly enough. All that said, I know some of the issues
we had and raised with Zabbix were marked as pending the 2.x branch,
which is out now-- I'm not sure if they've been resolved or if the
framework to resolve them is now in place.)

However, I'm specifically curious to see what people are using for
environments where the hosts can be spun up and down outside the
control of the normal provisioning channels. I know that there's been
significant work done lately by the Opscode folks to configure Nagios
dynamically via Chef, which is something I've got on my to-research
list when I get beyond the ops programming tasks on my plate right
now. I believe the downside of that would be whatever the interval is
between a node being terminated and the configuration being
regenerated. I know that Zabbix also supports the idea of dynamic node
registration, which seems very applicable in this case, but again, I'm
not sure if it's got some kind of pruning capability in place.
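
The pruning concern above comes down to a set difference between what
the monitoring system knows about and what is actually running. A
minimal sketch, assuming both lists are already available as plain
instance IDs (how they are fetched from the cloud API or from the
monitoring system is not shown, and the function names and IDs are
hypothetical, not part of Nagios, Zabbix, or Chef):

```python
def hosts_to_prune(monitored, running):
    """Return monitored host IDs that no longer have a running instance."""
    return sorted(set(monitored) - set(running))


def hosts_to_add(monitored, running):
    """Return running instance IDs that monitoring does not know about yet."""
    return sorted(set(running) - set(monitored))


if __name__ == "__main__":
    # Hypothetical instance IDs, for illustration only.
    monitored = ["i-aaa111", "i-bbb222", "i-ccc333"]
    running = ["i-bbb222", "i-ccc333", "i-ddd444"]
    print("prune:", hosts_to_prune(monitored, running))
    print("add:", hosts_to_add(monitored, running))
```

Running something like this on a short interval would bound the window
between a node being terminated and its checks being removed, at the
cost of one more scheduled job to keep healthy.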

I'm also curious to know along these lines if anyone has worked with
a system (either native or with a connector) that will take advantage
of Amazon's CloudWatch metrics. I could certainly monitor things like
CPU and network utilization myself, but if AWS is already doing so,
polling their data seems like it would be easier. (Potentially
cleaner? I'm undecided on that, since it seems like it could introduce
another dependency-- yet I've never seen CloudWatch unavailable when
the core EC2 services were working. However, I may not have explored
it in enough detail to see that kind of failure, so... I remain
undecided.) One of the upsides of integrating with CloudWatch is that
I can monitor the same metrics that autoscaling is operating on, and I
believe actually retrieve those thresholds as well, rather than
needing to configure them by hand (or by role in Chef, but that would
still need to be manually updated if I changed the autoscaling
parameters).
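
The idea of reusing the autoscaling thresholds rather than copying them
by hand can be sketched roughly as follows. This assumes the alarm
definitions have already been retrieved and look like the entries in a
CloudWatch DescribeAlarms response (MetricName, ComparisonOperator,
Threshold); the retrieval itself is not shown, and the alarm name and
values below are hypothetical:

```python
import operator

# Comparison operator names as used in CloudWatch alarm definitions.
_COMPARATORS = {
    "GreaterThanOrEqualToThreshold": operator.ge,
    "GreaterThanThreshold": operator.gt,
    "LessThanThreshold": operator.lt,
    "LessThanOrEqualToThreshold": operator.le,
}


def breaches(alarm, value):
    """Return True if a metric value crosses the alarm's own threshold."""
    compare = _COMPARATORS[alarm["ComparisonOperator"]]
    return compare(value, alarm["Threshold"])


if __name__ == "__main__":
    # Hypothetical alarm, shaped like one entry of a DescribeAlarms result.
    scale_up = {
        "AlarmName": "scale-up-cpu",
        "MetricName": "CPUUtilization",
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "Threshold": 80.0,
    }
    for sample in (42.0, 85.5):
        print(scale_up["MetricName"], sample, breaches(scale_up, sample))
```

Because the threshold is read from the alarm definition itself, a change
to the autoscaling parameters would carry over to the monitoring check
without a separate manual update.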

Thanks for any thoughts. :)

--
~*~ StormeRider ~*~

"Every world needs its heroes [...] They inspire us to be better than
we are. And they protect from the darkness that's just around the
corner."

(from Smallville Season 6x1: "Zod")

On why I hate the phrase "that's so lame"... http://bit.ly/Ps3uSS [1]

Links:
------
[1] http://bit.ly/Ps3uSS

_______________________________________________
Discuss mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss

This list provided by the League of Professional System Administrators
http://lopsa.org/


