Re: Using net-snmp to monitor a dynamic cloud-based back-end

Fulko Hew Wed, 06 Nov 2013 08:45:55 -0800

On Wed, Nov 6, 2013 at 11:05 AM, Maarten Wijnants <
maarten.wijna...@uhasselt.be> wrote:

>  Dear Fulko Hew,
> On 5/11/2013 14:42, Fulko Hew wrote:
>
>  <snip>
>
>  I have instituted exactly this philosophy on the application developers
> within my company.
>
>  After a few decades of creating and supporting applications and devices,
>  I came to the conclusion that standalone applications should not be
> looked at any differently than 'embedded' applications in dedicated
> devices (e.g. routers).
>  After all, even a router is 'just another application'.
>
>
> This indeed largely coincides with our vision. We would like to collect
> granular (performance) information about individual servers/applications
> running on particular devices instead of limiting ourselves to gathering
> global information about the status of devices in their totality. This is
> due to the multi-service nature of our cloud-based back-end, where a single
> back-end device is able to host multiple highly heterogeneous services.
>

The MIB I created doesn't define the granularity; it's up to
the application designer to populate the tables with the degree of
visibility required.  Each application may/will be different,
and each concept/table may not be applicable.  To date, I have
10 different applications, written by 6 different people, all
exposing different levels of detail.

>   So I then abstracted out the 'common items' that all applications have
> and bundled them into a MIB, and implemented Perl and Java libraries
>  that allow our apps (written in those languages only at the moment)
> to provide and receive info (snmp get and set) and send traps for
>  noteworthy items.  These libraries interact with Net-SNMP via AgentX.
>
>  Thanks for sharing this information!
>
>   The abstracted stuff fell into 5 categories:
>  - Dependencies - All apps 'need' stuff, like files/connections/other
>    apps, etc. as inputs and outputs, i.e. things 'this app' depends on
>    to work; if a dependency isn't being fulfilled, the app will not work
>    as needed.  This table also holds simple usage and response stats on
>    each 'dependency', along with traps for up/down, usage threshold
>    alarms and response time alarms.
>  - Capacity/resource - Where 'dependencies' are a 'comms' kind of thing,
>    the 'capacity' table allows the app developers to communicate
>    info/status on 'consumable' things an app needs or consumes (buffer
>    pools, memory, sockets, etc.).  It defines traps for capacity alarms
>    and for when the consumption goes back to normal.
>  - Response time - Even though the dependency table has response time
>    info, it is very simple.  This table allows you to report _any_ kind
>    of thing that you want to communicate the response time of, with
>    thresholds and good/bad traps.
>  - SLA Criteria - Allows a designer to communicate the 'things' that
>    should be measured, and their threshold targets, in order to
>    calculate the application's availability for your service level
>    agreement.
>  - Info - A place to communicate versioning info about application
>    components, show and remotely control debug/logging levels, remotely
>    restart an application, etc.
>
>  So these tables define the 'things' that should or could be communicated;
>  now it's up to the application designers to populate these tables, i.e. put
>  rows in the tables for each of their application's 'needs', and that will
> be different for each application.
>
>  So if I understand correctly, you have defined a general MIB that
> contains entries for each of these types of information?
>

Correct.
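
As an illustration only, the 'capacity' idea quoted above (an alarm trap when a consumable crosses a threshold, and a second trap when it returns to normal) could be sketched as follows. This is not the proprietary Perl/Java implementation; the class names, the trap names `capacityAlarm`/`capacityNormal`, the 80% default and the `send_trap` callback are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CapacityRow:
    """One row of a hypothetical 'capacity' table (e.g. a buffer pool)."""
    name: str
    limit: int            # total units available
    alarm_pct: float      # usage percentage at which an alarm is raised
    used: int = 0
    in_alarm: bool = False


class CapacityTable:
    """Holds capacity rows and reports threshold crossings via a callback."""

    def __init__(self, send_trap):
        self.rows = {}
        self.send_trap = send_trap  # hypothetical: callable(kind, name, pct)

    def add_row(self, name, limit, alarm_pct=80.0):
        self.rows[name] = CapacityRow(name, limit, alarm_pct)

    def set_used(self, name, used):
        row = self.rows[name]
        row.used = used
        pct = 100.0 * used / row.limit
        # Only a transition across the threshold produces a trap, so a
        # resource that stays above the threshold does not flood the NMS.
        if pct >= row.alarm_pct and not row.in_alarm:
            row.in_alarm = True
            self.send_trap("capacityAlarm", name, pct)
        elif pct < row.alarm_pct and row.in_alarm:
            row.in_alarm = False
            self.send_trap("capacityNormal", name, pct)


traps = []
table = CapacityTable(lambda kind, name, pct: traps.append((kind, name, pct)))
table.add_row("bufferPool", limit=200, alarm_pct=80.0)
table.set_used("bufferPool", 100)   # below threshold: no trap
table.set_used("bufferPool", 170)   # crosses threshold: alarm trap
table.set_used("bufferPool", 180)   # still above threshold: no new trap
table.set_used("bufferPool", 90)    # back below threshold: normal trap
```

In this sketch the application only reports its current usage; the table code decides whether a trap is warranted.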

>  And applications themselves decide which information (that relates to
> their performance) they want to expose in this MIB structure?
>

Correct.

> How do applications actually insert their values into the MIB (i.e., how do
> they add a row to the appropriate MIB branch)? Does this happen via the
> AgentX protocol? If so, do all individual applications need to be extended
> with an AgentX component,
>

It's done by invoking the code that implements that MIB.
In my case, _my_ MIB compiler generated stubs (for Perl or Java) that can
be compiled into an application.  Those stubs interact with the
net-snmp (snmpd) master agent using the AgentX protocol.  In essence,
each managed application becomes a sub-agent of snmpd.

So at run-time, the application adds/removes entries in each table
if/when required.  Under the covers, yes, the fact that
a row is created/removed is communicated via AgentX to the master,
so the master knows which sub-agent to ask to retrieve the actual
live data.
When the app needs to update a row's columns, it calls the accessor
stubs, which in turn update the values and decide whether or not to
issue a related trap.
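
To make that flow concrete, here is a toy in-process model in Python. It is only a sketch: it does not speak AgentX (a wire protocol, specified in RFC 2741) and it is not the generated Perl/Java stubs. `MasterAgent`, `SubAgent`, `set_state` and the `send_trap` callback are hypothetical names chosen for illustration.

```python
class MasterAgent:
    """Toy stand-in for the master agent: tracks which sub-agent owns a row."""

    def __init__(self):
        self.owners = {}  # (table, index) -> owning sub-agent

    def register(self, table, index, subagent):
        self.owners[(table, index)] = subagent

    def unregister(self, table, index):
        self.owners.pop((table, index), None)

    def get(self, table, index, column):
        owner = self.owners.get((table, index))
        if owner is None:
            return None  # no sub-agent currently exposes this row
        # The live value is fetched from the application at query time.
        return owner.read(table, index, column)


class SubAgent:
    """Toy stand-in for the stubs compiled into one application."""

    def __init__(self, master, send_trap):
        self.master = master
        self.send_trap = send_trap  # hypothetical: callable(table, index, state)
        self.rows = {}

    def add_row(self, table, index, **columns):
        self.rows[(table, index)] = dict(columns)
        self.master.register(table, index, self)

    def remove_row(self, table, index):
        self.rows.pop((table, index), None)
        self.master.unregister(table, index)

    def read(self, table, index, column):
        return self.rows[(table, index)][column]

    def set_state(self, table, index, state):
        # Accessor: update the column, and trap only on a real change.
        row = self.rows[(table, index)]
        if row.get("state") != state:
            row["state"] = state
            self.send_trap(table, index, state)


sent = []
master = MasterAgent()
app = SubAgent(master, lambda table, index, state: sent.append((table, index, state)))
app.add_row("dependency", "database", state="up")
app.set_state("dependency", "database", "down")   # state change: trap
app.set_state("dependency", "database", "down")   # no change: no trap
live_value = master.get("dependency", "database", "state")
app.remove_row("dependency", "database")
after_removal = master.get("dependency", "database", "state")
```

The point of the model is that the master holds no application data of its own; once the application removes its row (or exits), there is nothing left to query.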

> or do you make use of a single AgentX module per device that takes care of
> MIB-related operations for all applications running on that device?
>

No, because I wanted the tables to be populated dynamically based on
whether the application was running or not, and to also enable sending of
traps when the apps start and stop.

That facilitates the concept of 'the application knows and exposes
information about itself'.
There doesn't have to be another app/module that 'knows' about the apps that
may be present. It ensures that the information only exists in one place,
the application itself, and prevents problems that occur when a 'proxy'
module/app sits in-between.  One of our developers has done that, and we
now have situations where:
a) the proxy is running (and reporting the app as running), but the app
isn't running
b) the app is running but the proxy isn't, so the app is incorrectly
reported as 'down'
etc.

>   And then applications also have application-specific 'other stuff' that
> they may want to expose, that doesn't fit into the above generic tables.
>
>
> And where exactly do you put this information and make it addressable via
> SNMP? Is this done via an application-specific MIB?
>

Correct.  Those are 'other' MIBs/stubs that an application integrates.

>   So as you browse a device, these tables let you find out what different
> things may be running, how they are interconnected, how well they are
> running (or not), and provide capacity planning info (and alarms on
> 'potential capacity problems'), so you don't actually have to do capacity
> planning until you get 'close' to your 'gee, maybe I should start paying
> attention' usage level.
>
>
> May I ask which kind of NMS you are using to gather, store and handle the
> SNMP-mediated information? And does an administrator need to manually
> initiate the browsing of a device, or is this performed automatically by
> the NMS? Remember that we would like to automate the performance data
> collection as much as possible due to the dynamic nature of our back-end
> infrastructure (i.e., devices might spawn and be destroyed at run-time, and
> we would like the performance monitoring to start/stop automatically when a
> device is spawned/destroyed).
>

We have a number of different NMSs; purchased, open-source, and home-grown.
In the end, it doesn't matter 'who collects (and displays) the data',
but that the data is 'available for collection'!

My MIB has one of those 'dependency' entries corresponding to
an application's concept of 'myself'.  When the app starts/stops,
its 'myself' entry changes state.  (Since "I depend on myself running
in order to provide _my_ functionality!")
And because a trap is issued whenever a 'dependency' changes state,
an NMS can then reactively start/stop whatever it needs to do when it
receives those indications.
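
On the NMS side, the reaction could be as small as the following Python sketch. It assumes the trap has already been decoded into a host, an application name, a dependency name and a new state; that decoded shape, the `Collector` class and the "up"/"down" state strings are assumptions made for illustration, not part of the MIB described here.

```python
class Collector:
    """Hypothetical NMS component that tracks which apps to poll."""

    def __init__(self):
        self.active = set()  # (host, app) pairs currently being polled

    def on_trap(self, host, app, dependency, state):
        # Only the application's own 'myself' entry drives start/stop;
        # state changes of other dependencies are ignored in this sketch.
        if dependency != "myself":
            return
        key = (host, app)
        if state == "up":
            self.active.add(key)        # begin performance collection
        elif state == "down":
            self.active.discard(key)    # stop performance collection


nms = Collector()
nms.on_trap("node1", "billing", "myself", "up")
nms.on_trap("node1", "billing", "database", "down")   # not 'myself': ignored
polled_while_running = set(nms.active)
nms.on_trap("node1", "billing", "myself", "down")
polled_after_stop = set(nms.active)
```

Because collection is driven by the traps, no administrator has to browse a newly spawned device by hand.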

How your NMS reacts (start/stop performance collection, remotely reboot
the box that 'should be' running a 'dependent' application, etc.) is up to
you.

Oh, and part of that 'dependency' table communicates 'who I am' and
'who I depend on', so that you _can_ follow that dependency chain
(programmatically) and perform those automated diagnostics, restarts, etc.
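
Following the chain programmatically might look like this Python sketch. It assumes the 'who I am' / 'who I depend on' columns have already been collected into a dictionary keyed by application name; that layout, the `root_causes` function and the sample names are hypothetical.

```python
def root_causes(table, start):
    """Walk 'depends_on' links from a failed app and return the failed
    entries that have no failed dependencies of their own."""
    causes = []
    seen = set()

    def visit(name):
        if name in seen:          # guards against dependency cycles
            return
        seen.add(name)
        row = table.get(name)
        if row is None or row["state"] != "down":
            return
        down_deps = [d for d in row["depends_on"]
                     if table.get(d, {}).get("state") == "down"]
        if not down_deps:
            causes.append(name)   # nothing below it is down: likely culprit
        for dep in down_deps:
            visit(dep)

    visit(start)
    return causes


# Hypothetical snapshot: web depends on api and cache, api depends on db.
snapshot = {
    "web":   {"state": "down", "depends_on": ["api", "cache"]},
    "api":   {"state": "down", "depends_on": ["db"]},
    "cache": {"state": "up",   "depends_on": []},
    "db":    {"state": "down", "depends_on": []},
}
culprits = root_causes(snapshot, "web")
```

An automated restart or diagnostic would then target the returned entries rather than every application that happens to be down.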

>   So I think this design (and implementation) covers the kind of stuff you
>  are thinking about (and more).
>
>  I agree! Your MIB seems fairly elaborate and is likely to exceed our
> (initial) needs.
>
>   Unfortunately, my implementation is
> (at the moment) company proprietary; but I can still 'talk' about it.
>
>  I would like to take this opportunity to thank you once more for replying
> to our question and for sharing your knowledge and experience with us. If
> any of my follow-up questions would be too detailed (given the proprietary
> nature of your implementation), please feel free to simply state so and to
> limit yourself to answering only those questions that you feel comfortable
> about.
>

I'll let you know.  :-)
