Re: Using net-snmp to monitor a dynamic cloud-based back-end

Maarten Wijnants Wed, 06 Nov 2013 08:29:00 -0800

Dear Fulko Hew,


On 5/11/2013 14:42, Fulko Hew wrote:

<snip>
I have instituted exactly this philosophy on the application developers within my company.
After a few decades of creating and supporting applications and devices,
I came to the conclusion that standalone applications should not be looked
at any differently than 'embedded' applications in dedicated devices (e.g. routers).
After all, even a router is 'just another application'.

This indeed largely coincides with our vision. We would like to collect granular (performance) information about individual servers/applications running on particular devices instead of limiting ourselves to gathering global information about the status of devices in their totality. This is due to the multi-service nature of our cloud-based back-end, where a single back-end device is able to host multiple highly heterogeneous services.

So I then abstracted out the 'common items' that all applications have
and bundled them into a MIB (and implemented a Perl and Java library
that allow our apps (written in those languages only at the moment)
to provide and receive info (snmp get and set) and send traps for
noteworthy items).  And these libraries interact with Net-SNMP via AgentX.


Thanks for sharing this information!

The abstracted stuff fell into 5 categories:
- Dependencies - All apps 'need' stuff, like files/connections/other apps, etc.
  as inputs and outputs.  I.e. things 'this app' depends on to work, and if
  a dependency isn't being fulfilled, the app will not work as needed.
  This table also holds simple usage and response stats on each 'dependency'
  along with traps for up/down, usage threshold alarms and response time alarms.
- Capacity/resource - Where 'dependencies' are a 'comms' kind of thing,
  the 'capacity' table allows the app developers to communicate info/status
  on 'consumable' things an app needs or consumes (buffer pools, memory,
  sockets, etc.).  And it defines traps for capacity alarms and when the
  consumption goes back to normal.
- Response time - Even though the dependency table has response time info,
  it is very simple.  This table allows you to report _any_ kind of thing
  that you want to communicate the response time of... with thresholds and
  good/bad traps.
- SLA Criteria - Allows a designer to communicate the 'things' that should be
  measured, and their threshold targets, in order to calculate the application's
  availability for your service level agreement.
- Info - A place to communicate versioning info about application components,
  show and remotely control debug/logging levels, remotely restart an
  application, etc.
So these tables define the 'things' that should or could be communicated,
now it's up to the application designers to populate these tables... put
rows in the tables for each of their application's 'needs', and that will
be different for each application.
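
As an illustration of the capacity category described above, here is a minimal Python sketch of one table row with an alarm threshold and the two traps (alarm raised, back to normal). It rests on assumptions: the names CapacityRow, CapacityTable, send_trap, capacityAlarm and capacityNormal are hypothetical and are not taken from the proprietary MIB or from Net-SNMP. A real implementation would presumably expose the rows through an AgentX subagent and send real SNMP traps instead of appending to a list.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CapacityRow:
    """One row of a hypothetical 'capacity' table: a consumable resource."""
    name: str
    limit: int             # total amount available (e.g. buffers in a pool)
    alarm_pct: float       # raise the alarm at or above this usage percentage
    used: int = 0
    in_alarm: bool = False


@dataclass
class CapacityTable:
    # Stand-in for a real trap sender; takes (trap_kind, row_name).
    send_trap: Callable[[str, str], None]
    rows: List[CapacityRow] = field(default_factory=list)

    def add_row(self, row: CapacityRow) -> CapacityRow:
        """Each application adds rows for the resources it consumes."""
        self.rows.append(row)
        return row

    def update(self, row: CapacityRow, used: int) -> None:
        """Record new usage and send a trap only when the alarm state changes."""
        row.used = used
        pct = 100.0 * used / row.limit
        if pct >= row.alarm_pct and not row.in_alarm:
            row.in_alarm = True
            self.send_trap("capacityAlarm", row.name)
        elif pct < row.alarm_pct and row.in_alarm:
            row.in_alarm = False
            self.send_trap("capacityNormal", row.name)


# Usage: collect traps in a list instead of sending them over the network.
traps = []
table = CapacityTable(send_trap=lambda kind, name: traps.append((kind, name)))
pool = table.add_row(CapacityRow(name="buffer-pool", limit=100, alarm_pct=80.0))
for used in (50, 85, 90, 40):
    table.update(pool, used)
print(traps)
```

Sending a trap only on a state change (rather than on every update above the threshold) keeps a resource that stays busy from flooding the NMS with repeated alarms.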

So if I understand correctly, you have defined a general MIB that contains entries for each of these types of information? And applications themselves decide which information (that relates to their performance) they want to expose in this MIB structure? How do applications actually insert their values into the MIB (i.e., how do they add a row to the appropriate MIB branch)? Does this happen via the AgentX protocol? If so, do all individual applications need to be extended with an AgentX component, or do you make use of a single AgentX module per device that takes care of MIB-related operations for all applications running on that device?

And then applications also have application specific 'other stuff' that they
may want to expose, that doesn't fit into the above generic tables.

And where exactly do you put this information and make it addressable via SNMP? Is this done via an application-specific MIB?

So as you browse a device, these tables let you find out what different
things may be running, how they are interconnected, how well they are
running (or not), and provide capacity planning info (and alarms on 'potential'
capacity problems), so you don't actually have to do capacity planning
until you get 'close' to your 'gee maybe I should start paying attention'
usage level.

May I ask which kind of NMS you are using to gather, store and handle the SNMP-mediated information? And does an administrator need to manually initiate the browsing of a device, or is this performed automatically by the NMS? Remember that we would like to automate the performance data collection as much as possible due to the dynamic nature of our back-end infrastructure (i.e., devices might spawn and be destroyed at run-time, and we would like the performance monitoring to start/stop automatically when a device is spawned/destroyed).
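
To make the start/stop requirement concrete, the following is a minimal Python sketch of the reconciliation step we have in mind. It is only an assumption about how this could work: the function name reconcile is hypothetical, and the set of live devices is assumed to come from some inventory source (for example the cloud orchestration layer), which is not specified here.

```python
def reconcile(monitored, live):
    """Compare the devices currently being monitored with the devices
    currently alive, and return (to_start, to_stop) as sorted lists."""
    monitored, live = set(monitored), set(live)
    to_start = sorted(live - monitored)   # newly spawned: begin polling
    to_stop = sorted(monitored - live)    # destroyed: stop polling
    return to_start, to_stop


# Usage with made-up addresses: .1 was destroyed, .3 was spawned.
to_start, to_stop = reconcile(
    monitored={"10.0.0.1", "10.0.0.2"},
    live={"10.0.0.2", "10.0.0.3"},
)
print(to_start, to_stop)
```

Running such a step periodically would let the monitoring follow the back-end without an administrator browsing each device by hand.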

So I think this design (and implementation) covers the kind of stuff you
are thinking about (and more).

I agree! Your MIB seems fairly elaborate and is likely to exceed our (initial) needs.

Unfortunately, my implementation is
(at the moment) company proprietary; but I can still 'talk' about it.

I would like to take this opportunity to thank you once more for replying to our question and for sharing your knowledge and experience with us. If any of my follow-up questions are too detailed (given the proprietary nature of your implementation), please feel free to simply state so and to limit yourself to answering only those questions that you feel comfortable answering.

Fulko Hew


Best regards,
Maarten Wijnants

--

dr. Maarten Wijnants
Expertise centre for Digital Media (EDM) - Hasselt University
Wetenschapspark 2 - B3590 Diepenbeek
Tel: +32 (0)11 268411 - Mobile: +32 (0)497 341848
Fax: +32 (0)11 268499
E-mail:maarten.wijna...@uhasselt.be
Web:http://research.edm.uhasselt.be/~mwijnants/


