Hi Brad,

That seems to be a pretty useful move. It seems it is time that I really start
looking closely at 3.1.x.

Cheers
Martin
----------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

----- Original Message ----
> From: Brad Nicholes <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; ganglia-general@lists.sourceforge.net
> Sent: Tuesday, December 18, 2007 11:44:45 PM
> Subject: [Ganglia-developers] Moving all built-in metrics to metric modules...
> 
>    I just committed a rather substantial patch to Ganglia 3.1.0 trunk
> which will affect the way that gmond 3.1.x is deployed.  I am posting
> this to both the developer list and the general list so that all will
> be aware of the changes and why they are important.  The primary
> purpose of the patch was to remove all of the built-in metrics from
> the gmond binary and allow them to be built as loadable modules.  The
> following is a more detailed list of what has changed.  Hopefully,
> from a user perspective, gmond will continue to work as it has in the
> past.  But going forward, it will be much more flexible with regard to
> the core set of metrics.
> 
> * All built-in metrics have been removed from the gmond binary
>   - A new set of core metric modules has been created that represents
>     the same set of metrics that gmond has always gathered.  These new
>     core modules are mod_cpu.so, mod_disk.so, mod_load.so, mod_mem.so,
>     mod_net.so, mod_proc.so and mod_sys.so.  Each of these modules is
>     basically a wrapper around the metric functions that exist in
>     libmetrics.  Being wrappers, they still make the same metric
>     function calls as have always been made.  And since libmetrics
>     contains all of the platform-specific metric code, the metric
>     function calls made by the core modules will continue to do the
>     right thing for all of the platforms that have previously been
>     supported.
>   - There is also an extra module called core_metrics which contains
>     the heartbeat, location and gexec metrics.  Even though this module
>     could be dynamically loaded in the same manner as the others, it is
>     always statically linked, simply because gmond would not be able to
>     function properly without these metrics, so there is no real reason
>     to allow them to be dynamically loaded.
>   - Some additional configuration has been added to the gmond.conf
>     file.  Because the core metrics are now implemented as modules,
>     gmond requires a module configuration block that instructs it to
>     load each module.  A set of module blocks has been added to the
>     default gmond.conf file.
> 
> * All metric-specific metadata definitions have been removed from
>   protocol.x
>   - With the refactoring of the XDR data and removal of the built-in
>     metrics, there is no longer any need for XDR to have intimate
>     knowledge of the core metrics.  Therefore the metric structure
>     array and enum have been removed and are now part of the core
>     metric modules themselves.
> 
> * --enable-static-build statically links the core metric modules
>   - Building gmond statically will statically link not only APR, expat
>     and libconfuse, it will also statically link all of the core metric
>     modules into the gmond binary.  The result should be a gmond binary
>     that looks and feels just like the old 3.0.x statically linked
>     gmond binary.  The one exception is that a module statement is
>     still required in the gmond.conf file.  The difference between the
>     module configuration block for dynamically loaded modules and the
>     module blocks for statically linked modules is whether or not a
>     path to the .so is included.  The configure script and makefiles
>     have been modified to detect --enable-static-build and build the
>     default gmond.conf file appropriately.
> 
> * --enable-static-build + --enable-python statically links the python
>   module
>   - One of the downsides of building gmond 3.1.x statically was that
>     doing so would disable all of the dynamically loadable module
>     capability.  The reason for this is the need for both gmond and the
>     pluggable modules to dynamically link with libapr1.  However, if
>     both --enable-static-build and --enable-python are specified during
>     configure, a gmond binary will be built with mod_python statically
>     linked.  This provides gmond with the ability to continue to load
>     and run python metric modules in the same manner as the non-static
>     build.  In other words, even though statically linking gmond will
>     disable pluggable C interface modules, python pluggable modules
>     will still continue to work.
> 
> * All metrics carry a group designation
>   - Now that all metrics have been implemented as loadable modules,
>     the metrics have also been assigned to groups.  The XML that is
>     produced by gmond and gmetad will carry an  tag that defines which
>     group each metric belongs to.  This will allow the web front end to
>     be enhanced to filter metrics so that they can be displayed by
>     group rather than all metric graphs appearing on the same page.
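
For illustration, here is a minimal sketch of how a Python metric module might
declare its group. It follows the descriptor layout used by the example module
shipped with later Ganglia releases (metric_init / metric_cleanup and a
"groups" key); the exact keys accepted by the trunk at the time of this post
are an assumption, and the metric name, callback and group name are made up.

```python
# Hypothetical gmond Python metric module. The metric name, group name and
# callback are illustrative only; the descriptor keys are assumed to match
# the layout of Ganglia's bundled example module.
import random


def read_example(name):
    """Callback that gmond invokes with the metric name; returns a value."""
    return random.randint(0, 100)


def metric_init(params):
    """Return the list of metric descriptors this module provides.

    `params` is the dict of parameters from the module's configuration.
    """
    descriptor = {
        "name": "example_random",
        "call_back": read_example,
        "time_max": 90,
        "value_type": "uint",
        "units": "count",
        "slope": "both",
        "format": "%u",
        "description": "Example random value",
        "groups": "example",  # group designation carried with the metric
    }
    return [descriptor]


def metric_cleanup():
    """Called once when the module is unloaded; nothing to release here."""
    pass


if __name__ == "__main__":
    # Exercise the module outside gmond.
    for d in metric_init({}):
        print(d["groups"], d["name"], d["call_back"](d["name"]))
```

Running the file directly prints the group, the metric name and one sampled
value, which is a quick way to check a module before handing it to gmond.
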
> 
> 
> These changes should make gmond much more flexible when it comes to
> extending or replacing not only the core metrics but also new metrics.
> I have attached the wish list that was compiled a couple of months ago,
> updated to mark the items that I consider to be done.  As I mentioned
> at our meet-up a few weeks ago, we need to identify which of the
> remaining items must be addressed before shipping 3.1.0 and get those
> completed.  I would like to see us ship a 3.1.0 release as soon as
> possible.
> 
> Brad
> 
> 
> 
> -----Inline Attachment Follows-----
> 
> Done
> ------------------
> - C module interface as DSO
> - mod_python Python module interface
> - Dynamically link libraries like expat, apr, libconfuse
> - Add TITLE attribute to the XDR data to communicate a human-readable
>   name
> - Add a GROUP attribute to the XDR data
>     This would allow metrics to declare the category that they belong
>     to.  The category should be added at the metric definition level
>     and not in the .conf file.
> - Reimplement the built in metrics as C interface modules
> - A cleaner XDR encoding:
>     The current encoding scheme embeds too much information about
>     which metrics gmond collects.  The encoding scheme should treat all
>     metrics the same: as just "a metric".  The encoding should not care
>     if the metric is metric_cpu_speed, metric_swap_total or a
>     user-defined "gmetric" one.
> - Flexible method of adding extra metric metadata.
>     We could include extra metadata, not just "alias"/"title".  For
>     example, some metrics have a natural minimum and maximum value.
>     Perhaps we could come up with an extensible way of encoding metric
>     metadata so future changes can be included without losing
>     backwards compatibility.
> - Re-organization of RPM packages (libganglia, gmond-python ?)
> 
> 
> GMond To Do
> ------------------------
> - Gmond module repository
> - Implement a perl module interface
> - Implement a PHP module interface
> - Implement a Ruby module interface
> - Metric packing:
>     Simply that a UDP packet can contain multiple metrics (using the
>     usual XDR stream decoding) up to the size of a UDP packet.  This
>     would help reduce the overhead when sending many metric updates
>     concurrently.  It also preserves the current gmond behaviour where
>     it sends metric updates in a single UDP packet.
> - Support for counters (metrics with +ve slope)
>     This shouldn't require much work (from memory, make sure the
>     slope-type information is preserved and patch gmetad to create RRD
>     files with the correct options).  Currently Ganglia doesn't
>     actually support custom counter metrics, which is an awkward
>     limitation.
> - gmond switching to a non-blocking IO model.
>     If there's a large number of metric updates then gmond must
>     process them "quickly" or they will be lost.  If this happens
>     whilst gmond is sending XML data to gmetad there may be a delay,
>     increasing the risk of metric update messages being lost.
>     Switching to a non-blocking IO model would allow gmond to respond
>     preferentially to the incoming UDP messages.
> -* Remove the 4T limit on ganglia metric results
> -* Modify all byte count metrics to 8-byte ints
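
To make the "Metric packing" item above concrete, here is a rough sketch of
the idea: encode each metric as a self-delimiting XDR-style record and append
records to a datagram until a size budget is reached. The record layout (name
string followed by value string), the 1400-byte budget and all function names
are assumptions for illustration; this is not gmond's actual wire format.

```python
import struct

MAX_DATAGRAM = 1400  # assumed payload budget, not a Ganglia constant


def xdr_string(text):
    """Encode a string XDR-style: 4-byte length, bytes, zero pad to 4."""
    data = text.encode("utf-8")
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad


def encode_metric(name, value):
    """Hypothetical record layout: metric name, then value as a string."""
    return xdr_string(name) + xdr_string(str(value))


def pack_metrics(metrics, limit=MAX_DATAGRAM):
    """Group encoded records into payloads no larger than `limit` bytes."""
    datagrams, current = [], b""
    for name, value in metrics:
        record = encode_metric(name, value)
        if len(record) > limit:
            raise ValueError("metric %r does not fit in one datagram" % name)
        if current and len(current) + len(record) > limit:
            datagrams.append(current)  # flush the full payload
            current = b""
        current += record
    if current:
        datagrams.append(current)
    return datagrams


def unpack_metrics(payload):
    """Decode back-to-back records until the payload is exhausted."""
    out, offset = [], 0

    def read_string():
        nonlocal offset
        (length,) = struct.unpack_from(">I", payload, offset)
        offset += 4
        text = payload[offset:offset + length].decode("utf-8")
        offset += length + (4 - length % 4) % 4  # skip padding
        return text

    while offset < len(payload):
        name = read_string()
        out.append((name, read_string()))
    return out
```

A receiver that already decodes one record per packet keeps working with this
scheme, since a datagram holding a single record is just the smallest case of
the same stream.
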
> 
> GMetad To Do
> ------------------------------
> - Support for new RRDTool which allows graphs to have dynamic sizes
> - Gilad's stacked graphs
> - Changing the units of default metrics to their base
>     For example, disk_free's base unit should be bytes, not GB (as
>     rrdtool will automatically append G, M, K etc.)
> - Better support for bigger less frequent updates 
>     one packet every 20 seconds per host for all data?
> - Multi PB disk limit
> - Better on disk RRD perf (tmpfs is an OK workaround)
> -* Name RRD directories based on UUID generated by client gmond 
>     hash of MAC address? something else? So that renaming hosts,
>     updating DNS or hosts files doesn't result in history for the
>     physical gmond client being lost.
> - Integration of gexec/authd ?  
> - Expand gstat nodelist parameter query options (i.e. return all hosts
>   with <10% iowait, etc.)
> - Interface stats in bits?  Self awareness of interface capability for
>   % util stats for network.
> - Something like a unique per-gmond instance identifier
>     To help with multi-homing and DNS issues and so the IP address is
>     no longer the index key.  There was discussion of this under the
>     subject "Overriding hostname" on the Ganglia-general list.
> - Give some metrics priority and have them updated more frequently in
>   their RRDs than others.
> - Allow for some sort of in-memory RRD (never written to disk) as an
>   alternative storage for very extreme cases.
> - Let the users manage different IO bound pools for their metrics
>     For extreme cases one based on tmpfs.  So that they can be tied
>     correctly to the right kind of storage IO capabilities for the
>     frequency needed.
> - Add more memory metrics 
>     slab, buffers, dirty, writeback, cache_clean (= cached -
>     (dirty+writeback)), mapped, free
> 
> Web interface
> -------------------------------
> - Numerous custom graphs enhancements (Alex Balk, Timothy Witham,
>   others)
> - Web frontend face lift
> - Mouse over result graphs
> - Default cluster view uses text-only per host squares 
>     loading 1700 little graphs chews too much browser
> - Better icons.
>     The current highly-compressed JPEG files for the icons look
>     horrible!  Line-art perhaps suffers worst from JPEG compression
>     artifacts.  Could we not use either PNGs or (preferably) SVG?
> 
> - Add an option to allow switching to SVG in-line RRDTool graphs.
>     This should be pretty easy to add as a config option.  I think
>     support for SVG in current browsers is now "good enough".  A
>     half-way modern version of RRDTool can generate SVG versions of the
>     graphs, which should look much better.
> 
> - Have some standard way of describing custom graphs.
>     There currently isn't a standard way of producing custom graphs;
>     "custom" here means adding support for host-specific and
>     cluster-specific graphs and also some framework for describing
>     those custom graphs.  I have a solution that (at least) has merit
>     in both existing and working.  Perhaps it isn't ideal, but the
>     Ganglia web front-end should provide at least some standard hooks
>     if not an actual framework.
> 
> - Have the option to switch off displaying all the single-metric
>   graphs.
>     If you have ~300 metrics, the little graphs at the page bottom are
>     all but useless.  They slow down the loading of the page without
>     adding much insight.  (I have a simple patch that allows a user to
>     choose whether they want to see these graphs.)
> 
> - Fix the pie-chart-generating code.
>     The current pie-chart code is a bit ugly and can plot things
>     incorrectly under certain circumstances.  There must be some nicer
>     graph plotting packages out there...
> 

_______________________________________________
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general
