Oh and Jesse, didn't you say Scyld is still including Ganglia 2.5.7 in
their distribution or something?  Maybe this is a good opportunity for
them to update to the latest and greatest :-)

Cheers,

Bernard

On Tue, Apr 8, 2008 at 4:37 PM, Bernard Li <[EMAIL PROTECTED]> wrote:
> I have never found gmond to be resource intensive.  Sure, there have
>  been memory leaks (which have since been fixed), but I doubt this is
>  what Donald is referring to.
>
>  Perhaps he is talking about gmetad.  Most users know that if they are
>  monitoring a lot of hosts (1000+), they need to do something about the
>  RRDTool I/O -- the most common solution is to put the rrds in tmpfs.
>  We have talked about making this more transparent to users in future
>  releases.  Once such a workaround is in place, gmetad does not use a
>  lot of resources either.
>
>  So I'd like to ask the Ganglia community -- do you guys find Ganglia
>  to be a resource hog?
>
>  Cheers,
>
>  Bernard
>
>
>
>  ---------- Forwarded message ----------
>  From: Donald Becker <[EMAIL PROTECTED]>
>  Date: Tue, Apr 8, 2008 at 2:32 PM
>  Subject: Re: [Beowulf] Performance metrics & reporting
>  To: Beowulf Mailing List <[EMAIL PROTECTED]>
>
>
>
>   On Tue, 8 Apr 2008, Jesse Becker wrote:
>   > Gerry Creager wrote:
>   > > Yeah, we're using Ganglia.  It's a good start, but not complete...
>   >
>   > The next version of Ganglia (3.1.x) is being written to be much easier
>   > to customize, both on the backend metric collection, by allowing custom
>   > modules for gmond, and on the frontend, with some changes to make
>   > custom reports easier to write.  I've written a small pair of routines
>   > to monitor SGE jobs, for example, and it could easily be extended to
>   > watch multiple queues.
>
>   It might be useful to consider what we did in the Scyld cluster system.
>
>   We found that a significant number of customers (and potential customers)
>   were using Ganglia, or were planning on using it.  But those that were
>   intensively using it complained about its resource usage.  In some cases
>   it was using 20% of CPU time.
>
>   We have a design philosophy of running nothing on the compute nodes except
>   for the application.  A pure philosophy doesn't always fit with a working
>   system, so from the beginning we built in a system called BeoStat (Beowulf
>   State, Status and Statistics).  To keep the "pure" appearance of our
>   system we initially hid this in BeoBoot, so that it started immediately at
>   boot time, underneath the rest of the system.
>
>   How are these two related?  To implement Ganglia we just chopped out the
>   underlying layers (which spend a huge amount of time generating then
>   parsing XML), and generate the final XML directly from the BeoStat
>   statistics already combined on the master.
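>
>   (As a rough illustration only: the idea is to walk the stats already
>   combined on the master and print gmond-style XML for each host directly.
>   The element and attribute details are trimmed here; a real implementation
>   would have to follow the gmond DTD exactly.)
>
>       /* Illustrative sketch, not Scyld's actual code. */
>       #include <stdio.h>
>
>       static void emit_host_xml(FILE *out, const char *hostname,
>                                 double load_one, unsigned long mem_free_kb)
>       {
>           fprintf(out, "<HOST NAME=\"%s\">\n", hostname);
>           fprintf(out, "  <METRIC NAME=\"load_one\" VAL=\"%.2f\""
>                        " TYPE=\"float\"/>\n", load_one);
>           fprintf(out, "  <METRIC NAME=\"mem_free\" VAL=\"%lu\""
>                        " TYPE=\"uint32\"/>\n", mem_free_kb);
>           fprintf(out, "</HOST>\n");
>       }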
>
>   This gave us the best of both worlds.  From BeoStat: no additional load
>   on compute nodes, lower network load, much higher efficiency, and easy
>   scalability to thousands of nodes.  From Ganglia: the ability to log and
>   summarize historical data, good-looking displays, and the ability to
>   monitor multiple clusters.
>
>
>   It might be useful to look at the design of BeoStat.  It's superficially
>   similar to other systems out there, but we made design decisions that
>   differ sharply from theirs -- decisions that most people consider wrong
>   until they understand their value.
>
>   Some of them are:
>   It's not extensible
>   It reports values in a binary structure
>   It's UDP unicast to a single master machine
>   It has no liveness criteria
>   The receive side stores only current values
>
>   The first one is the most uncommon.  Beostat is not extensible.  You can't
>   add in your own stat entries.  You can't have it report stats from 64
>   cores.  It reports what it reports... that's it.
>
>   Why is this important?  We want to deploy cluster systems.  Not build a
>   one-off cluster.  We want the stats to be the same on every system we
>   deploy.  We want every tool that uses the stats to be able to know that
>   they will be available.  Once you allow and encourage a customizable
>   system, every deployment will be different.  Tools won't work out of the
>   box, and there is a good chance that tools will require mutually
>   incompatible extensions.
>
>   Deploying a fixed-content stat system also enforces discipline.  We
>   carefully considered what we need to report, and how to report it.  In
>   contrast, look at Ganglia's stats.  Why did they choose the set they did?
>   Pretty clearly because the underlying kernel reported those values.  What
>   do they mean?  The XML DTD doesn't tell you.  You have to look at the
>   source code.  What do you use them for?  They don't know, they'll figure
>   it out later.
>
>   People's next question is "but what if I have 8/16/64 cores?  You only
>   have 2 [[ now 4 ]] CPU stat slots."  The answer is similar to the above --
>   what are you going to do with all of that data?  The answer is to
>   summarize it before using it.  We just summarize it on the reporting
>   side.  We report that there are N CPUs, the overall load average, and
>   then summarize the CPU cores as groups (e.g. per socket).  For network
>   adapters we report e.g. eth0, eth1, eth2 and "all the rest added
>   together".
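>
>   (To make this concrete, a fixed-content report along these lines might
>   look roughly like the following C struct.  The field names and slot
>   counts are made up for the example; they are not BeoStat's actual
>   layout.)
>
>       /* Illustrative only -- not the real BeoStat wire format. */
>       #include <stdint.h>
>
>       #define CPU_GROUPS 4      /* e.g. one summary slot per socket         */
>       #define NIC_SLOTS  4      /* eth0..eth2 plus "everything else" summed */
>
>       struct node_stats {
>           uint32_t format_version;            /* fixed, known-in-advance layout */
>           uint32_t node_id;
>           uint32_t ncpus;                     /* total cores present            */
>           uint32_t loadavg_x100;              /* load average * 100, no floats  */
>           uint32_t cpu_busy_pct[CPU_GROUPS];  /* per-group CPU summary          */
>           uint64_t mem_total_kb, mem_free_kb;
>           uint64_t net_rx_bytes[NIC_SLOTS];   /* last slot = all others added   */
>           uint64_t net_tx_bytes[NIC_SLOTS];
>       } __attribute__((packed));              /* well under one UDP datagram    */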
>
>   Once we chose a fixed set of stats, we had the ability to make it a
>   fixed-size report.  It could be reported as binary values, with any
>   per-kernel-version variation handled on the sending side.
>
>   Having a small, limited-size report meant that it fit in a single network
>   packet.  That makes the network load predictable and very scalable.
>   It gave us the opportunity to effectively use UDP to report, without
>   fragmenting into multiple frames.  UDP means that we can switch to and
>   from multicast without changes, even changing in real time.
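>
>   (A minimal sketch of what the sending side might look like, reusing the
>   illustrative struct node_stats above.  The destination can be a unicast
>   master address or a multicast group; the sending code is the same either
>   way.  Error handling and socket reuse are omitted.)
>
>       #include <arpa/inet.h>
>       #include <netinet/in.h>
>       #include <string.h>
>       #include <sys/socket.h>
>       #include <unistd.h>
>
>       static void send_report(const struct node_stats *st,
>                               const char *dest_ip, int port)
>       {
>           int s = socket(AF_INET, SOCK_DGRAM, 0);
>           struct sockaddr_in dst;
>
>           memset(&dst, 0, sizeof(dst));
>           dst.sin_family = AF_INET;
>           dst.sin_port   = htons(port);
>           inet_pton(AF_INET, dest_ip, &dst.sin_addr);
>
>           /* The whole report fits in one frame: no fragmentation, no
>            * stream state, and unicast vs. multicast is just the address. */
>           sendto(s, st, sizeof(*st), 0,
>                  (struct sockaddr *)&dst, sizeof(dst));
>           close(s);
>       }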
>
>   A fixed-size frame makes the receiving side simple as well.  We just
>   receive the incoming network frame into memory.  No parsing, no
>   translation, no interpretation.  We actually do a tiny bit more, such as
>   putting on a timestamp, but overall the receiving process does only
>   trivial work.  This is important when the receiver is the master, which
>   could end up with the heaviest workload if the system isn't carefully
>   designed.  We've supported 1000+ machines for years, and are now
>   designing around 10K nodes.
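>
>   (A minimal sketch of that receive path, again using the illustrative
>   struct node_stats from above: copy the fixed-size frame into the node's
>   slot and record the arrival time.  No parsing, no per-metric
>   interpretation.)
>
>       #include <sys/socket.h>
>       #include <time.h>
>
>       #define MAX_NODES 10000
>
>       struct stat_slot {
>           time_t            last_recv;   /* arrival time of latest report  */
>           struct node_stats current;     /* raw report, stored as received */
>       };
>
>       static struct stat_slot slots[MAX_NODES];
>
>       static void receive_one(int sock)
>       {
>           struct node_stats st;
>
>           if (recv(sock, &st, sizeof(st), 0) != (ssize_t)sizeof(st))
>               return;                    /* drop short or oversize frames  */
>           if (st.node_id >= MAX_NODES)
>               return;
>
>           slots[st.node_id].current   = st;          /* one trivial copy   */
>           slots[st.node_id].last_recv = time(NULL);  /* the timestamp      */
>       }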
>
>   We actually do a tiny bit more when storing a stat packet -- we add a
>   timestamp.  We can use this to figure out the time skew between the
>   master and compute node, to verify the network reliability/load, and to
>   decide if the node is live.
>
>   This isn't the only liveness test.  It's not even the primary liveness
>   test.  We document it as only a guideline.  Developers should use the
>   underlying cluster management system to decide if a node has died.  But if
>   there hasn't been a recent report, a scheduler should avoid using the node.
>   Classifying the world into Live and Dead is wrong.  It's at least Live,
>   Dead, and Schrödinger's Still-boxed Cat.
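>
>   (Sketched below as a guideline check on report age; the thresholds are
>   illustrative, and the real live/dead decision belongs to the cluster
>   management layer.)
>
>       #include <time.h>
>
>       enum node_state { NODE_LIVE, NODE_UNKNOWN, NODE_DEAD };
>
>       static enum node_state node_state_hint(time_t last_recv, time_t now)
>       {
>           double age = difftime(now, last_recv);
>
>           if (age < 10.0)
>               return NODE_LIVE;     /* recent report: OK to schedule on     */
>           if (age < 60.0)
>               return NODE_UNKNOWN;  /* stale: avoid it, don't declare death */
>           return NODE_DEAD;         /* defer to the management system       */
>       }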
>
>   Finally, this is a State, Status and Statistics system.  It's a
>   scoreboard, not a history book.  We keep only two values, the last two
>   received.  That gives us the current info, and the ability to calculate
>   rate.  If any subsystem needs older values (very few do) it can pick a
>   logging, summarization and coalescing approach of its own.
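>
>   (For example, a per-interface byte rate can be computed from just the
>   last two reports and their arrival timestamps -- the only history the
>   scoreboard keeps.  The struct here is illustrative.)
>
>       #include <stdint.h>
>       #include <time.h>
>
>       struct net_sample {
>           time_t   when;        /* arrival timestamp of the report */
>           uint64_t rx_bytes;    /* counter value as reported       */
>       };
>
>       /* sample[0] = most recent report, sample[1] = the one before it. */
>       static double rx_bytes_per_sec(const struct net_sample sample[2])
>       {
>           double dt = difftime(sample[0].when, sample[1].when);
>
>           if (dt <= 0.0)
>               return 0.0;       /* first report or clock step: no rate yet */
>           return (double)(sample[0].rx_bytes - sample[1].rx_bytes) / dt;
>       }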
>
>
>   We made many other innovative architectural decisions when designing the
>   system, such as publishing the stats as a read-only shared-memory
>   segment.  But those are less interesting, because no one disagrees with
>   them ;-).
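>
>   (For completeness, a minimal sketch of the read-only consumer side of
>   such a shared-memory scoreboard.  The object name "/beostat_example" and
>   the stat_slot array from the receive sketch above are made up for the
>   example.)
>
>       #include <fcntl.h>
>       #include <stddef.h>
>       #include <sys/mman.h>
>       #include <sys/stat.h>
>       #include <unistd.h>
>
>       static const struct stat_slot *map_scoreboard_readonly(size_t nbytes)
>       {
>           int fd = shm_open("/beostat_example", O_RDONLY, 0);
>           if (fd < 0)
>               return NULL;
>
>           /* PROT_READ only: consumers can never corrupt the collector's data. */
>           void *p = mmap(NULL, nbytes, PROT_READ, MAP_SHARED, fd, 0);
>           close(fd);
>           return (p == MAP_FAILED) ? NULL : (const struct stat_slot *)p;
>       }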
>
>
>   --
>   Donald Becker                           [EMAIL PROTECTED]
>   Penguin Computing / Scyld Software
>   www.penguincomputing.com                www.scyld.com
>   Annapolis MD and San Francisco CA
>
