I've got 900 hosts across a dozen clusters, with gmetad's RRDs on a tmpfs
sized at 350 MB.  The 1-minute CPU load average is rarely above 1.

It would be fine for gmetad to set up the tmpfs and manage dataset backups for me.
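Until gmetad grows that feature, the tmpfs-plus-backup pattern is easy to script by hand.  A minimal sketch (the paths and directory layout are hypothetical, not anything gmetad ships with):

```python
import shutil
from pathlib import Path

def backup_rrds(tmpfs_dir, backup_dir):
    """Copy every .rrd file from the tmpfs tree to persistent storage,
    preserving the directory layout (cluster/host/metric.rrd)."""
    tmpfs_dir, backup_dir = Path(tmpfs_dir), Path(backup_dir)
    copied = 0
    for rrd in tmpfs_dir.rglob("*.rrd"):
        dest = backup_dir / rrd.relative_to(tmpfs_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(rrd, dest)   # copy2 keeps mtimes, handy for sanity checks
        copied += 1
    return copied
```

Run it from cron every few minutes; at boot, copy in the other direction before starting gmetad so the tmpfs starts pre-populated.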



Bernard Li wrote:
> I never found gmond to be resource intensive.  Sure, there have been
> memory leaks (which have since been fixed), but I doubt this is what
> Donald is referring to.
> 
> Perhaps he is talking about gmetad.  Most users know that if they have
> a lot of hosts (1000+) they are monitoring, they need to do something
> about the RRDTool I/O -- the most common solution is to put the rrds
> in tmpfs.  We talked about making this more transparent to users in
> future releases.  After such a workaround is implemented, gmetad does
> not use a lot of resources either.
> 
> So I'd like to ask the Ganglia community -- do you guys find Ganglia
> to be a resource hog?
> 
> Cheers,
> 
> Bernard
> 
> ---------- Forwarded message ----------
> From: Donald Becker <[EMAIL PROTECTED]>
> Date: Tue, Apr 8, 2008 at 2:32 PM
> Subject: Re: [Beowulf] Performance metrics & reporting
> To: Beowulf Mailing List <[EMAIL PROTECTED]>
> 
> 
> 
>  On Tue, 8 Apr 2008, Jesse Becker wrote:
>  > Gerry Creager wrote:
>  > > Yeah, we're using Ganglia.  It's a good start, but not complete...
>  >
>  > The next version of Ganglia (3.1.x) is being written to be much easier
>  > to customize, both on the backend metric collection, by allowing custom
>  > modules for gmond, and on the frontend, with some changes to make custom
>  > reports easier to write.  I've written a small pair of routines to
>  > monitor SGE jobs, for example, and it could easily be extended to watch
>  > multiple queues.
> 
>  It might be useful to consider what we did in the Scyld cluster system.
> 
>  We found that a significant number of customers (and potential customers)
>  were using Ganglia, or were planning on using it.  But those that were
>  intensively using it complained about its resource usage.  In some cases
>  it was using 20% of CPU time.
> 
>  We have a design philosophy of running nothing on the compute nodes except
>  for the application.  A pure philosophy doesn't always fit with a working
>  system, so from the beginning we built in a system called BeoStat (Beowulf
>  State, Status and Statistics).  To keep the "pure" appearance of our
>  system we initially hid this in BeoBoot, so that it started immediately at
>  boot time, underneath the rest of the system.
> 
>  How are these two related?  To implement Ganglia we just chopped out the
>  underlying layers (which spend a huge amount of time generating then
>  parsing XML), and generate the final XML directly from the BeoStat
>  statistics already combined on the master.
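As a rough illustration of that shortcut, XML in roughly gmond's shape can be emitted straight from an in-memory stat table, with no intermediate generate-then-parse step.  The element and attribute names below approximate Ganglia's output format rather than reproduce it exactly:

```python
from xml.sax.saxutils import quoteattr

def stats_to_xml(cluster, hosts):
    """Emit Ganglia-style XML directly from a dict of per-host metrics.
    hosts maps hostname -> {metric_name: value}."""
    out = ['<GANGLIA_XML VERSION="3.0">',
           '<CLUSTER NAME=%s>' % quoteattr(cluster)]
    for host, metrics in hosts.items():
        out.append('<HOST NAME=%s>' % quoteattr(host))
        for name, val in metrics.items():
            out.append('<METRIC NAME=%s VAL=%s/>'
                       % (quoteattr(name), quoteattr(str(val))))
        out.append('</HOST>')
    out.append('</CLUSTER>')
    out.append('</GANGLIA_XML>')
    return "\n".join(out)
```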
> 
>  This gave us the best of both worlds: from BeoStat, no additional load on
>  compute nodes, lower network load, much higher efficiency, and easy
>  scalability to thousands of nodes; from Ganglia, the ability to log and
>  summarize historical data, good-looking displays, and the ability to
>  monitor multiple clusters.
> 
> 
>  It might be useful to look at the design of Beostat.  It's superficially
>  similar to other systems out there, but we made decisions that are
>  very different from theirs -- ones that most people consider wrong until
>  they understand their value.
> 
>  Some of them are:
>   It's not extensible
>   It reports values in a binary structure
>   It's UDP unicast to a single master machine
>   It has no liveness criteria
>   The receive side stores only current values
> 
>  The first one is the most uncommon.  Beostat is not extensible.  You can't
>  add in your own stat entries.  You can't have it report stats from 64
>  cores.  It reports what it reports... that's it.
> 
>  Why is this important?  We want to deploy cluster systems.  Not build a
>  one-off cluster.  We want the stats to be the same on every system we
>  deploy.  We want every tool that uses the stats to be able to know that
>  they will be available.  Once you allow and encourage a customizable
>  system, every deployment will be different.  Tools won't work out of the
>  box, and there is a good chance that tools will require mutually
>  incompatible extensions.
> 
>  Deploying a fixed-content stat system also enforces discipline.  We
>  carefully considered what we need to report, and how to report it.  In
>  contrast look at Ganglia's stats.  Why did they choose the set they did?
>  Pretty clearly because the underlying kernel reported those values.  What
>  do they mean?  The XML DTD doesn't tell you.  You have to look at the
>  source code.  What do you use them for?  They don't know, they'll figure
>  it out later.
> 
>  The next question people ask is "but what if I have 8/16/64 cores?  You
>  only have 2 [[ now 4 ]] CPU stat slots."  The answer is similar to above -- what are
>  you going to do with all of that data?  The answer is summarize it before
>  using it.  We just summarize it on the reporting side.  We report that
>  there are N CPUs, the overall load average, and then summarize the CPU
>  cores as groups (e.g. per socket).  For network adapters we report e.g.
>  eth0, eth1, eth2 and "all the rest added together".
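That per-socket summarization might look like the following sketch (the grouping policy is an assumption for illustration, not Scyld's actual code):

```python
def summarize_cores(per_core_load, cores_per_socket):
    """Collapse per-core load figures into one average per socket, plus an
    overall average -- a fixed-size summary regardless of core count."""
    sockets = [per_core_load[i:i + cores_per_socket]
               for i in range(0, len(per_core_load), cores_per_socket)]
    per_socket = [sum(s) / len(s) for s in sockets]
    overall = sum(per_core_load) / len(per_core_load)
    return overall, per_socket
```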
> 
>  Once we chose a fixed set of stats, we had the ability to make it a fixed
>  size report.  It could be reported as binary values, with
>  any per-kernel-version variation done on the sending side.
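A fixed stat set packed as a fixed-layout binary record might be sketched like this; the field list is invented for illustration and is not the real BeoStat record:

```python
import struct

# Hypothetical fixed-layout stat report -- NOT the real BeoStat format.
# "<" = little-endian with no padding, so the wire size is identical
# regardless of the sender's architecture or kernel version.
STAT_FORMAT = "<I H H f f f Q Q"   # node_id, ncpus, nnets, load1/5/15, mem_free, net_bytes
STAT_SIZE = struct.calcsize(STAT_FORMAT)

def pack_report(node_id, ncpus, nnets, load1, load5, load15,
                mem_free, net_bytes):
    """Produce one fixed-size binary report, ready to send as-is."""
    return struct.pack(STAT_FORMAT, node_id, ncpus, nnets,
                       load1, load5, load15, mem_free, net_bytes)

def unpack_report(frame):
    """The receiver's inverse: one unpack, no parsing."""
    return struct.unpack(STAT_FORMAT, frame)
```

At 36 bytes, such a record fits in a single Ethernet frame with enormous room to spare.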
> 
>  Having a small, limited-size report meant that it fit in a single network
>  packet.  That makes the network load predictable and very scalable.
>  It gave us the opportunity to effectively use UDP to report, without
>  fragmenting into multiple frames.  UDP means that we can switch to and
>  from multicast without changes, even changing in real time.
> 
>  A fixed-size frame makes the receiving side simple as well.  We just
>  receive the incoming network frame into memory.  No parsing, no
>  translation, no interpretation.  We actually do a tiny bit more, such as
>  putting on a timestamp, but overall the receiving process does only
>  trivial work.  This is important when the receiver is the master, which
>  could end up with the heaviest workload if the system isn't carefully
>  designed.  We've supported 1000+ machines for years, and are now designing
>  around 10K nodes.
> 
>  We actually do a tiny bit more when storing a stat packet -- we add a
>  timestamp.  We can use this to figure out the time skew between the master
>  and computer node, verify the network reliability/load, and to decide if
>  the node is live.
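The receive path described above reduces to stamping the arrival time and storing the raw frame; a recency check then gives the "guideline" liveness test (the 20-second threshold is an invented example, not a documented default):

```python
import time

# scoreboard: node_id -> (receive_time, raw_frame)
scoreboard = {}

def store_frame(node_id, frame):
    """The receiver does only trivial work: stamp the arrival time and
    drop the raw frame into the table.  No parsing, no translation."""
    scoreboard[node_id] = (time.time(), frame)

def seems_live(node_id, max_age=20.0):
    """A guideline, not the primary liveness test: has the node reported
    recently enough for a scheduler to trust it?"""
    entry = scoreboard.get(node_id)
    return entry is not None and (time.time() - entry[0]) <= max_age
```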
> 
>  This isn't the only liveness test.  It's not even the primary liveness
>  test.  We document it as only a guideline.  Developers should use the
>  underlying cluster management system to decide if a node has died.  But if
>  there hasn't been a recent report, a scheduler should avoid using the node.
>  Classifying the world into Live and Dead is wrong.  It's at least Live,
>  Dead, and Schrödinger's Still-boxed Cat.
> 
>  Finally, this is a State, Status and Statistics system.  It's a
>  scoreboard, not a history book.  We keep only two values, the last two
>  received.  That gives us the current info, and the ability to calculate
>  rate.  If any subsystem needs older values (very few do) it can pick a
>  logging, summarization and coalescing approach of its own.
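Keeping only the last two reports per node is enough for both a current value and a rate, e.g.:

```python
class Scoreboard:
    """Keep only the last two reports per node: enough for the current
    value and a rate, nothing resembling a history book."""
    def __init__(self):
        self._slots = {}   # node_id -> [(t_prev, v_prev), (t_cur, v_cur)]

    def update(self, node_id, timestamp, value):
        prev = self._slots.get(node_id)
        latest = (timestamp, value)
        # On first sight, duplicate the sample so rate() is well-defined.
        self._slots[node_id] = [prev[1], latest] if prev else [latest, latest]

    def rate(self, node_id):
        """Change per second between the two retained samples."""
        (t0, v0), (t1, v1) = self._slots[node_id]
        return 0.0 if t1 == t0 else (v1 - v0) / (t1 - t0)
```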
> 
> 
>  We made many other innovative architectural decisions when designing the
>  system, such as publishing the stats as a read-only shared memory version.
>  But these are less interesting because no one disagrees with them ;-).
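A read-only shared-memory publication can be approximated with a file-backed mapping that consumers open read-only, so a misbehaving tool can read the table but never corrupt it (the fields and path are illustrative, not BeoStat's; ACCESS_READ is the portable Python spelling of a PROT_READ mapping):

```python
import mmap
import struct

FMT = "<Qd"                 # packet_count, last_load1 -- invented fields
SIZE = struct.calcsize(FMT)

def publish(path, packet_count, load1):
    """Writer side: rewrite the fixed-size stat table in place."""
    with open(path, "wb") as f:
        f.write(struct.pack(FMT, packet_count, load1))

def read_stats(path):
    """Consumer side: map the table read-only and unpack it."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), SIZE, access=mmap.ACCESS_READ) as m:
            return struct.unpack(FMT, m[:SIZE])
```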
> 
> 
>  --
>  Donald Becker                           [EMAIL PROTECTED]
>  Penguin Computing / Scyld Software
>  www.penguincomputing.com                www.scyld.com
>  Annapolis MD and San Francisco CA
> 
> 
> 
>  _______________________________________________
>  Beowulf mailing list, [EMAIL PROTECTED]
>  To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
> 
> -------------------------------------------------------------------------
> This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
> Don't miss this year's exciting event. There's still time to save $100. 
> Use priority code J8TL2D2. 
> http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
> _______________________________________________
> Ganglia-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/ganglia-general
> 
