I've got 900 hosts across a dozen clusters, with gmetad's tmpfs RRD set
at about 350 MB. The one-minute CPU load average rarely goes above 1.
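(For anyone who hasn't set this up: the tmpfs workaround is just a
mount. A minimal /etc/fstab sketch -- the mount point and size here are
illustrative, so adjust them to your distribution's RRD path and your
working-set size:)

```
# Keep gmetad's RRDs in RAM; sync them back to disk periodically and
# at shutdown, or a reboot loses the history.
tmpfs  /var/lib/ganglia/rrds  tmpfs  size=512m,mode=0755  0 0
```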
It would be fine for gmetad to set up the tmpfs and manage dataset
backups for me.

Bernard Li wrote:
> I never found gmond to be resource-intensive. Sure, there have been
> memory leaks (that have since been fixed), but I doubt this is what
> Donald is referring to.
>
> Perhaps he is talking about gmetad. Most users know that if they are
> monitoring a lot of hosts (1000+), they need to do something about
> the RRDtool I/O -- the most common solution is to put the RRDs in
> tmpfs. We talked about making this more transparent to users in
> future releases. After such a workaround is implemented, gmetad does
> not use a lot of resources either.
>
> So I'd like to ask the Ganglia community -- do you find Ganglia to be
> a resource hog?
>
> Cheers,
>
> Bernard
>
> ---------- Forwarded message ----------
> From: Donald Becker <[EMAIL PROTECTED]>
> Date: Tue, Apr 8, 2008 at 2:32 PM
> Subject: Re: [Beowulf] Performance metrics & reporting
> To: Beowulf Mailing List <[EMAIL PROTECTED]>
>
> On Tue, 8 Apr 2008, Jesse Becker wrote:
> > Gerry Creager wrote:
> > > Yeah, we're using Ganglia. It's a good start, but not complete...
> >
> > The next version of Ganglia (3.1.x) is being written to be much
> > easier to customize, both on the backend metric collection (by
> > allowing custom modules for gmond) and on the frontend (with
> > changes to make custom reports easier to write). I've written a
> > small pair of routines to monitor SGE jobs, for example, and it
> > could easily be extended to watch multiple queues.
>
> It might be useful to consider what we did in the Scyld cluster
> system.
>
> We found that a significant number of customers (and potential
> customers) were using Ganglia, or were planning on using it. But
> those that were using it intensively complained about its resource
> usage. In some cases it was using 20% of CPU time.
>
> We have a design philosophy of running nothing on the compute nodes
> except for the application.
> A pure philosophy doesn't always fit with a working system, so from
> the beginning we built in a system called BeoStat (Beowulf State,
> Status and Statistics). To keep the "pure" appearance of our system
> we initially hid this in BeoBoot, so that it started immediately at
> boot time, underneath the rest of the system.
>
> How are these two related? To implement Ganglia we just chopped out
> the underlying layers (which spend a huge amount of time generating
> and then parsing XML), and generate the final XML directly from the
> BeoStat statistics already combined on the master.
>
> This gave us the best of both worlds: from BeoStat, no additional
> load on compute nodes, lower network load, much higher efficiency,
> and easy scalability to thousands of nodes; from Ganglia, the ability
> to log and summarize historical data, good-looking displays, and the
> ability to monitor multiple clusters.
>
> It might be useful to look at the design of BeoStat. It's
> superficially similar to other systems out there, but we made
> decisions that are much different from others -- ones that most
> consider wrong until they understand their value.
>
> Some of them are:
>  - It's not extensible.
>  - It reports values in a binary structure.
>  - It's UDP unicast to a single master machine.
>  - It has no liveness criteria.
>  - The receive side stores only current values.
>
> The first one is the most uncommon. BeoStat is not extensible. You
> can't add in your own stat entries. You can't have it report stats
> from 64 cores. It reports what it reports... that's it.
>
> Why is this important? We want to deploy cluster systems, not build a
> one-off cluster. We want the stats to be the same on every system we
> deploy. We want every tool that uses the stats to be able to know
> that they will be available. Once you allow and encourage a
> customizable system, every deployment will be different.
> Tools won't work out of the box, and there is a good chance that
> tools will require mutually incompatible extensions.
>
> Deploying a fixed-content stat system also enforces discipline. We
> carefully considered what we need to report, and how to report it. In
> contrast, look at Ganglia's stats. Why did they choose the set they
> did? Pretty clearly because the underlying kernel reported those
> values. What do they mean? The XML DTD doesn't tell you; you have to
> look at the source code. What do you use them for? They don't know --
> they'll figure it out later.
>
> People's next question is "but what if I have 8/16/64 cores? You only
> have 2 [[ now 4 ]] CPU stat slots." The answer is similar to the
> above -- what are you going to do with all of that data? The answer
> is to summarize it before using it; we just summarize it on the
> reporting side. We report that there are N CPUs, and the overall load
> average, and then summarize the CPU cores as groups (e.g. per
> socket). For network adapters we report e.g. eth0, eth1, eth2 and
> "all the rest added together".
>
> Once we chose a fixed set of stats, we had the ability to make it a
> fixed-size report. It could be reported as binary values, with any
> per-kernel-version variation handled on the sending side.
>
> Having a small, limited-size report meant that it fit in a single
> network packet. That makes the network load predictable and very
> scalable. It gave us the opportunity to use UDP effectively for
> reporting, without fragmenting into multiple frames. UDP means that
> we can switch to and from multicast without changes, even changing in
> real time.
>
> A fixed-size frame makes the receiving side simple as well. We just
> receive the incoming network frame into memory. No parsing, no
> translation, no interpretation. We actually do a tiny bit more, such
> as putting on a timestamp, but overall the receiving process does
> only trivial work.
> This is important when the receiver is the master, which could end up
> with the heaviest workload if the system isn't carefully designed.
> We've supported 1000+ machines for years, and are now designing
> around 10K nodes.
>
> We actually do a tiny bit more when storing a stat packet -- we add a
> timestamp. We can use this to figure out the time skew between the
> master and the compute node, verify the network reliability/load, and
> decide whether the node is live.
>
> This isn't the only liveness test. It's not even the primary liveness
> test. We document it as only a guideline: developers should use the
> underlying cluster management system to decide if a node has died.
> But if there hasn't been a recent report, a scheduler should avoid
> using the node. Classifying the world into Live and Dead is wrong.
> It's at least Live, Dead, and Schrödinger's Still-boxed Cat.
>
> Finally, this is a State, Status and Statistics system. It's a
> scoreboard, not a history book. We keep only two values, the last two
> received. That gives us the current info, and the ability to
> calculate a rate. If any subsystem needs older values (very few do),
> it can pick a logging, summarization and coalescing approach of its
> own.
>
> We made many other innovative architectural decisions when designing
> the system, such as publishing the stats as a read-only shared-memory
> version. But these are less interesting because no one disagrees with
> them ;-).
> --
> Donald Becker                [EMAIL PROTECTED]
> Penguin Computing / Scyld Software
> www.penguincomputing.com     www.scyld.com
> Annapolis MD and San Francisco CA
>
> _______________________________________________
> Beowulf mailing list, [EMAIL PROTECTED]
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general

