On Wed, Sep 03, 2008 at 09:39:57AM +0100, [EMAIL PROTECTED] wrote:
>
> Here's a little something we discovered by accident: if you move some of
> the module definitions into files in /etc/ganglia/conf.d/something.conf,
> but forget to remove them from /etc/ganglia/gmond.conf, then Ganglia
> tries to initialise the same module twice.

A similar situation presumably arises when you compile your modules
statically (--enable-static-build) and by mistake use a configuration
that instructs gmond to load the module. In both cases we sadly don't do
much checking: we assume the configuration is OK and try to load the
code, generating the havoc you reported.

The solution to this bug will be (as Apache does) to check the symbol
table first for the object we are planning to insert and, if it is
already there, just print a warning and skip loading it.

> The result is that on the second initialisation, the module uses up all
> the machines memory and then the process crashes.

This might be a reflection of another bug, probably a memory leak of ours
that is just being amplified by the previous problem. I would expect the
memory allocated for the module to be returned when the module fails to
load because the dynamic linker finds a conflicting object and aborts.

> I observed this first with one of my own modules, and then reproduced it
> with the cpu module.

BTW, the problem is not in using different configuration files, but in
having two entries for the same module in the configuration; they could
just as well sit next to each other in the same file.

> On Solaris, the process dies - on Linux, the whole box has gone down.

Linux is misconfigured there, as no errant process should be able to take
a system down. Sadly, this is a common case of Linux misconfiguration
(Linux haters will say it is a design issue), where distributions try to
be conservative and don't adequately protect you from "fork bombs" or, in
this case, "fast memory leaks". The dreaded OOM killer could help here,
as could setting some sane limits (usually in /etc/security/limits.conf)
and VM tuning.

> Should it be the module developer who detects this condition, or the
> module loader code?

The module loader code, but for now it is the user's responsibility to
have a sane configuration.
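
Until the loader does that check, a configuration can be screened for
duplicate module entries before gmond is started. The following is only a
minimal sketch and not part of Ganglia: the function names are
hypothetical, and it assumes module entries look like
module { name = "cpu_module" ... } and that comments start with "#". It
does not follow include directives, so every file has to be passed in
explicitly.

```python
import re
from collections import defaultdict

# Assumed entry shape: module { name = "cpu_module" path = "modcpu.so" }
# "\bmodule\s*\{" does not match the enclosing "modules {" section,
# because the trailing "s" is neither whitespace nor "{".
MODULE_NAME = re.compile(r'\bmodule\s*\{[^{}]*?\bname\s*=\s*"([^"]+)"', re.S)


def find_duplicate_modules(configs):
    """Return {module_name: [file, ...]} for names defined more than once.

    `configs` maps a file name to the text of that configuration file.
    A module listed twice in the same file shows that file twice.
    """
    seen = defaultdict(list)
    for filename, text in configs.items():
        # Drop "#" comments so commented-out entries are not counted.
        text = re.sub(r"#.*", "", text)
        for name in MODULE_NAME.findall(text):
            seen[name].append(filename)
    return {name: files for name, files in seen.items() if len(files) > 1}


def load_configs(paths):
    """Read each path into the mapping expected by find_duplicate_modules."""
    configs = {}
    for path in paths:
        with open(path) as handle:
            configs[path] = handle.read()
    return configs
```

Calling find_duplicate_modules(load_configs([...])) with
/etc/ganglia/gmond.conf and the files under /etc/ganglia/conf.d/ would
list any module name that appears more than once, along with the files
that define it.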

If you meant that the module developer's code should check for that, I
think that, other than cleanly freeing all allocated memory at shutdown,
there is not much that can be done at that point.

Also, I am presuming you reproduced this problem with 3.1.0 as I did, or
was this a report of 3.1.1 testing going bad? In any case, and even if it
is not 3.1.1 specific, getting a fix for this sooner rather than later
might be a good idea. I will defer to Brad on that decision, though, as
code has yet to be produced to fix this, and 3.1.1 is an important
milestone, at least as a starting point for people to deploy 3.1 in
production with some confidence.

Carlo

_______________________________________________
Ganglia-developers mailing list
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
