Not sure if this has anything to do with anything, but I added my
single-node Alpha cluster as a data source and that's currently my only
data source that has no gaps in data collection. My 2.5.0 fileserver
cluster has several of the nodes defined as collection targets, whereas the
2.4.1 cluster has a single "silent partner" node that trusts only the
gmetad box.
The graphs, in decreasing order of stability, are: Alpha, 2.5.0
(multi-source), and 2.4.1 (single-source).
Since nobody has complained of this sort of thing on Linux, I can only
assume it doesn't happen with gmetad on Linux. Funky Solaris socket
library strikes again? That's the only thing I can think of, because I
know it isn't a network issue. The largest "payload" comes from the 2.4.1
machine, which transmits its 500k XML feed in under a second. So it's
not Matt's network latency threshold. Perhaps the "nap" code is pushing it
over the top in this case?
Has anyone tried this on a 200+ node setup on Linux and experienced the
same "gappy" behavior?
I came back from lunch to find that gmetad had segfaulted, but hadn't
dumped core (while parsing a metric, it expected one key and got zero).
I'll see if it does it again (all sources appeared to be up...).
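To make the failure mode concrete: this is a sketch (in Python, not gmetad's C parser) of the kind of guard I suspect is missing, assuming the feed's usual METRIC elements with NAME/VAL attributes. A METRIC carrying zero keys where one is expected should be skipped, not dereferenced:

```python
import xml.etree.ElementTree as ET

def parse_metrics(xml_text):
    """Pull (name, value) pairs out of a Ganglia XML feed,
    skipping malformed METRIC elements instead of crashing on them."""
    metrics = []
    for elem in ET.fromstring(xml_text).iter("METRIC"):
        name = elem.get("NAME")
        val = elem.get("VAL")
        if name is None or val is None:
            continue  # zero keys where one was expected: skip, don't crash
        metrics.append((name, val))
    return metrics
```

In the C parser the equivalent would be a NULL check on the attribute lookup before using it; that's speculation on my part until I can get a core file.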
I also want to see if 2.4.1 will crash again. That, more than anything
else, makes me think there's something rotten in the state of the Solaris
socket lib...
[another rambling, conclusionless email...]