One more question: if I choose option 2 (open the firewall between all nodes
and gmetad), I would do the following:
1. Choose a headnode among the nodes of the cluster.
2. Open the firewall: all_nodes -----udp 8649-----> headnode (so that the
nodes can send it their gmond traffic).
3. Open the firewall: gmetad -----tcp 8649-----> headnode (so that gmetad can
get the data from gmond on the headnode).
Is that correct? Is any other port needed?
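Once the rules are in place, I could check the TCP side from the gmetad host by connecting to the headnode and reading the XML dump that gmond serves there. A minimal sketch in Python, assuming gmond listens on tcp 8649 as in step 3; the function name `fetch_gmond_xml` is made up for illustration. It does not test the UDP rule in step 2, because UDP gives no connection feedback.

```python
import socket


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Connect to gmond's TCP channel and return the XML it sends.

    gmond writes its XML dump and then closes the connection,
    so we simply read until EOF.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # peer closed the connection: dump is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

If the call raises OSError (timeout or connection refused), the tcp rule is probably not open yet; if it returns XML with the expected HOST elements, gmetad should be able to poll the headnode too.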
Thanks. José Antonio.
Jose Antonio Jimenez Baena/Spain/IBM
02/08/2007 08:27
To
richard grevis <[EMAIL PROTECTED]>
cc
[email protected], [EMAIL PROTECTED]
Subject
Re: [Ganglia-general] zillions of loged ganglia messages.
Richard, yes, you are right. I thought it was the right way, but obviously
my design is wrong.
Having groups of nodes in different LANs behind a firewall, which are all
part of the same cluster, I suppose that you have 2 options:
1 - As you said, move to a grid level.
2 - open firewall between all nodes and gmetad.
I think I'd choose the second one, provided the security team has no
concerns.
Thanks everybody.
Regards. José Antonio.
richard grevis <[EMAIL PROTECTED]>
01/08/2007 17:39
To
[EMAIL PROTECTED]
cc
Jose Antonio Jimenez Baena/Spain/[EMAIL PROTECTED],
[email protected]
Subject
Re: [Ganglia-general] zillions of loged ganglia messages.
Richard,
as per Jose's explanation and my earlier mail, I would bet money
that it is 2 headnodes polled by gmetad and a shared cluster name
but not the same hosts. Jose was trying to do this on purpose,
but gmetad just doesn't behave like that.
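A toy model of what I think happens, in Python. It assumes only what the log shows, namely that rrd refuses an update whose timestamp is not later than the last one, plus Rick's reading that the summary rrd is selected by cluster name. `FakeRRD`, `write_summaries` and the source names are invented for the illustration; this is not gmetad or rrdtool code.

```python
class FakeRRD:
    """Stand-in for one rrd file: remembers only the last update."""

    def __init__(self):
        self.last_update = None
        self.value = None

    def update(self, timestamp, value):
        # Mirrors the "minimum one second step" rule from the log message.
        if self.last_update is not None and timestamp <= self.last_update:
            raise ValueError(
                f"illegal attempt to update using time {timestamp} "
                f"when last update time is {self.last_update}"
            )
        self.last_update = timestamp
        self.value = value


def write_summaries(sources, timestamp):
    """Write one cluster summary per polled source; return the errors.

    sources is a list of (source_name, cluster_name, cpu_system).
    Summaries are keyed by cluster name (an assumption), so two sources
    that report the same cluster name share a single rrd file.
    """
    rrds = {}
    errors = []
    for source, cluster, cpu_system in sources:
        rrd = rrds.setdefault(cluster, FakeRRD())
        try:
            rrd.update(timestamp, cpu_system)
        except ValueError as exc:
            errors.append((source, str(exc)))
    return errors
```

With two headnodes reporting the same cluster name in one polling pass, the second summary write hits the same file with the same timestamp and is rejected; with distinct cluster names nothing collides.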
regards,
richard
Quoting Richard Mohr <[EMAIL PROTECTED]>:
> On Wed, 2007-08-01 at 08:55 +0200, Jose Antonio Jimenez Baena wrote:
>
> > I continuously get the ganglia messages, but only for the
> > '__SummaryInfo__' category:
> >
> > Aug 1 00:01:49 spdbinfpr1 user:info /opt/freeware/sbin/gmetad
> > [745560]: RRD_update (/var/lib/ganglia/rrds/p570
> > _spdbsctms1/__SummaryInfo__/cpu_system.rrd): illegal attempt to update
> > using time 1185919308 when last update
> > time is 1185919308 (minimum one second step)
> >
> > Could it be because I in fact have several headnodes reporting info
> > for the same level ( __SummaryInfo__ )? If this is the case, how
> > could it be avoided?
>
> I took a look at the source code. All the writing to the rrd files is
> done by the write_data_to_rrd() function. The first two arguments are
> the "sourcename" (which I think is just the cluster name) and the
> "hostname". It appears that function is called in three places when
> gmetad processes the XML data:
>
> 1) startElement_METRIC() - When a <metric> element is seen,
> write_data_to_rrd(xmldata->sourcename, xmldata->hostname, ...) is called
> to write the metric info to the host-specific rrd file.
>
> 2) finish_processing_source() - I think this is called when the
> </cluster> tag is seen. It invokes
> write_data_to_rrd(xmldata->sourcename, NULL, ...). The NULL hostname
> indicates that the metric rrd file in the cluster-specific
> __SummaryInfo__ dir should be written to.
>
> 3) write_root_summary() - This invokes write_data_to_rrd(NULL,
> NULL, ...). Using NULL for the sourcename and the hostname causes the
> metric rrd file in the global __SummaryInfo__ dir to be updated.
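
To make the three cases concrete, here is a small Python model of how the (sourcename, hostname) pair would select the rrd file, as I read the description above. It is only a sketch: `rrd_path` and its arguments are my own names, not gmetad code, and None stands in for the C NULL.

```python
import posixpath

SUMMARY_DIR = "__SummaryInfo__"


def rrd_path(rootdir, sourcename, hostname, metric):
    """Model of the file choice described for write_data_to_rrd()."""
    parts = [rootdir]
    if sourcename is not None:
        parts.append(sourcename)
    # A missing hostname means "summary", at cluster or global level.
    parts.append(hostname if hostname is not None else SUMMARY_DIR)
    parts.append(metric + ".rrd")
    return posixpath.join(*parts)
```

In this model a metric that arrives with a cluster name but no hostname lands on exactly the same file as the cluster summary, so any two writes of that kind in the same second would collide.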
>
> I had two thoughts as to what might cause your problem, but I wasn't
> able to test them (so they might be long shots):
>
> 1) It sounds like you have two "sources" with the same cluster name.
> Maybe gmetad calls finish_processing_source() when it sees the
> </cluster> tag for the first source. It then updates the
> {cluster}/__SummaryInfo__/ dir. When gmetad encounters the </cluster>
> tag for the second source of the same name, it tries again to update
> files in {cluster}/__SummaryInfo__/ using the same timestamp. This
> could then cause the error.
>
> 2) The hostname for one of the nodes in the xml data is NULL. I'm not
> sure how this could happen, but if it did, then when
> startElement_METRIC() tries to update the host's metric info, it
> actually calls write_data_to_rrd(xmldata->sourcename, NULL, ...). This
> would update the __SummaryInfo__ dir instead. Later, when the
> </cluster> tag is seen, gmetad calls finish_processing_source() to
> update __SummaryInfo__ with the same timestamp. This might cause the
> error.
>
> Like I said, these are shots in the dark. But if either sounds
> plausible to you, it probably wouldn't be too hard for you to add a
> couple of debug statements to the ganglia source to see if either of
> them is the culprit.
>
> --
> Rick Mohr
> Systems Developer
> Ohio Supercomputer Center
>
>
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc.
> Still grepping through log files to find problems? Stop.
> Now Search log events and configuration files using AJAX and a browser.
> Download your FREE copy of Splunk now >> http://get.splunk.com/
> _______________________________________________
> Ganglia-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/ganglia-general
>