Re: [Ganglia-general] zillions of loged ganglia messages.

Jose Antonio Jimenez Baena Thu, 02 Aug 2007 00:38:31 -0700

One question: if I choose option 2 (open the firewall between all nodes
and gmetad), I would do the following:


1. Choose a headnode from among the nodes of the cluster.
2. Open firewall:   all_nodes -----udp 8649-----> headnode   (to send it
   gmond traffic).
3. Open firewall:   gmetad    -----tcp 8649-----> headnode   (to get data
   from gmond on the headnode).

Is that correct? Are any other ports needed?
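
The plan above can be written down as a list of flows to hand to the firewall team. What follows is only a minimal sketch in Python: the host names are hypothetical, port 8649 is the one mentioned above, and it assumes gmond on every node sends unicast UDP to the headnode. It does not settle whether any other port is needed.

```python
def required_flows(nodes, headnode, gmetad_host, port=8649):
    """Return (source, destination, protocol, port) tuples for the plan above.

    nodes, headnode and gmetad_host are hypothetical host names.
    """
    # Step 2: every node except the headnode sends gmond traffic over UDP.
    flows = [(node, headnode, "udp", port) for node in nodes if node != headnode]
    # Step 3: gmetad polls the headnode's gmond over TCP.
    flows.append((gmetad_host, headnode, "tcp", port))
    return flows


if __name__ == "__main__":
    for src, dst, proto, port in required_flows(
        ["node1", "node2", "head1"], "head1", "gmetad1"
    ):
        print(f"{src} -> {dst} {proto}/{port}")
```

Note that every flow ends at the headnode, so only the headnode has to be reachable across the firewall.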

Thanks. José Antonio.







Jose Antonio Jimenez Baena/Spain/IBM
02/08/2007 08:27

To
richard grevis <[EMAIL PROTECTED]>
cc
[email protected], [EMAIL PROTECTED]
Subject
Re: [Ganglia-general] zillions of loged ganglia messages.





Richard, yes, you are right. I thought it was the right way, but obviously
my design is wrong.

With groups of nodes in different LANs behind a firewall, all of them
part of the same cluster, I suppose there are 2 options:

1 - As you said, move to a grid level.
2 - Open the firewall between all nodes and gmetad.

I think I'd choose the second one, if the security team doesn't have
any concerns.


Thanks everybody.



Regards. José Antonio.






richard grevis <[EMAIL PROTECTED]> 
01/08/2007 17:39

To
[EMAIL PROTECTED]
cc
Jose Antonio Jimenez Baena/Spain/[EMAIL PROTECTED], 
[email protected]
Subject
Re: [Ganglia-general] zillions of loged ganglia messages.






Richard,

As per Jose's explanation and my earlier mail, I would bet money
that the cause is 2 headnodes polled by gmetad that share a cluster
name but do not report the same hosts. Jose was trying to do this on
purpose, but gmetad just doesn't behave like that.

regards,
richard



Quoting Richard Mohr <[EMAIL PROTECTED]>:

> On Wed, 2007-08-01 at 08:55 +0200, Jose Antonio Jimenez Baena wrote:
> 
> > I continuously get the ganglia messages, but only for the '__SummaryInfo__'
> > category: 
> > 
> > Aug  1 00:01:49 spdbinfpr1 user:info /opt/freeware/sbin/gmetad
> > [745560]: RRD_update (/var/lib/ganglia/rrds/p570 
> > _spdbsctms1/__SummaryInfo__/cpu_system.rrd): illegal attempt to update
> > using time 1185919308 when last update 
> > time is 1185919308 (minimum one second step) 
> > 
> > Could it be because I in fact have several headnodes reporting info
> > for the same level (__SummaryInfo__)? If this is the case, how
> > could it be avoided?
> 
> I took a look at the source code.  All the writing to the rrd files is
> done by the write_data_to_rrd() function.  The first two arguments are
> the "sourcename" (which I think is just the cluster name) and the
> "hostname".  It appears that function is called in three places when
> gmetad processes the XML data:
> 
> 1) startElement_METRIC() - When a <metric> element is seen,
> write_data_to_rrd(xmldata->sourcename, xmldata->hostname, ...) is called
> to write the metric info to the host-specific rrd file.
> 
> 2) finish_processing_source() - I think this is called when the
> </cluster> tag is seen.  It invokes
> write_data_to_rrd(xmldata->sourcename, NULL, ...).  The NULL hostname
> indicates that the metric rrd file in the cluster-specific
> __SummaryInfo__ dir should be written to.
> 
> 3) write_root_summary() - This invokes write_data_to_rrd(NULL,
> NULL, ...).  Using NULL for the sourcename and the hostname causes the
> metric rrd file in the global __SummaryInfo__ dir to be updated.
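
The three call sites described above suggest a simple mapping from (sourcename, hostname) to an rrd file. Here is a rough Python sketch of that mapping, following the description only. It is not gmetad's actual C code, and rrd_path() is a hypothetical helper. The root directory comes from the log line quoted earlier; the cluster and host names are made up.

```python
import posixpath

SUMMARY_DIR = "__SummaryInfo__"


def rrd_path(rootdir, sourcename, hostname, metric):
    """Pick the rrd file implied by the (sourcename, hostname) pair."""
    filename = metric + ".rrd"
    if sourcename is None:
        # Case 3: no source and no host, so the global summary dir.
        return posixpath.join(rootdir, SUMMARY_DIR, filename)
    if hostname is None:
        # Case 2: source but no host, so the cluster-specific summary dir.
        return posixpath.join(rootdir, sourcename, SUMMARY_DIR, filename)
    # Case 1: source and host, so the host-specific file.
    return posixpath.join(rootdir, sourcename, hostname, filename)


if __name__ == "__main__":
    root = "/var/lib/ganglia/rrds"
    print(rrd_path(root, "mycluster", "host1", "cpu_system"))
    print(rrd_path(root, "mycluster", None, "cpu_system"))
    print(rrd_path(root, None, None, "cpu_system"))
```

Under this reading, any two calls that pass the same sourcename with a NULL hostname end up on the same file.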
> 
> I had two thoughts as to what might cause your problem, but I wasn't
> able to test them (so they might be long shots):
> 
> 1) It sounds like you have two "sources" with the same cluster name.
> Maybe gmetad calls finish_processing_source() when it sees the </cluster>
> tag for the first source.  It then updates the {cluster}/__SummaryInfo__/
> dir.  When gmetad encounters the </cluster> tag for the second source of
> the same name, it tries again to update files in
> {cluster}/__SummaryInfo__/ using the same timestamp.  This could then
> cause the error.
> 
> 2) The hostname for one of the nodes in the xml data is NULL.  I'm not
> sure how this could happen, but if it did, then when
> startElement_METRIC() tries to update the host's metric info, it
> actually calls write_data_to_rrd(xmldata->sourcename, NULL, ...).  This
> would update the __SummaryInfo__ dir instead.  Later, when the
> </cluster> tag is seen, gmetad calls finish_processing_source() to
> update __SummaryInfo__ with the same timestamp.  This might cause the
> error.
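
Possibility 1 above can be illustrated with a toy model in Python. This is only a sketch under stated assumptions: ToyRRD and process_sources() are hypothetical names, not part of ganglia or rrdtool, and the only rule modelled is the one in the log message, namely that an update is rejected unless its timestamp is later than the last update. The timestamp is the one from the log; the cluster names are made up.

```python
class ToyRRD:
    """Stand-in for one summary rrd file; remembers only the last update."""

    def __init__(self):
        self.last_update = None
        self.value = None

    def update(self, timestamp, value):
        # Reject any update that is not strictly later than the previous one.
        if self.last_update is not None and timestamp <= self.last_update:
            raise ValueError(
                f"illegal attempt to update using time {timestamp} "
                f"when last update time is {self.last_update} "
                "(minimum one second step)"
            )
        self.last_update = timestamp
        self.value = value


def process_sources(sources, now):
    """Write one summary value per source, keyed by cluster name only."""
    summaries = {}
    errors = []
    for cluster_name, value in sources:
        rrd = summaries.setdefault(cluster_name, ToyRRD())
        try:
            rrd.update(now, value)
        except ValueError as exc:
            errors.append((cluster_name, str(exc)))
    return errors


if __name__ == "__main__":
    # Two sources that share one cluster name, processed in the same second.
    for name, message in process_sources(
        [("mycluster", 1.0), ("mycluster", 2.0)], 1185919308
    ):
        print(name, message)
```

In this model the second source with the same cluster name always fails, while sources with distinct names do not collide.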
> 
> Like I said, these are shots in the dark.  But if either sounds
> plausible to you, it probably wouldn't be too hard for you to add a
> couple of debug statements to the ganglia source to see if either of
> them is the culprit.
> 
> -- 
> Rick Mohr
> Systems Developer
> Ohio Supercomputer Center
> 
> 
> 
-------------------------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc.
> Still grepping through log files to find problems?  Stop.
> Now Search log events and configuration files using AJAX and a browser.
> Download your FREE copy of Splunk now >>  http://get.splunk.com/
> _______________________________________________
> Ganglia-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/ganglia-general
>

