No, they cannot talk to each other. They are hosts in different DMZs, so
I'd need to open the firewall between them for this UDP traffic.
Many thanks.
Regards. José Antonio.
richard grevis <[EMAIL PROTECTED]>
04/08/2007 05:50
To
Jose Antonio Jimenez Baena/Spain/[EMAIL PROTECTED]
cc
[EMAIL PROTECTED], [email protected]
Subject
Re: [Ganglia-general] zillions of loged ganglia messages.
-
Quoting Jose Antonio Jimenez Baena <[EMAIL PROTECTED]>:
> .... one question ... choosing option 2 (open firewall between all nodes
> and gmetad), I would do:
>
> 1. Choose a headnode among all nodes of the cluster.
> 2. Open firewall: all_nodes -----udp 8649-----> headnode (to send it
>    gmond traffic).
> 3. Open firewall: gmetad -----tcp 8649-----> headnode (to get data
>    from gmond on the headnode).
>
> Is that correct? Any other port?
Yes, that sounds about right - the important thing is the "initiation"
direction, which you have right. In fact, the way Ganglia works was quite
good for our DMZ monitoring, in that the UDP traffic stays in the DMZ
(DMZ -> DMZ), while the TCP connection is initiated by gmetad, presumably
in the green zone, so it's green -> DMZ.
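For reference, the two openings above might look something like this on an
iptables-based firewall (a sketch only - the addresses are placeholders, and
the real rules depend on your firewall product, chains and zones):

```shell
# 1. all_nodes -> headnode, UDP 8649 (gmond metric traffic)
iptables -A FORWARD -s <cluster_subnet> -d <headnode_ip> \
    -p udp --dport 8649 -j ACCEPT
# 2. gmetad -> headnode, TCP 8649 (gmetad polling the headnode's gmond)
iptables -A FORWARD -s <gmetad_ip> -d <headnode_ip> \
    -p tcp --dport 8649 -j ACCEPT
```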
Can't your DMZ hosts in the two LANs talk directly to each other?
BTW, as you can set the UDP and TCP ports however you like, you may
be able to avoid firewall changes by (choke, splutter) using port 80
or 443, or the NTP or DNS UDP ports, etc. It will confuse the staff,
though.
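If you did go down that road, the ports are just settings in the config
files. A sketch for Ganglia 3.x, with placeholder host names, assuming you
reuse 443 everywhere:

```
/* gmond.conf on every node: send and accept on 443 instead of 8649 */
udp_send_channel {
  host = headnode.example.com
  port = 443
}
udp_recv_channel {
  port = 443
}
tcp_accept_channel {
  port = 443
}
```

with the matching data_source line in gmetad.conf pointing at
headnode.example.com:443.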
regards,
richard
>
>
>
> Jose Antonio Jimenez Baena/Spain/IBM
> 02/08/2007 08:27
>
> To
> richard grevis <[EMAIL PROTECTED]>
> cc
> [email protected], [EMAIL PROTECTED]
> Subject
> Re: [Ganglia-general] zillions of loged ganglia messages.
>
>
>
>
>
> Richard, yes, you are right. I thought it was the right way, but
> obviously my design is wrong.
>
> Having groups of nodes in different LANs behind a firewall, all of which
> are part of the same cluster, I suppose you have two options:
>
> 1 - As you said, move to a grid level.
> 2 - open firewall between all nodes and gmetad.
>
> I think I'd choose the second one ... if the security team doesn't
> have any concerns ...
>
>
> Thanks everybody.
>
>
>
> Regards. José Antonio.
>
>
>
>
>
>
> richard grevis <[EMAIL PROTECTED]>
> 01/08/2007 17:39
>
> To
> [EMAIL PROTECTED]
> cc
> Jose Antonio Jimenez Baena/Spain/[EMAIL PROTECTED],
> [email protected]
> Subject
> Re: [Ganglia-general] zillions of loged ganglia messages.
>
>
>
>
>
>
> Richard,
>
> as per Jose's explanation and my earlier mail, I would bet money
> that it is 2 headnodes polled by gmetad and a shared cluster name
> but not the same hosts. Jose was trying to do this on purpose,
> but gmetad just doesn't behave like that.
>
> regards,
> richard
>
>
>
> Quoting Richard Mohr <[EMAIL PROTECTED]>:
>
> > On Wed, 2007-08-01 at 08:55 +0200, Jose Antonio Jimenez Baena wrote:
> >
> > > I continuously get the ganglia messages, but only for the
> > > '__SummaryInfo__' category:
> > >
> > > Aug 1 00:01:49 spdbinfpr1 user:info /opt/freeware/sbin/gmetad[745560]:
> > > RRD_update (/var/lib/ganglia/rrds/p570_spdbsctms1/__SummaryInfo__/
> > > cpu_system.rrd): illegal attempt to update using time 1185919308
> > > when last update time is 1185919308 (minimum one second step)
> > >
> > > Could it be because in fact I have several headnodes reporting info
> > > for the same level (__SummaryInfo__)? If this is the case, how
> > > could it be avoided?
> >
> > I took a look at the source code. All the writing to the rrd files is
> > done by the write_data_to_rrd() function. The first two arguments are
> > the "sourcename" (which I think is just the cluster name) and the
> > "hostname". It appears that function is called in three places when
> > gmetad processes the XML data:
> >
> > 1) startElement_METRIC() - When a <metric> element is seen,
> > write_data_to_rrd(xmldata->sourcename, xmldata->hostname, ...) is
> > called to write the metric info to the host-specific rrd file.
> >
> > 2) finish_processing_source() - I think this is called when the
> > </cluster> tag is seen. It invokes
> > write_data_to_rrd(xmldata->sourcename, NULL, ...). The NULL hostname
> > indicates that the metric rrd file in the cluster-specific
> > __SummaryInfo__ dir should be written to.
> >
> > 3) write_root_summary() - This invokes write_data_to_rrd(NULL,
> > NULL, ...). Using NULL for the sourcename and the hostname causes the
> > metric rrd file in the global __SummaryInfo__ dir to be updated.
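The NULL conventions in the three call sites above can be summarized in a
tiny sketch (hypothetical code, not the actual gmetad source -
rrd_target_dir() is an invented name, just to illustrate how the
(source, host) pair selects the target directory):

```c
#include <stdio.h>
#include <string.h>

/* Illustrative only: models how the first two arguments of gmetad's
 * write_data_to_rrd() select the target RRD directory.
 * host == NULL     -> cluster-level __SummaryInfo__ dir
 * source == NULL   -> global __SummaryInfo__ dir */
static const char *rrd_target_dir(const char *source, const char *host,
                                  char *buf, size_t len)
{
    if (source == NULL)
        snprintf(buf, len, "__SummaryInfo__");            /* call site 3 */
    else if (host == NULL)
        snprintf(buf, len, "%s/__SummaryInfo__", source); /* call site 2 */
    else
        snprintf(buf, len, "%s/%s", source, host);        /* call site 1 */
    return buf;
}
```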
> >
> > I had two thoughts as to what might cause your problem, but I wasn't
> > able to test them (so they might be long shots):
> >
> > 1) It sounds like you have two "sources" with the same cluster name.
> > Maybe gmetad calls finish_processing_source() when it sees the
> > </cluster> tag for the first source. It then updates the
> > {cluster}/__SummaryInfo__/ dir. When gmetad encounters the </cluster>
> > tag for the second source of the same name, it tries again to update
> > files in {cluster}/__SummaryInfo__/ using the same timestamp. This
> > could then cause the error.
> >
> > 2) The hostname for one of the nodes in the xml data is NULL. I'm not
> > sure how this could happen, but if it did, then when
> > startElement_METRIC() tries to update the host's metric info, it
> > actually calls write_data_to_rrd(xmldata->sourcename, NULL, ...).
> > This would update the __SummaryInfo__ dir instead. Later, when the
> > </cluster> tag is seen, gmetad calls finish_processing_source() to
> > update __SummaryInfo__ with the same timestamp. This might cause the
> > error.
> >
> > Like I said, these are shots in the dark. But if either sounds
> > plausible to you, it probably wouldn't be too hard for you to add a
> > couple of debug statements to the ganglia source to see if either of
> > them is the culprit.
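As a toy illustration of what such a debug statement would catch (this is
not gmetad code - rrd_update_allowed() and the static variable are invented
for this sketch), rrdtool's "minimum one second step" rule amounts to:

```c
#include <stdio.h>
#include <time.h>

/* Sketch of the rule behind the logged error: rrd_update() rejects a
 * sample whose timestamp is not strictly later than the previous one.
 * A real debug patch would log inside gmetad's write_data_to_rrd(). */
static time_t last_update = 0;

/* Returns 1 if the update is accepted; returns 0 (and logs) when a
 * second writer arrives within the same second, which is exactly the
 * "illegal attempt to update using time T" condition. */
int rrd_update_allowed(time_t now)
{
    if (now <= last_update) {
        fprintf(stderr, "DEBUG: duplicate summary update at %ld\n",
                (long)now);
        return 0;
    }
    last_update = now;
    return 1;
}
```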
> >
> > --
> > Rick Mohr
> > Systems Developer
> > Ohio Supercomputer Center
> >
> >
> >
>
> > -------------------------------------------------------------------------
> > This SF.net email is sponsored by: Splunk Inc.
> > Still grepping through log files to find problems? Stop.
> > Now Search log events and configuration files using AJAX and a
browser.
> > Download your FREE copy of Splunk now >> http://get.splunk.com/
> > _______________________________________________
> > Ganglia-general mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/ganglia-general
> >
>
>
>