Hi Timothy:

On Wed, Apr 2, 2008 at 9:31 PM, Witham, Timothy D
<[EMAIL PROTECTED]> wrote:

> Bernard said:
>
>  >I am running gmetad r1199 on a server which has one data_source which
>  is a server running gmetad 3.0.7 via port 8651.
>
>  >In the webfrontend, the summary "CPUs Total" shows nothing, however,
>  the "CPUs Total" for the data_source running gmetad 3.0.7 is correct.
>
>  I actually had done the 3.0 backport and attached it to bug #76.  Then I
>  later saw this behavior as well and had to revert to my original patch.
>  Basically, the /?filter=summary that the web frontend uses no longer
>  works.  I can't understand why.
>
>  While looking at this, I think I see an area for efficiency improvement.
>
>  Say I have a gmetad "child" that pulls from several large clusters.  It
>  must sum up all the hosts of all those clusters to generate its
>  __SummaryInfo__, both for each cluster and for the "child" grid itself.
>  Now I configure a gmetad "parent" which has "child" as one of its
>  sources.  It only needs the final sum which child has already
>  calculated.  However, it gets the full XML details of "child" and all
>  its individual clusters and even hosts.  It looks like it recalculates
>  the sum that child has already calculated.
>
>  Do I understand this right?  For large clusters/grids, this can be a
>  huge amount of XML that is getting passed, parsed and recalculated.  I
>  see that in 3.1 the raw XML gets even larger.  What would be cool is if
>  the gmetad could talk to the interactive port of the child gmetad and
>  use the /?filter=summary.  Could this work?  I tried connecting a data
>  source to the interactive port but I get:
>
>  poll() timeout for [child] data source after 0 bytes read
>
>  likely because it is not giving the /?filter=summary command.

Since you manage a large grid and seem to be doing similar things to me
(e.g. having another server aggregate information from multiple grids),
I would like to get a better understanding of an issue I am
encountering.

I wrote this email a while back:

http://www.mail-archive.com/[EMAIL PROTECTED]/msg00036.html

My server that is monitoring multiple grids is in HQ, but the grids
are in geographically separate locations which may be behind a fat but
long pipe (i.e. high latency).  We noticed that there was a lot of
ambient network traffic between the two sites even though we were not
knowingly doing any file transfers.  We ultimately identified that it
was gmetad sending a lot of XML data to the aggregator server in HQ.
From my understanding, *all* METRICS data from all hosts was being
sent from the grids back to this aggregator box.  I thought this was
kind of a waste of bandwidth, since if I wanted to drill down into the
grid, I would simply redirect to the web frontend that is running on
that grid.

So what I ended up doing was setting up another layer of gmetad
between HQ and the remote grid.  That new layer simply sends summary
information back to HQ, which reduced the amount of traffic by a lot.

Since then, I have learnt that gmetad can do filtering by summary and
so I would assume reduced information is sent.  But since that grid
has a lot of data_sources, perhaps even "summary information" is not
summarized enough?  Perhaps we need a new filter "grid_summary" that
would summarize information for the entire grid and send it out via
XML?
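
To make the summary-filter idea concrete, here is a minimal Python
sketch of a client asking gmetad for /?filter=summary instead of taking
the full dump.  It is a sketch under assumptions, not a description of
how gmetad polls its sources: the function names are hypothetical, port
8652 is assumed to be the interactive port (the default as far as I
know; only 8651 is mentioned in this thread), and the request is
assumed to be a single newline-terminated line after which the server
sends its XML and closes the connection.

```python
import socket


def read_reply(sock, query="/?filter=summary"):
    """Send one request line on a connected socket and read until EOF."""
    # Assumption: the interactive port expects a newline-terminated query
    # and closes the connection once the XML reply has been sent.
    sock.sendall(query.encode("ascii") + b"\n")
    chunks = []
    while True:
        data = sock.recv(65536)
        if not data:  # peer closed the connection: reply is complete
            break
        chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def fetch_summary(host, port=8652, query="/?filter=summary", timeout=10.0):
    """Connect to a (hypothetical) gmetad interactive port and return XML."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return read_reply(sock, query)
```

Something like len(fetch_summary("grid1")) compared against the size of
the plain dump on port 8651 would show how much the filter actually
saves for a grid with many data_sources.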

Let me try to summarize...

Assume I have a grid (grid1) with 20 data_sources, and I have a box
which I want to aggregate information from grid1 called meta-grid.  On
meta-grid's gmetad.conf, I have:

data_source "grid1" grid1:8651

Is *all* metric information for every host in all 20 data_sources of
grid1 being sent back to meta-grid?  I would hope not, but then again
it talks to port 8651, which means all XML information should be sent.

If that's not the case, I would assume that only summary information
is sent.  But I still have 20 data_sources in that grid (I could
potentially have more, which would balloon the size of the XML being
sent).  Is all that information necessary?  Or perhaps we can
summarize this further and only send that information back to
meta-grid (as I suggested above).
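
As an illustration of what a further "grid_summary" reduction might
compute, the following Python sketch collapses every host metric in an
XML dump into one (sum, count) pair per metric name for the whole grid.
It is only a sketch: summarize_grid is a hypothetical name, the sample
XML is made up and carries only the attributes the code reads, and this
is not how gmetad builds its own summary.

```python
import xml.etree.ElementTree as ET

# Metric TYPE values treated as numeric; string metrics are skipped.
NUMERIC_TYPES = {"int8", "uint8", "int16", "uint16",
                 "int32", "uint32", "float", "double"}

# Made-up sample: one grid, two clusters, three hosts.
SAMPLE = """<GANGLIA_XML VERSION="3.0.7" SOURCE="gmetad">
<GRID NAME="grid1">
<CLUSTER NAME="c1">
<HOST NAME="n1"><METRIC NAME="cpu_num" VAL="4" TYPE="uint16"/>
<METRIC NAME="os_name" VAL="Linux" TYPE="string"/></HOST>
<HOST NAME="n2"><METRIC NAME="cpu_num" VAL="8" TYPE="uint16"/></HOST>
</CLUSTER>
<CLUSTER NAME="c2">
<HOST NAME="n3"><METRIC NAME="cpu_num" VAL="2" TYPE="uint16"/></HOST>
</CLUSTER>
</GRID>
</GANGLIA_XML>"""


def summarize_grid(xml_text):
    """Return {metric name: (sum, number of hosts reporting)} for the grid."""
    totals = {}
    root = ET.fromstring(xml_text)
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            if metric.get("TYPE") not in NUMERIC_TYPES:
                continue
            try:
                value = float(metric.get("VAL"))
            except (TypeError, ValueError):
                continue  # ignore missing or malformed values
            total, count = totals.get(metric.get("NAME"), (0.0, 0))
            totals[metric.get("NAME")] = (total + value, count + 1)
    return totals


grid_totals = summarize_grid(SAMPLE)
```

The point of the sketch is that the result grows with the number of
distinct metrics, not with the number of hosts or data_sources, which
is the property I am after for the link back to meta-grid.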

Hope this makes sense to someone -- and sorry for hijacking the thread ;-)

Thanks!

Bernard

_______________________________________________
Ganglia-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
