Hi Timothy:

On Wed, Apr 2, 2008 at 9:31 PM, Witham, Timothy D <[EMAIL PROTECTED]> wrote:
> Bernard said:
>
> > I am running gmetad r1199 on a server which has one data_source which
> > is a server running gmetad 3.0.7 via port 8651.
>
> > In the webfrontend, the summary "CPUs Total" shows nothing, however,
> > the "CPUs Total" for the data_source running gmetad 3.0.7 is correct.
>
> I actually had done the 3.0 backport and attached it to bug #76. Then I
> later saw this behavior as well and had to revert to my original patch.
> Basically, the /?filter=summary that the web frontend uses no longer
> works. I can't understand why.
>
> While looking at this, I think I see an area for efficiency improvement.
>
> Say I have a gmetad "child" that pulls from several large clusters. It
> must sum up all the hosts of all those clusters to generate its
> __SummaryInfo__, both for each cluster and for the "child" grid itself.
> Now I configure a gmetad "parent" which has "child" as one of its
> sources. It only needs the final sum which child has already
> calculated. However, it gets the full XML details of "child" and all
> its individual clusters and even hosts. It looks like it recalculates
> the sum that child has already calculated.
>
> Do I understand this right? For large clusters/grids, this can be a
> huge amount of XML that is getting passed, parsed and recalculated. I
> see that in 3.1 the raw XML gets even larger. What would be cool is if
> the gmetad could talk to the interactive port of the child gmetad and
> use the /?filter=summary. Could this work? I tried connecting a data
> source to the interactive port but I get:
>
> poll() timeout for [child] data source after 0 bytes read
>
> likely because it is not giving the /?filter=summary command.

Since you manage a large grid and seem to be doing similar things as I
am (e.g. having another server aggregate information from multiple
grids), I would like to get a better understanding of an issue I am
encountering.
I wrote this email a while back:

http://www.mail-archive.com/[EMAIL PROTECTED]/msg00036.html

My server that is monitoring multiple grids is in HQ, but the grids are
in geographically separate locations which may be behind a fat but long
pipe (i.e. high latency). We noticed that there was a lot of ambient
network traffic between the two sites even though we were not knowingly
doing any file transfers. We ultimately identified that it was gmetad
sending a lot of XML data to the aggregator server in HQ.

From my understanding, *all* METRICS data from all hosts was being sent
from the grids back to this aggregator box. I thought this was a waste
of bandwidth, since if I wanted to drill down into a grid, I would
simply redirect to the web frontend that is running on that grid. So
what I ended up doing was setting up another layer of gmetad between HQ
and the remote grid. That new layer only sends summary information back
to HQ, which reduced the amount of traffic by a lot.

Since then, I have learnt that gmetad can do filtering by summary, and
so I would assume reduced information is sent. But since that grid has
a lot of data_sources, perhaps even "summary information" is not
summarized enough? Perhaps we need a new filter "grid_summary" that
would summarize information for the entire grid and send it out via
XML?

Let me try to summarize...

Assume I have a grid (grid1) with 20 data_sources, and I have a box
called meta-grid on which I want to aggregate information from grid1.
In meta-grid's gmetad.conf, I have:

data_source "grid1" grid1:8651

Is *all* metric information for every host in all 20 data_sources of
grid1 being sent back to meta-grid? I would hope not, but then again it
talks to port 8651, which means all XML information should be sent. If
that's not the case, I would assume that only summary information is
sent.
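
One way to check this empirically would be to pull both dumps from grid1
and compare them. The following is only a minimal sketch in Python, not
anything gmetad ships: the helper names are made up for illustration, the
host name "grid1" is the one from the example above, and port 8652 is an
assumption (it is the usual default for gmetad's interactive_port, but a
given gmetad.conf may set something else). The /?filter=summary query is
the one the web frontend uses, as mentioned in the quoted text.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host, port, query=None, timeout=30.0):
    """Read one XML dump from a gmetad port until the peer closes the socket.

    If `query` is given (e.g. "/?filter=summary"), it is sent first,
    followed by a newline, which is what the interactive port waits for.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        if query is not None:
            sock.sendall(query.encode("ascii") + b"\n")
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def count_elements(xml_bytes, tags=("HOST", "METRIC", "METRICS")):
    """Count how many elements of each tag appear anywhere in the dump."""
    root = ET.fromstring(xml_bytes)
    return {tag: sum(1 for _ in root.iter(tag)) for tag in tags}


if __name__ == "__main__":
    # Assumed host and ports; adjust to match the real gmetad.conf.
    full = fetch_xml("grid1", 8651)
    summary = fetch_xml("grid1", 8652, query="/?filter=summary")
    for label, dump in (("full (8651)", full), ("summary (8652)", summary)):
        print(label, len(dump), "bytes", count_elements(dump))
```

If the full dump shows per-host HOST and METRIC elements while the summary
dump is much smaller and carries only aggregated METRICS elements, that
would confirm that a plain data_source on port 8651 ships everything.
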
But still, I have 20 data_sources in that grid (I could potentially have
more, which would balloon the size of the XML being sent). Is all of
that information necessary? Or perhaps we can summarize this further and
only send that summary back to meta-grid (as I suggested above).

Hope this makes sense to someone -- and sorry for hi-jacking the thread ;-)

Thanks!

Bernard

_______________________________________________
Ganglia-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
