Hi Stephen:

When replying, please hit "Reply-All" or otherwise include the ganglia-general mailing-list in the To: or Cc: field. This ensures that this discussion gets archived in hopes of helping future users with the same problem -- thanks!

See my responses inline:

On Mon, Aug 24, 2009 at 3:47 PM, Stephen Spencer<[email protected]> wrote:
> So.
>
> New week, fresh start.
>
> I downloaded the 3.1.1 src RPM and built it on Fedora 9, Fedora 11 and RHEL5
> (64-bit) architectures.

Actually, the latest release is 3.1.2; you should use that instead. You can easily build RPMs from the tarball:

rpmbuild -tb --target x86_64,noarch <tarball>

You need to build for both archs because ganglia-web is noarch whereas everything else is arch-dependent.

> Set up my compute-cluster nodes (the 'grAPHics' cluster) and
> <http://frame.cs.washington.edu/ganglia/> as its web frontend.
> It has 'fusion.cs.washington.edu' as a trusted_host.
>
> Set up a different cluster's nodes (the 'Renderfarm' cluster) and
> <http://production.cs.washington.edu/ganglia/> as its web frontend.
> It, too, has 'fusion.cs.washington.edu' as a trusted_host.
>
> So far, so good. (I had been working with 3.0.7 before and that bugged me,
> being so out-of-date. The move to 3.1.1 was a good one, I think.)
>
> All 'gmetad' conf files have a "gridname" defined.
>
> I then set up 'fusion.cs.washington.edu' with the 'gmetad' and web frontend.
>
> It has one data_source line at this time:
>
> data_source "grAPHICS" frame.cs.washington.edu:8651
>
> Looking at the 'fusion' web interface, the grid name is there, and the
> summary (load and memory) graphs are there, but shouldn't there be one
> source instead of "(0 sources)"?
>
> If I add a second data_source line:
>
> data_source "Renderfarm" production.cs.washington.edu:8651
>
> and restart the 'gmetad' process on 'fusion' I immediately start to see
> error messages about conflicting updates of the RRD files:
> Aug 24 14:57:13 fusion /usr/sbin/gmetad[12584]: RRD_update
> (/var/lib/ganglia/rrds/UW.CSE.GRAIL/__SummaryInfo__/multicpu_system0.rrd):
> illegal attempt to update using time 1251151033 when last update time is
> 1251151033 (minimum one second step)

This particular error message is usually caused by the computers' clocks being out of sync. Can you ensure that all the servers running gmond and gmetad have their time synced properly with a time server such as pool.ntp.org using ntpd?

> I can tell I'm *closer* to a working solution, but not there yet:
>
> - Is that not the way to add more than one cluster to the top-level-grid's
> configuration file?

It is.

> - Shouldn't I be able to select from the clusters, and 'drill down' to that
> cluster's web frontend?

Yes, see the following example:

http://monitor.millennium.berkeley.edu/?m=&r=hour&s=descending&hc=4

> How is that set up?

From what I've been able to see so far, you are doing it correctly, so something else is preventing this from working. I asked this before, but do you have SELinux enabled on your systems?

I think what you should try next is to run gmetad (on fusion) in debug mode (remember to stop the gmetad service before running it manually) and see if it can provide the reason why it is failing to contact frame.

You might also consider using a pre-packaged/stable version of RRDTool -- the one you are running looks like some beta version.

Cheers,

Bernard
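
For reference, the two-level layout discussed above could be sketched roughly as follows. This is only an illustration assembled from the data_source lines quoted in this thread: the file path and the gridname value are assumptions (placeholders), not taken from the actual systems, so adjust them to your install.

```
# gmetad.conf on fusion (top-level grid).
# Path assumed to be /etc/ganglia/gmetad.conf; it may differ on your install.

# Placeholder grid name (hypothetical); use the gridname you already defined.
gridname "ExampleGrid"

# One data_source line per child gmetad, polled on its XML port (8651).
data_source "grAPHICS" frame.cs.washington.edu:8651
data_source "Renderfarm" production.cs.washington.edu:8651

# gmetad.conf on frame and on production (child grids):
# allow the top-level gmetad to connect and pull their XML.
trusted_hosts fusion.cs.washington.edu
```

The trusted_hosts line belongs in the conf files on frame and production, not on fusion; it is shown in the same block only to keep the sketch in one place.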

