Hi Brad,

A few months ago I faced much the same problem. To work around it, I run my gmetad_node2 at version 3.0.1.
Here is the gmetad stack at failure time (3.0.4) in my environment:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 131081 (LWP 12739)]
*__GI___pthread_mutex_unlock (mutex=0x0) at mutex.c:178
178     mutex.c: No such file or directory.
        in mutex.c
(gdb) where
#0  *__GI___pthread_mutex_unlock (mutex=0x0) at mutex.c:178
#1  0x0804e1e0 in endElement_CLUSTER ()
#2  0x0804e2ee in end ()
#3  0x0805a26e in doContent ()
#4  0x08059319 in contentProcessor ()
#5  0x0805c6ba in doProlog ()
#6  0x0805c063 in prologProcessor ()
#7  0x0805bfe9 in prologInitProcessor ()
#8  0x08058d4d in XML_ParseBuffer ()
#9  0x08058cb5 in XML_Parse ()
#10 0x0804e3d0 in process_xml ()
#11 0x0804b341 in data_thread ()
#12 0x40085c80 in pthread_start_thread (arg=0x4121dbe0) at manager.c:301
#13 0x40085d82 in pthread_start_thread_event (arg=0x4121dbe0) at manager.c:324
#14 0x401b9f87 in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:100
(gdb)
(gdb) print *xmldata
$12 = {rval = 134700213, old = 2, sourcename = 0x8075c7f "", hostname = 0x0,
  ds = 0x8075cbd, grid_depth = 6, host_alive = 134700224, source = {id = 29,
  report_start = 0x8075cc4 <_IO_stdin_used+8768>, report_end = 0x4,
  authority = 0x8075cc9, authority_ptr = 20, metric_summary = 0x8075c7f,
  sum_finished = 0x0, ds = 0x8075c7f, hosts_up = 0, hosts_down = 134700236,
  localtime = 21, owner = 23679, latlong = 2055, url = 0, stringslen = 0,

source = &xmldata->source;
summary = xmldata->source.metric_summary;

/* Release the partial sum mutex */
pthread_mutex_unlock(source->sum_finished);
/*err_msg("%s releasing lock", xmldata->sourcename);*/

Best Regards.
Christian.

----- Original Message -----
From: "Brad Anderson" <[EMAIL PROTECTED]>
To: <ganglia-general@lists.sourceforge.net>
Sent: Thursday, March 06, 2008 8:03 PM
Subject: [Ganglia-general] dual gmetad setup

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> All,
>
> I am having issues getting a dual gmetad env up and running. Here
> is the problem.
> I have one gmetad node (gmetad_node1) checking a
> single cluster of 1 machine. This node works fine, rrds are being
> created and when I place a UI on top of it all is well. The trouble I
> am having is with my second gmetad node (gmetad_node2). I want this
> node to pull all its data from gmetad_node1 and store a copy of all
> rrds on its file system as well. I have turned off the "scalable"
> option in gmetad.conf, and it starts to collect the first round of
> data but dies shortly after writing rrds. I have included a log of
> gmetad_node2 start up with debug at 10.
>
> Any help on this issue would be appreciated.
>
> Regards,
> Brad Anderson
>
>
> gmetad_node1:
> - CentOS 4.4
> - ganglia-gmetad-3.0.6-1
> - ganglia-web-3.0.6-1
> - monitoring a single cluster of 1 machine
> - writes rrds locally to disk
>
>
> gmetad_node2:
> - CentOS 4.4
> - ganglia-gmetad-3.0.6-1
> - ganglia-web-3.0.6-1
> - scalable off
> - single data_source of gmetad_node1
>
>
> gmetad_node2 startup debug log:
> /etc/init.d/gmetad restart
> Shutting down GANGLIA gmetad: [FAILED]
> Starting GANGLIA gmetad: Going to run as user nobody
> Sources are ...
> Source: [grid1, step 30] has 1 sources
>         10.0.0.1
> xml listening on port 8651
> interactive xml listening on port 8652
> Data thread -1271247952 is monitoring [grid1] data source
>         10.0.0.1
> cleanup thread has been started
> [grid1] is a 2.5 or later data stream
> hash_create size = 1024
> hash->size is 1031
> hash_create size = 50
> hash->size is 53
> hash_create size = 50
> hash->size is 53
> Updating host host1.domain.com, metric disk_free
> Updating host host1.domain.com, metric bytes_out
> Updating host host1.domain.com, metric proc_total
> Updating host host1.domain.com, metric pkts_in
> Updating host host1.domain.com, metric cpu_nice
> Updating host host1.domain.com, metric cpu_speed
> Updating host host1.domain.com, metric boottime
> Updating host host1.domain.com, metric qmail_msgs_to_be_preprocessed
> Updating host host1.domain.com, metric cpu_wio
> Updating host host1.domain.com, metric qmail_msgs_in_queue
> Updating host host1.domain.com, metric load_one
> Updating host host1.domain.com, metric disk_total
> Updating host host1.domain.com, metric cpu_idle
> Updating host host1.domain.com, metric cpu_user
> Updating host host1.domain.com, metric swap_free
> Updating host host1.domain.com, metric mem_cached
> Updating host host1.domain.com, metric pkts_out
> Updating host host1.domain.com, metric load_five
> Updating host host1.domain.com, metric cpu_num
> Updating host host1.domain.com, metric load_fifteen
> Updating host host1.domain.com, metric mem_free
> Updating host host1.domain.com, metric cpu_system
> Updating host host1.domain.com, metric proc_run
> Updating host host1.domain.com, metric mem_total
> Updating host host1.domain.com, metric cpu_aidle
> Updating host host1.domain.com, metric bytes_in
> Updating host host1.domain.com, metric mem_buffers
> Updating host host1.domain.com, metric mem_shared
> Updating host host1.domain.com, metric swap_total
> Updating host host1.domain.com, metric part_max_used
> Writing Summary data for source Servers, metric disk_free
> Writing Summary data for source Servers, metric bytes_out
> Writing Summary data for source Servers, metric proc_total
> Writing Summary data for source Servers, metric cpu_nice
> Writing Summary data for source Servers, metric pkts_in
> Writing Summary data for source Servers, metric cpu_speed
> Writing Summary data for source Servers, metric boottime
> Writing Summary data for source Servers, metric qmail_msgs_to_be_preprocessed
> Writing Summary data for source Servers, metric cpu_wio
> Writing Summary data for source Servers, metric qmail_msgs_in_queue
> Writing Summary data for source Servers, metric load_one
> Writing Summary data for source Servers, metric disk_total
> Writing Summary data for source Servers, metric cpu_user
> Writing Summary data for source Servers, metric cpu_idle
> Writing Summary data for source Servers, metric swap_free
> Writing Summary data for source Servers, metric pkts_out
> Writing Summary data for source Servers, metric mem_cached
> Writing Summary data for source Servers, metric load_five
> Writing Summary data for source Servers, metric cpu_num
> Writing Summary data for source Servers, metric load_fifteen
> Writing Summary data for source Servers, metric mem_free
> Writing Summary data for source Servers, metric cpu_system
> Writing Summary data for source Servers, metric proc_run
> Writing Summary data for source Servers, metric mem_total
> Writing Summary data for source Servers, metric cpu_aidle
> Writing Summary data for source Servers, metric bytes_in
> Writing Summary data for source Servers, metric mem_buffers
> Writing Summary data for source Servers, metric mem_shared
> Writing Summary data for source Servers, metric swap_total
> Writing Summary data for source Servers, metric part_max_used
> [FAILED]
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFH0D/7qOVHpERMGj0RAgFdAJ9Opr4bGThQwqxza7EdUtmW0cShXgCbBDNS
> X9jO6tMkwKjcvnLlsNJy1J4=
> =ed0P
> -----END PGP SIGNATURE-----
>
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2008.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> _______________________________________________
> Ganglia-general mailing list
> Ganglia-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ganglia-general
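
For illustration only: in the backtrace at the top of this message, frame #0 is pthread_mutex_unlock called with mutex=0x0 from endElement_CLUSTER, and the printed struct shows sum_finished = 0x0, so the quoted pthread_mutex_unlock(source->sum_finished) call is being handed a NULL pointer. The sketch below shows what a NULL guard around such a call could look like. It is not gmetad code: demo_source_t, release_sum_lock and demo_self_check are hypothetical names made up for this example, and whether a guard like this is the right fix (rather than finding out why the mutex was never set up for this source) is an open question.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the "source" struct in the gdb dump.
   Only the one field involved in the crash is modelled. */
typedef struct {
    pthread_mutex_t *sum_finished;   /* was 0x0 in the dump */
} demo_source_t;

/* Release the partial-sum mutex only if it exists.
   Returns 0 when the mutex was unlocked, -1 when there was nothing
   to unlock or the unlock itself failed. */
static int release_sum_lock(demo_source_t *source)
{
    if (source == NULL || source->sum_finished == NULL)
        return -1;                   /* skip the call instead of passing NULL */
    return pthread_mutex_unlock(source->sum_finished) == 0 ? 0 : -1;
}

/* Small self-check exercising both paths. */
static void demo_self_check(void)
{
    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    demo_source_t with_lock = { &m };
    demo_source_t without_lock = { NULL };

    /* NULL mutex: reported as -1, no segmentation fault. */
    assert(release_sum_lock(&without_lock) == -1);

    /* Valid mutex: lock it first, then release it through the guard. */
    pthread_mutex_lock(&m);
    assert(release_sum_lock(&with_lock) == 0);
}
```

Calling demo_self_check() from a small test program (built with -pthread) runs both cases. A guard of this kind only hides the symptom; the caller still has to decide what a missing mutex means for the summary data.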