Hi Brad,
A few months ago I ran into quite a similar problem.
To work around it, I run gmetad version 3.0.1 on gmetad_node2.
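
For reference, my gmetad_node2 pulls everything from gmetad_node1 with a gmetad.conf along these lines. This is just a sketch: the grid name, step, and IP are taken from your log, and 8651 is gmetad's xml port, so adjust them to your setup:

         # gmetad.conf on gmetad_node2 (sketch)
         data_source "grid1" 30 10.0.0.1:8651
         scalable off
         rrd_rootdir "/var/lib/ganglia/rrds"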

Below is the stack trace of gmetad (3.0.4) at failure time in my environment:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 131081 (LWP 12739)]
*__GI___pthread_mutex_unlock (mutex=0x0) at mutex.c:178
178     mutex.c: No such file or directory.
        in mutex.c
(gdb) where
#0  *__GI___pthread_mutex_unlock (mutex=0x0) at mutex.c:178
#1  0x0804e1e0 in endElement_CLUSTER ()
#2  0x0804e2ee in end ()
#3  0x0805a26e in doContent ()
#4  0x08059319 in contentProcessor ()
#5  0x0805c6ba in doProlog ()
#6  0x0805c063 in prologProcessor ()
#7  0x0805bfe9 in prologInitProcessor ()
#8  0x08058d4d in XML_ParseBuffer ()
#9  0x08058cb5 in XML_Parse ()
#10 0x0804e3d0 in process_xml ()
#11 0x0804b341 in data_thread ()
#12 0x40085c80 in pthread_start_thread (arg=0x4121dbe0) at manager.c:301
#13 0x40085d82 in pthread_start_thread_event (arg=0x4121dbe0) at manager.c:324
#14 0x401b9f87 in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:100
(gdb)
(gdb) print *xmldata
$12 = {rval = 134700213, old = 2, sourcename = 0x8075c7f "", hostname = 0x0,
  ds = 0x8075cbd, grid_depth = 6, host_alive = 134700224, source = {id = 29,
    report_start = 0x8075cc4 <_IO_stdin_used+8768>, report_end = 0x4,
    authority = 0x8075cc9, authority_ptr = 20, metric_summary = 0x8075c7f,
    sum_finished = 0x0, ds = 0x8075c7f, hosts_up = 0, hosts_down = 134700236,
    localtime = 21, owner = 23679, latlong = 2055, url = 0, stringslen = 0,


The corresponding code in endElement_CLUSTER() releases the per-source partial sum mutex. Note in the dump above that source->sum_finished is 0x0; that NULL pointer is exactly the mutex=0x0 handed to pthread_mutex_unlock() in frame #0:

         source = &xmldata->source;
         summary = xmldata->source.metric_summary;

         /* Release the partial sum mutex */
         pthread_mutex_unlock(source->sum_finished);
         /*err_msg("%s releasing lock", xmldata->sourcename);*/
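
If you can rebuild gmetad, a minimal defensive patch around that unlock could look like the sketch below. This assumes the surrounding 3.0.4 code matches the excerpt above; it is untested, and it only guards against the NULL pointer rather than fixing why sum_finished was never initialized for this source:

         source = &xmldata->source;
         summary = xmldata->source.metric_summary;

         /* Release the partial sum mutex only if it exists; calling
            pthread_mutex_unlock() on a NULL pointer is what produces
            the SIGSEGV in the trace above. */
         if (source->sum_finished != NULL)
            pthread_mutex_unlock(source->sum_finished);
         else
            err_msg("[%s] sum_finished is NULL, skipping unlock",
                  xmldata->sourcename);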

Best regards,
Christian.

----- Original Message ----- 
From: "Brad Anderson" <[EMAIL PROTECTED]>
To: <ganglia-general@lists.sourceforge.net>
Sent: Thursday, March 06, 2008 8:03 PM
Subject: [Ganglia-general] dual gmetad setup


> All,
>
>  I am having trouble getting a dual gmetad environment up and running.
> Here is the problem: I have one gmetad node (gmetad_node1) checking a
> single cluster of one machine. This node works fine; rrds are being
> created, and when I place a UI on top of it all is well. The trouble I
> am having is with my second gmetad node (gmetad_node2). I want this
> node to pull all of its data from gmetad_node1 and store a copy of all
> rrds on its own file system as well. I have turned off the "scalable"
> option in gmetad.conf, and it starts to collect the first round of
> data but dies shortly after writing rrds. I have included a startup
> log of gmetad_node2 with debug level 10.
>
> Any help on this issue would be appreciated.
>
> Regards,
> Brad Anderson
>
>
> gmetad_node1:
>  - CentOS 4.4
>  - ganglia-gmetad-3.0.6-1
>  - ganglia-web-3.0.6-1
>  - monitoring a single cluster of 1 machine
>  - writes rrds locally to disk
>
>
> gmetad_node2:
>  - CentOS 4.4
>  - ganglia-gmetad-3.0.6-1
>  - ganglia-web-3.0.6-1
>  - scalable off
>  - single data_source of gmetad_node1
>
>
> gmetad_node2 startup debug log:
> /etc/init.d/gmetad restart
> Shutting down GANGLIA gmetad:                              [FAILED]
> Starting GANGLIA gmetad: Going to run as user nobody
> Sources are ...
> Source: [grid1, step 30] has 1 sources
>        10.0.0.1
> xml listening on port 8651
> interactive xml listening on port 8652
> Data thread -1271247952 is monitoring [grid1] data source
>        10.0.0.1
> cleanup thread has been started
> [grid1] is a 2.5 or later data stream
> hash_create size = 1024
> hash->size is 1031
> hash_create size = 50
> hash->size is 53
> hash_create size = 50
> hash->size is 53
> Updating host host1.domain.com, metric disk_free
> Updating host host1.domain.com, metric bytes_out
> Updating host host1.domain.com, metric proc_total
> Updating host host1.domain.com, metric pkts_in
> Updating host host1.domain.com, metric cpu_nice
> Updating host host1.domain.com, metric cpu_speed
> Updating host host1.domain.com, metric boottime
> Updating host host1.domain.com, metric qmail_msgs_to_be_preprocessed
> Updating host host1.domain.com, metric cpu_wio
> Updating host host1.domain.com, metric qmail_msgs_in_queue
> Updating host host1.domain.com, metric load_one
> Updating host host1.domain.com, metric disk_total
> Updating host host1.domain.com, metric cpu_idle
> Updating host host1.domain.com, metric cpu_user
> Updating host host1.domain.com, metric swap_free
> Updating host host1.domain.com, metric mem_cached
> Updating host host1.domain.com, metric pkts_out
> Updating host host1.domain.com, metric load_five
> Updating host host1.domain.com, metric cpu_num
> Updating host host1.domain.com, metric load_fifteen
> Updating host host1.domain.com, metric mem_free
> Updating host host1.domain.com, metric cpu_system
> Updating host host1.domain.com, metric proc_run
> Updating host host1.domain.com, metric mem_total
> Updating host host1.domain.com, metric cpu_aidle
> Updating host host1.domain.com, metric bytes_in
> Updating host host1.domain.com, metric mem_buffers
> Updating host host1.domain.com, metric mem_shared
> Updating host host1.domain.com, metric swap_total
> Updating host host1.domain.com, metric part_max_used
> Writing Summary data for source Servers, metric disk_free
> Writing Summary data for source Servers, metric bytes_out
> Writing Summary data for source Servers, metric proc_total
> Writing Summary data for source Servers, metric cpu_nice
> Writing Summary data for source Servers, metric pkts_in
> Writing Summary data for source Servers, metric cpu_speed
> Writing Summary data for source Servers, metric boottime
> Writing Summary data for source Servers, metric qmail_msgs_to_be_preprocessed
> Writing Summary data for source Servers, metric cpu_wio
> Writing Summary data for source Servers, metric qmail_msgs_in_queue
> Writing Summary data for source Servers, metric load_one
> Writing Summary data for source Servers, metric disk_total
> Writing Summary data for source Servers, metric cpu_user
> Writing Summary data for source Servers, metric cpu_idle
> Writing Summary data for source Servers, metric swap_free
> Writing Summary data for source Servers, metric pkts_out
> Writing Summary data for source Servers, metric mem_cached
> Writing Summary data for source Servers, metric load_five
> Writing Summary data for source Servers, metric cpu_num
> Writing Summary data for source Servers, metric load_fifteen
> Writing Summary data for source Servers, metric mem_free
> Writing Summary data for source Servers, metric cpu_system
> Writing Summary data for source Servers, metric proc_run
> Writing Summary data for source Servers, metric mem_total
> Writing Summary data for source Servers, metric cpu_aidle
> Writing Summary data for source Servers, metric bytes_in
> Writing Summary data for source Servers, metric mem_buffers
> Writing Summary data for source Servers, metric mem_shared
> Writing Summary data for source Servers, metric swap_total
> Writing Summary data for source Servers, metric part_max_used
>                                                           [FAILED]


