matt

thanks for the directions. the one additional thing I needed to do was
set "no_setuid on", since the ganglia uid couldn't dump core.
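
for anyone hitting the same thing, here is a rough sketch of a pre-flight
check for the two conditions above. check_core_setup is a made-up helper
name, and the config line format ("no_setuid on" on its own line in
/etc/gmond.conf) is an assumption based on the setting as I quoted it, so
adjust it to whatever your gmond.conf actually uses.

```shell
# Sketch only: report whether this shell can write core files and whether
# the given gmond config has no_setuid switched on.
# check_core_setup is a hypothetical helper, not part of ganglia.
check_core_setup() {
    conf=${1:-/etc/gmond.conf}   # config path; default is an assumption

    # "ulimit -c" prints the current core file size limit for this shell
    limit=$(ulimit -c)
    if [ "$limit" = "0" ]; then
        echo "core limit is 0; try: ulimit -c unlimited"
    else
        echo "core limit is $limit"
    fi

    # Assumed line format: "no_setuid on" (lines starting with # are ignored)
    if grep -Eq '^[[:space:]]*no_setuid[[:space:]]+on' "$conf" 2>/dev/null; then
        echo "no_setuid is on"
    else
        echo "no_setuid is not on; gmond will switch to its setuid user"
    fi
}
```

run it as "check_core_setup /etc/gmond.conf" in the same shell you will
start gmond from, since the core limit is per shell.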

i'll post the backtrace to the developers list. thanks for the help. no
hassle, i'm interested in what's at the bottom of this.
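
in case it helps someone else, a minimal sketch of grabbing the backtrace
into a file instead of cutting and pasting from an interactive session.
save_backtrace and the default output name backtrace.txt are my own
inventions; the gdb options used (--batch, -x, --core) are standard ones,
but check them against your gdb version.

```shell
# Sketch only: run "bt" against a core file in batch mode and save the output.
# save_backtrace is a hypothetical helper, not part of ganglia or gdb.
save_backtrace() {
    binary=$1                   # program that crashed, e.g. ./gmond
    core=$2                     # core file, e.g. ./core.12345
    out=${3:-backtrace.txt}     # where to write the backtrace (assumed name)

    if [ ! -r "$core" ]; then
        echo "no readable core file: $core" >&2
        return 1
    fi

    # gdb reads its commands from a file when given -x
    cmds=$(mktemp) || return 1
    printf 'bt\nquit\n' > "$cmds"

    # --batch makes gdb exit once the command file has been processed
    gdb --batch -x "$cmds" --core="$core" "$binary" > "$out" 2>&1
    rc=$?

    rm -f "$cmds"
    return $rc
}
```

usage would be something like "save_backtrace ./gmond ./core.12345", then
attach or paste backtrace.txt into the mail to the list.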

russell



On Tue, 8 Jul 2003 at 17:43, matt massie wrote:

> russell-
>
> another trick for debugging the problem is to examine the core dump.  when
> you run...
>
> % ulimit -a
> core file size        (blocks, -c) unlimited
> data seg size         (kbytes, -d) unlimited
> file size             (blocks, -f) unlimited
> max locked memory     (kbytes, -l) unlimited
> max memory size       (kbytes, -m) unlimited
> open files                    (-n) 1024
> pipe size          (512 bytes, -p) 8
> stack size            (kbytes, -s) 8192
> cpu time             (seconds, -t) unlimited
> max user processes            (-u) 2047
> virtual memory        (kbytes, -v) unlimited
>
>
> you'll get your current user resource limitations.  if you have the
> permissions (e.g. you are root or your admin lets users change their
> limits), you can change the core file size to unlimited.
>
> % ulimit -c unlimited
>
> this will allow programs to dump a core file when they segfault.  that
> core file is very helpful.  you can use a debugger (gdb) to find exactly
> what the program was doing when it crashed.
>
> for example, say you run the following...
>
> % ./gmond
> ...
> <SEGFAULT...CRASH...BOOM...BANG>
>
> % ls core*
> core.12345
>
> % gdb --core=./core.12345 ./gmond
> GNU gdb Red Hat Linux (5.2.1-4)
> Copyright 2002 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you
> are welcome to change it and/or distribute copies of it under certain 
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "i386-redhat-linux".
> (gdb)
>
> you are now inside the debugger... and have a (gdb) prompt.
>
> (gdb) bt <enter>
>
> will show a backtrace of exactly what the program was doing when it
> croaked.
>
> (gdb) quit <enter>
>
> will exit the debugger.
>
> if you cut and paste a backtrace to the developers list, we will certainly
> know what is going on.
>
> sorry for the hassle.  i'm sure we'll get this problem fixed.
>
> -matt
>
>
>
> Today, Russell Nordquist wrote forth saying...
>
> > From: Russell Nordquist <[EMAIL PROTECTED]>
> > To: steven wagner <[EMAIL PROTECTED]>
> > Cc: ganglia-general@lists.sourceforge.net
> > Date: Tue, 08 Jul 2003 17:21:08 -0500 (CDT)
> > Subject: Re: [Ganglia-general] gmond dying
> >
> > On Tue, 8 Jul 2003 at 14:50, steven wagner wrote:
> >
> > > I have no specific solutions for you but here are some potentially
> > > helpful tidbits which may permit you to shoot your own trouble:
> > >
> > > Does the monitoring core die right away?
> > > Does it dump core?
> > > Does it die when you run it in debug mode?
> > > Does debug mode tell you anything more about the error?
> > > Do other versions of the monitoring core exhibit this behavior?
> >
> > I turned the debugging up and:
> >
> > host:~# gmond
> > /etc/gmond.conf configuration
> > name is Octopod
> > owner is unspecified
> > latlong is unspecified
> > Cluster URL is unspecified
> > Host location is (x,y,z): unspecified
> > mcast_channel is 239.2.11.71
> > mcast_port is 8649
> > mcast_if is eth1
> > mcast_ttl is 1
> > mcast_threads is 2
> > xml_port is 8649
> > xml_threads is 2
> > trusted hosts are: 128.135.28.150
> >
> > num_nodes is 4
> > num_custom_metrics is 16
> > mute is 0
> > deaf is 0
> > debug_level is 10
> > no_setuid is 0
> > setuid is ganglia
> > no_gexec is 0
> > all_trusted is 0
> > pthread_attr_init
> > creating cluster hash for 4 nodes
> > hash_create size = 4
> > hash->size is 5
> > gmond initialized cluster hash
> > Using multicast-enabled interface eth1
> > mcast listening on 239.2.11.71 8649
> > Segmentation fault
> >
> > running strace really wasn't very enlightening either. I am using this
> > version on another multihomed host w/o any problems......
> >
> >
> > >
> > > You may also want to go through the changelog, a few versions ago I seem
> > > to recall some dnet trouble concerning multiple interfaces.  My memory
> > > could well be faulty in this instance as I've been focused on other
> > > projects for the last few months...
> >
> > I didn't see anything.
> >
> > russell
> >
> > >
> > > Russell Nordquist wrote:
> > > > I have a strange issue with gmond dying immediately. It's a multihomed
> > > > host. It starts fine when mcast_if is not set, but binds to the
> > > > external NIC. when I add mcast_if eth1 it won't start. I added the
> > > > appropriate route as described in the docs, but still nothing.
> > > >
> > > > Here's my setup:
> > > >
> > > > ifconfig:
> > > > eth0      Link encap:Ethernet  HWaddr 00:04:75:EB:75:15
> > > >           inet addr:a.b.c.46  Bcast:a.b.c.255  Mask:255.255.255.0
> > > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> > > >           RX packets:16538 errors:0 dropped:0 overruns:1 frame:0
> > > >           TX packets:3175 errors:0 dropped:0 overruns:0 carrier:0
> > > >           collisions:0 txqueuelen:100
> > > >           RX bytes:1615582 (1.5 MiB)  TX bytes:689513 (673.3 KiB)
> > > >           Interrupt:5 Base address:0x1000
> > > >
> > > > eth1      Link encap:Ethernet  HWaddr 00:E0:81:25:AD:E0
> > > >           inet addr:192.168.1.100  Bcast:192.168.1.255  Mask:255.255.255.0
> > > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> > > >           RX packets:3806 errors:0 dropped:0 overruns:1 frame:0
> > > >           TX packets:2836 errors:0 dropped:0 overruns:0 carrier:0
> > > >           collisions:0 txqueuelen:100
> > > >           RX bytes:678293 (662.3 KiB)  TX bytes:279688 (273.1 KiB)
> > > >           Interrupt:10 Base address:0x3000
> > > >
> > > > lo        Link encap:Local Loopback
> > > >           inet addr:127.0.0.1  Mask:255.0.0.0
> > > >           UP LOOPBACK RUNNING  MTU:16436  Metric:1
> > > >           RX packets:19397 errors:0 dropped:0 overruns:0 frame:0
> > > >           TX packets:19397 errors:0 dropped:0 overruns:0 carrier:0
> > > >           collisions:0 txqueuelen:0
> > > >           RX bytes:2540312 (2.4 MiB)  TX bytes:2540312 (2.4 MiB)
> > > >
> > > > route:
> > > > Kernel IP routing table
> > > > Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
> > > > 239.2.11.71     0.0.0.0         255.255.255.255 UH    0      0        0 eth1
> > > > 192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 eth1
> > > > a.b.c.0         0.0.0.0         255.255.255.0   U     0      0        0 eth0
> > > > 0.0.0.0         a.b.c.1         0.0.0.0         UG    0      0        0 eth0
> > > >
> > > > I am using the ganglia-monitor 2.5.0-3 .deb (debian testing)
> > > >
> > > > It was working once, but stopped sometime during the building of this
> > > > system.
> > > >
> > > > thanks
> > > >
> > > > russell
> > > >
> > > > - - - - - - - - - - - -
> > > > Russell Nordquist
> > > > UNIX Systems Administrator
> > > > Geophysical Sciences Computing
> > > > http://geosci.uchicago.edu/computing
> > > > NSIT, University of Chicago
> > > >  - - - - - - - - - - -
> > > >
> > > >
> > > >
> > > >
> > > > -------------------------------------------------------
> > > > This SF.Net email sponsored by: Parasoft
> > > > Error proof Web apps, automate testing & more.
> > > > Download & eval WebKing and get a free book.
> > > > www.parasoft.com/bulletproofapps
> > > > _______________________________________________
> > > > Ganglia-general mailing list
> > > > Ganglia-general@lists.sourceforge.net
> > > > https://lists.sourceforge.net/lists/listinfo/ganglia-general
> > >
> > >
> >
> > - - - - - - - - - - - -
> > Russell Nordquist
> > UNIX Systems Administrator
> > Geophysical Sciences Computing
> > http://geosci.uchicago.edu/computing
> > NSIT, University of Chicago
> >  - - - - - - - - - - -
> >
> >
> >
> >
>

- - - - - - - - - - - -
Russell Nordquist
UNIX Systems Administrator
Geophysical Sciences Computing
http://geosci.uchicago.edu/computing
NSIT, University of Chicago
 - - - - - - - - - - -

