I can't read assembly, so this doesn't mean much to me, but hopefully it'll
mean something to you :)
40540e: e9 fc fe ff ff jmpq 40530f <openlog@plt+0x242f>
405413: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
405418: 48 89 de mov %rbx,%rsi
40541b: e8 b0 fd ff ff callq 4051d0 <openlog@plt+0x22f0>
405420: 48 8b 7b 18 mov 0x18(%rbx),%rdi
405424: 48 85 ff test %rdi,%rdi
405427: 74 0d je 405436 <openlog@plt+0x2556>
405429: 4c 89 e2 mov %r12,%rdx
40542c: be 60 54 40 00 mov $0x405460,%esi
405431: e8 ca d3 ff ff callq 402800 <hash_foreach@plt>
405436: 31 c0 xor %eax,%eax
405438: e9 f8 fe ff ff jmpq 405335 <openlog@plt+0x2455>
40543d: 0f 1f 00 nopl (%rax)
405440: 31 c9 xor %ecx,%ecx
405442: 4c 89 ea mov %r13,%rdx
405445: 31 f6 xor %esi,%esi
405447: 4c 89 e7 mov %r12,%rdi
40544a: 4c 89 04 24 mov %r8,(%rsp)
40544e: e8 3d fe ff ff callq 405290 <openlog@plt+0x23b0>
405453: 4c 8b 04 24 mov (%rsp),%r8
405457: 89 c5 mov %eax,%ebp
405459: eb ab jmp 405406 <openlog@plt+0x2526>
40545b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
405460: 48 89 6c 24 f0 mov %rbp,-0x10(%rsp)
405465: 4c 89 64 24 f8 mov %r12,-0x8(%rsp)
40546a: 49 89 fc mov %rdi,%r12
40546d: 48 89 5c 24 e8 mov %rbx,-0x18(%rsp)
405472: 48 83 ec 18 sub $0x18,%rsp
405476: 8b 7a 18 mov 0x18(%rdx),%edi
405479: 48 89 d5 mov %rdx,%rbp
40547c: 48 8b 1e mov (%rsi),%rbx
40547f: 85 ff test %edi,%edi
405481: 74 0c je 40548f <openlog@plt+0x25af>
405483: 48 89 de mov %rbx,%rsi
405486: e8 15 fd ff ff callq 4051a0 <openlog@plt+0x22c0>
40548b: 85 c0 test %eax,%eax
40548d: 74 12 je 4054a1 <openlog@plt+0x25c1>
40548f: 31 c9 xor %ecx,%ecx
405491: 48 89 ea mov %rbp,%rdx
405494: 4c 89 e6 mov %r12,%rsi
405497: 48 89 df mov %rbx,%rdi
40549a: ff 53 08 callq *0x8(%rbx)
40549d: 85 c0 test %eax,%eax
40549f: 74 1f je 4054c0 <openlog@plt+0x25e0>
4054a1: b8 01 00 00 00 mov $0x1,%eax
4054a6: 48 8b 1c 24 mov (%rsp),%rbx
4054aa: 48 8b 6c 24 08 mov 0x8(%rsp),%rbp
4054af: 4c 8b 64 24 10 mov 0x10(%rsp),%r12
4054b4: 48 83 c4 18 add $0x18,%rsp
4054b8: c3 retq
4054b9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
4054c0: 48 89 ef mov %rbp,%rdi
4054c3: 48 89 de mov %rbx,%rsi
4054c6: e8 05 fd ff ff callq 4051d0 <openlog@plt+0x22f0>
4054cb: 48 8b 7b 18 mov 0x18(%rbx),%rdi
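The ip from the segfault log (40547c) falls in the middle of the block above. In case it's useful, I can also try pulling the same context straight out of the core with gdb - a rough sketch of what I'd run, assuming the binary lives at /usr/sbin/gmetad (I'm not certain of the path on our boxes):

  gdb /usr/sbin/gmetad /path/to/core
  (gdb) info symbol 0x40547c
  (gdb) x/30i 0x405460
  (gdb) info registers

As I understand it, "info symbol" prints the nearest symbol to the faulting address, "x/30i" disassembles 30 instructions starting at 0x405460 (just above the faulting instruction in the dump above), and "info registers" shows the register values at the time of the crash. If I can get the dbgsym package installed as you suggested, the same session should show real function names instead of the openlog@plt offsets.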
On Tue, Sep 16, 2014 at 12:45 PM, Devon H. O'Dell <devon.od...@gmail.com>
wrote:
> If you can install the dbg or dbgsym package for this, you can get
> more information. If you cannot do this, running:
>
> objdump -d `which gmetad` | less
>
> in less:
>
> /40547c
>
> Paste a little context of the disassembly before and after that
> address, then scroll up and paste which function it's in. (That might
> still be too little information or even bad information if the binary
> is stripped. But it's something.)
>
> --dho
>
> 2014-09-14 18:09 GMT-07:00 Sam Barham <s.bar...@adinstruments.com>:
> > I've finally managed to generate a core dump (the VM wasn't set up to do it
> > yet), but it's 214 MB and doesn't seem to contain anything helpful -
> > especially as I don't have debug symbols. The backtrace shows:
> > #0 0x000000000040547c in ?? ()
> > #1 0x00007f600a49a245 in hash_foreach () from
> > /usr/lib/libganglia-3.3.8.so.0
> > #2 0x00000000004054e1 in ?? ()
> > #3 0x00007f600a49a245 in hash_foreach () from
> > /usr/lib/libganglia-3.3.8.so.0
> > #4 0x00000000004054e1 in ?? ()
> > #5 0x00007f600a49a245 in hash_foreach () from
> > /usr/lib/libganglia-3.3.8.so.0
> > #6 0x0000000000405436 in ?? ()
> > #7 0x000000000040530d in ?? ()
> > #8 0x00000000004058fa in ?? ()
> > #9 0x00007f6008ef9b50 in start_thread () from
> > /lib/x86_64-linux-gnu/libpthread.so.0
> > #10 0x00007f6008c43e6d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> > #11 0x0000000000000000 in ?? ()
> >
> > Is there a way for me to get more useful information out of it?
> >
> > On Fri, Sep 12, 2014 at 10:11 AM, Devon H. O'Dell <devon.od...@gmail.com>
> > wrote:
> >>
> >> Are you able to share a core file?
> >>
> >> 2014-09-11 14:32 GMT-07:00 Sam Barham <s.bar...@adinstruments.com>:
> >> > We are using Ganglia to monitor our cloud infrastructure on Amazon AWS.
> >> > Everything is working correctly (metrics are flowing etc), except that
> >> > occasionally the gmetad process will segfault out of the blue. The gmetad
> >> > process is running on an m3.medium EC2, and is monitoring about 50 servers.
> >> > The servers are arranged into groups, each one having a bastion EC2 where
> >> > the metrics are gathered. gmetad is configured to grab the metrics from
> >> > those bastions - about 10 of them.
> >> >
> >> > Some useful facts:
> >> >
> >> > - We are running Debian Wheezy on all the EC2s.
> >> > - Sometimes the crash will happen multiple times in a day; sometimes it'll
> >> >   be a day or two before it crashes.
> >> > - The crash creates no logs in normal operation other than a segfault log
> >> >   something like "gmetad[11291]: segfault at 71 ip 000000000040547c sp
> >> >   00007ff2d6572260 error 4 in gmetad[400000+e000]". If we run gmetad
> >> >   manually with debug logging, it appears that the crash is related to
> >> >   gmetad doing a cleanup.
> >> > - When we realised that the cleanup process might be to blame, we did more
> >> >   research around that. We realised that our disk IO was way too high and
> >> >   added rrdcached in order to reduce it. The disk IO is now much lower, and
> >> >   the crash is occurring less often, but still an average of once a day or so.
> >> > - We have two systems (dev and production). Both exhibit this crash, but the
> >> >   dev system, which is monitoring a much smaller group of servers, crashes
> >> >   significantly less often.
> >> > - The production system is running ganglia 3.3.8-1+nmu1/rrdtool 1.4.7-2.
> >> >   We've upgraded ganglia in the dev system to ganglia 3.6.0-2~bpo70+1/rrdtool
> >> >   1.4.7-2. That doesn't seem to have helped with the crash.
> >> > - We have monit running on both systems, configured to restart gmetad if it
> >> >   dies. It restarts immediately with no issues.
> >> > - The production system is storing its data on a magnetic disk; the dev
> >> >   system is using an SSD. That doesn't appear to have changed the frequency
> >> >   of the crash.
> >> >
> >> > Has anyone experienced this kind of crash, especially on Amazon hardware?
> >> > We're at our wits' end trying to find a solution!