Forgot to reply to all: Sure thing!
I couldn't install the ceph-mds-dbg packages without upgrading, so I just finished upgrading the cluster to 12.2.5. The issue still persists in 12.2.5. From here I'm not really sure how to generate the backtrace, so I hope I did it right. For others on Ubuntu, this is what I did:

* First, raise debug_mds to 20 and debug_ms to 1:

    ceph tell mds.* injectargs '--debug-mds 20 --debug-ms 1'

* Install the debug packages (ceph-mds-dbg in my case).

* I also added these options to /etc/ceph/ceph.conf in case the daemons restart.

* Now allow pids to dump core (stolen partly from Red Hat docs and partly from Ubuntu):

    echo -e 'DefaultLimitCORE=infinity\nPrivateTmp=true' | tee -a /etc/systemd/system.conf
    sysctl fs.suid_dumpable=2
    sysctl kernel.core_pattern=/tmp/core
    systemctl daemon-reload
    systemctl restart ceph-mds@$(hostname -s)

* A crash was created in /var/crash by apport, but gdb can't read it directly. I used apport-unpack and then ran gdb on what is inside:

    apport-unpack /var/crash/$(ls /var/crash/*mds*) /root/crash_dump/
    cd /root/crash_dump/
    gdb $(cat ExecutablePath) CoreDump -ex 'thr a a bt' | tee /root/ceph_mds_$(hostname -s)_backtrace

* This left me with the attached backtraces (which I think are wrong, as I see a lot of ??, yet gdb says /usr/lib/debug/.build-id/1d/23dc5ef4fec1dacebba2c6445f05c8fe6b8a7c.debug was loaded):

kh10-8 mds backtrace -- https://pastebin.com/bwqZGcfD
kh09-8 mds backtrace -- https://pastebin.com/vvGiXYVY

The log files are pretty large (one 4.1 GB and the other 200 MB):

kh10-8 (200 MB) mds log -- https://griffin-objstore.opensciencedatacloud.org/logs/ceph-mds.kh10-8.log
kh09-8 (4.1 GB) mds log -- https://griffin-objstore.opensciencedatacloud.org/logs/ceph-mds.kh09-8.log

On Tue, May 1, 2018 at 12:09 AM, Patrick Donnelly <[email protected]> wrote:
> Hello Sean,
>
> On Mon, Apr 30, 2018 at 2:32 PM, Sean Sullivan <[email protected]> wrote:
> > I was creating a new user and mount point. On another hardware node I
> > mounted CephFS as admin to mount as root.
> > I created /aufstest and then unmounted. From there it seems that both
> > of my mds nodes crashed for some reason and I can't start them any more.
> >
> > https://pastebin.com/1ZgkL9fa -- my mds log
> >
> > I have never had this happen in my tests, so now I have live data here.
> > If anyone can lend a hand or point me in the right direction while
> > troubleshooting, that would be a godsend!
>
> Thanks for keeping the list apprised of your efforts. Since this is so
> easily reproduced for you, I would suggest that you next get higher
> debug logs (debug_mds=20/debug_ms=1) from the MDS. And, since this is
> a segmentation fault, a backtrace with debug symbols from gdb would
> also be helpful.
>
> --
> Patrick Donnelly
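P.S. for anyone following along: the reason apport grabbed the crash in /var/crash even though kernel.core_pattern was set to /tmp/core is that when the core pattern begins with a pipe character, the kernel hands the dump to the named handler program instead of writing a file, and Ubuntu's apport service typically rewrites the pattern to `|/usr/share/apport/apport ...` when it (re)starts. A minimal sketch of how to tell where a given pattern sends cores; `core_destination` is just an illustrative helper name, not a real tool:

```shell
# Classify a core_pattern value (see core(5)):
#   '|...' -> the kernel pipes the dump to a handler program (e.g. apport)
#   '/...' -> the kernel writes a file at that absolute path
#   else   -> the kernel writes a file relative to the crashing process's cwd
core_destination() {
  case "$1" in
    \|*) echo "piped-to-handler" ;;
    /*)  echo "file" ;;
    *)   echo "file-relative-to-cwd" ;;
  esac
}

core_destination "$(cat /proc/sys/kernel/core_pattern)"
core_destination '|/usr/share/apport/apport %p %s %c %d %P'  # piped-to-handler
core_destination '/tmp/core'                                  # file
```

So if dumps keep ending up in /var/crash after setting the sysctl, it's worth re-checking /proc/sys/kernel/core_pattern -- apport may have put the pipe back.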
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
