On Thu, Aug 30, 2018 at 5:24 AM, Wolfgang Lendl
<wolfgang.le...@meduniwien.ac.at> wrote:
> Hi Alfredo,
>
> caught some logs:
> https://pastebin.com/b3URiA7p
That looks like there is an issue with bluestore. Maybe Radoslaw or Adam
might know a bit more.

> br
> wolfgang
>
> On 2018-08-29 15:51, Alfredo Deza wrote:
>> On Wed, Aug 29, 2018 at 2:06 AM, Wolfgang Lendl
>> <wolfgang.le...@meduniwien.ac.at> wrote:
>>> Hi,
>>>
>>> after upgrading my ceph clusters from 12.2.5 to 12.2.7 I'm experiencing
>>> random crashes from SSD OSDs (bluestore) - it seems that HDD OSDs are
>>> not affected.
>>> I destroyed and recreated some of the SSD OSDs, which seemed to help.
>>>
>>> this happens on CentOS 7.5 (different kernels tested)
>>>
>>> /var/log/messages:
>>> Aug 29 10:24:08 ceph-osd: *** Caught signal (Segmentation fault) **
>>> Aug 29 10:24:08 ceph-osd: in thread 7f8a8e69e700
>>> thread_name:bstore_kv_final
>>> Aug 29 10:24:08 kernel: traps: bstore_kv_final[187470] general protection
>>> ip:7f8a997cf42b sp:7f8a8e69abc0 error:0 in
>>> libtcmalloc.so.4.4.5[7f8a997a8000+46000]
>>> Aug 29 10:24:08 systemd: ceph-osd@2.service: main process exited,
>>> code=killed, status=11/SEGV
>>> Aug 29 10:24:08 systemd: Unit ceph-osd@2.service entered failed state.
>>> Aug 29 10:24:08 systemd: ceph-osd@2.service failed.
>>> Aug 29 10:24:28 systemd: ceph-osd@2.service holdoff time over, scheduling
>>> restart.
>>> Aug 29 10:24:28 systemd: Starting Ceph object storage daemon osd.2...
>>> Aug 29 10:24:28 systemd: Started Ceph object storage daemon osd.2.
>>> Aug 29 10:24:28 ceph-osd: starting osd.2 at - osd_data
>>> /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
>>> Aug 29 10:24:35 ceph-osd: *** Caught signal (Segmentation fault) **
>>> Aug 29 10:24:35 ceph-osd: in thread 7f5f1e790700 thread_name:tp_osd_tp
>>> Aug 29 10:24:35 kernel: traps: tp_osd_tp[186933] general protection
>>> ip:7f5f43103e63 sp:7f5f1e78a1c8 error:0 in
>>> libtcmalloc.so.4.4.5[7f5f430cd000+46000]
>>> Aug 29 10:24:35 systemd: ceph-osd@0.service: main process exited,
>>> code=killed, status=11/SEGV
>>> Aug 29 10:24:35 systemd: Unit ceph-osd@0.service entered failed state.
>>> Aug 29 10:24:35 systemd: ceph-osd@0.service failed
>> These systemd messages aren't usually helpful; try poking around
>> /var/log/ceph/ for the output of that one OSD.
>>
>> If those logs aren't useful either, try bumping up the verbosity (see
>> http://docs.ceph.com/docs/master/rados/troubleshooting/log-and-debug/#boot-time
>> )
>>> did I hit a known issue?
>>> any suggestions are highly appreciated
>>>
>>> br
>>> wolfgang
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> --
> Wolfgang Lendl
> IT Systems & Communications
> Medizinische Universität Wien
> Spitalgasse 23 / BT 88 / Ebene 00
> A-1090 Wien
> Tel: +43 1 40160-21231
> Fax: +43 1 40160-921200

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
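For anyone following this thread later: the boot-time verbosity bump referred
to above would look roughly like the snippet below. This is only a sketch; the
subsystems and debug levels are illustrative picks for a bluestore crash, not
values given in the thread, and osd.2 is just the OSD from the log excerpt.

    # /etc/ceph/ceph.conf on the affected OSD host, then restart that OSD,
    # e.g. systemctl restart ceph-osd@2
    [osd]
    debug osd = 20
    debug bluestore = 20
    debug bluefs = 20
    debug rocksdb = 20

    # per-daemon log to inspect after the next crash
    less /var/log/ceph/ceph-osd.2.log

These levels generate a lot of output, so it is worth reverting them to the
defaults once the crash has been captured.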