Hi all,

We have recently begun switching over to Bluestore on our Ceph cluster, 
currently on 12.2.7. We first began encountering segfaults on Bluestore under 
12.2.5, but strangely these segfaults occur exclusively on our SSD pools and 
not on the PCIe/HDD disks. We upgraded to 12.2.7 last week to get clear of the 
known issues in 12.2.6, hoping it might also address our Bluestore issues, but 
to no avail, and upgrading to Mimic is not feasible for us right away as this 
is a production environment.

I have attached the log of one of the OSDs experiencing the segfault, as well 
as the output of the recommended command for interpreting the debug 
information. Unfortunately, due to a 403 error I am at present unable to open 
a bug tracker issue for this.

OSD Log: https://transfer.sh/AYQ8Y/ceph-osd.123.log
OSD Binary debug: https://transfer.sh/FOiLv/ceph-osd-123-binary.txt.tar.gz

The disks in use are Intel DC S3710 800 GB.

These OSDs were previously filestore and fully operational. The migration 
procedure was the usual one: mark the OSD out, await recovery, then zap and 
redeploy. We further used dd to ensure the disk was fully wiped, and ran 
smartctl self-tests to rule out faults with the disks themselves, but were 
unable to find any.
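For reference, the migration steps above can be sketched roughly as follows. This is a dry run (each command is echoed rather than executed), and the OSD id (123) and device path (/dev/sdX) are placeholders, not our actual values:

```shell
#!/usr/bin/env bash
# Sketch of the filestore -> bluestore migration described above.
# Dry run: each step is printed, not executed.
set -euo pipefail

OSD_ID=123      # placeholder OSD id
DEV=/dev/sdX    # placeholder device path

run() { echo "+ $*"; }   # replace 'echo' with "$@" to actually execute

run ceph osd out "$OSD_ID"                            # mark out, then wait for recovery
run ceph osd safe-to-destroy "$OSD_ID"                # confirm PGs have drained
run systemctl stop "ceph-osd@$OSD_ID"
run ceph osd purge "$OSD_ID" --yes-i-really-mean-it   # remove from CRUSH/auth/osd map
run ceph-volume lvm zap "$DEV" --destroy              # wipe LVM/partition metadata
run dd if=/dev/zero of="$DEV" bs=1M count=100         # extra wipe of the on-disk header
run smartctl -t long "$DEV"                           # rule out media faults
run ceph-volume lvm create --bluestore --data "$DEV"  # redeploy as bluestore
```

This assumes ceph-volume deployment on Luminous; adjust if you deploy via ceph-disk or an orchestration tool.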

What may be unusual is that only some of the SSDs are encountering this 
segfault so far. On one host with 8 OSDs, only 2 of them are hitting the 
segfaults. However, we have noticed that the new OSDs are considerably more 
prone to being marked down, despite minimal load.

Any advice anyone could offer on this would be great.

Kind Regards,

Tom
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com