Hello, 

I am currently running Luminous 12.2.8 on Ubuntu with the 4.15.0-36-generic kernel 
from the official Ubuntu repo. The cluster has 4 mon + osd servers. Each osd 
server has a total of 9 spinning osds and 1 ssd for the hdd and ssd pools. 
The hdds are backed by S3710 ssds for journaling at a ratio of 1:5. The 
ssd pool osds are not using external journals. Ceph is used as primary 
storage for CloudStack - all vm disk images are stored on the cluster. 

I have recently migrated all osds to bluestore, which was a long process 
with ups and downs, but I am happy to say that the migration is done. During 
the migration I disabled scrubbing (both deep and standard). After 
re-enabling scrubbing I noticed the cluster started having a large 
number of slow requests and poor client IO (to the point where vms stall for 
minutes). Further investigation showed that the slow requests happen because 
the osds are flapping. In a single day my logs have over 1000 entries 
reporting an osd going down. This affects random osds. Disabling deep 
scrubbing stabilises the cluster: the osds no longer flap and the slow 
requests disappear. As a short term solution I've left deep scrubbing 
disabled, but I was hoping to fix the issue with your help. 
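
For reference, this is roughly how I counted the flaps and how I am 
toggling the workaround (the log path and exact message text may differ 
on other setups): 

# rough count of down events in the cluster log on one of the mons 
grep -c 'marked down' /var/log/ceph/ceph.log 

# short term workaround: stop scheduling new deep scrubs cluster-wide 
ceph osd set nodeep-scrub 
# to re-enable once the underlying issue is fixed 
ceph osd unset nodeep-scrub 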

At the moment, I am running the cluster with default settings apart from the 
following: 

[global] 
osd_disk_thread_ioprio_priority = 7 
osd_disk_thread_ioprio_class = idle 
osd_recovery_op_priority = 1 

[osd] 
debug_ms = 0 
debug_auth = 0 
debug_osd = 0 
debug_bluestore = 0 
debug_bluefs = 0 
debug_bdev = 0 
debug_rocksdb = 0 

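In case it helps the discussion, these are the scrub throttling options I 
am considering next. The values below are untested guesses based on my 
reading of the docs, not a known-good config: 

[osd] 
osd_max_scrubs = 1                 # at most one scrub per osd at a time (the default) 
osd_scrub_sleep = 0.1              # sleep between scrub chunks so client IO can get through 
osd_scrub_chunk_max = 5            # scrub fewer objects per chunk 
osd_scrub_during_recovery = false  # don't scrub while recovery is running 
osd_scrub_begin_hour = 1           # confine scrubs to a quiet window 
osd_scrub_end_hour = 6 

I believe most of these can also be injected at runtime, e.g.: 

ceph tell osd.* injectargs '--osd_scrub_sleep 0.1' 

although I've read that some scrub options may only take effect after an 
osd restart. 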

Could you share your experiences with deep scrubbing of bluestore osds? Are 
there any options that I should set to make sure the osds do not flap while 
client IO remains available? 

Thanks 

Andrei 