Hi all,


Some BlueStore OSDs in our Luminous test cluster have started becoming unresponsive and booting very slowly.



These OSDs have been used for stress-testing hardware destined for our production cluster, so they have hosted a number of pools with many, many objects in the past. All of these pools have since been deleted.



When booting, the OSDs spend a few minutes *per PG* in the clear_temp_objects function, even for brand new, empty PGs. The OSD hammers the disk throughout clear_temp_objects, with constant ~30 MB/s reads and all available IOPS consumed. The OSD will eventually finish booting and come up fine, but will then start hammering the disk again and fall over at some later point, causing the cluster to gradually fall apart. I'm guessing something is 'not optimal' in the RocksDB.
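
For reference, this is roughly how I've been watching it; the OSD id and debug level below are just examples from my setup, not gospel:

    # watch the OSD's data disk while the daemon boots
    iostat -x 1

    # raise OSD logging on the running daemon to follow the PG-by-PG
    # crawl through clear_temp_objects in the log (osd.12 is a placeholder,
    # and I'm not certain 10/10 is the right level to see it)
    ceph daemon osd.12 config set debug_osd 10/10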



Deleting all pools stops this behaviour, and OSDs without PGs will reboot quickly and stay up, but creating a pool causes any OSD that gets even a single PG to start exhibiting the behaviour again.
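
In case it helps anyone reproduce this, the cycle is as simple as the following (pool name and PG count are arbitrary):

    ceph osd pool create temptest 64 64
    # ...affected OSDs start hammering their disks again...
    # (deleting requires mon_allow_pool_delete=true)
    ceph osd pool delete temptest temptest --yes-i-really-really-mean-it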



These are HDD OSDs, with the WAL and RocksDB on the same disk. I would guess they are about a year old. Upgrading to 12.2.12 did not change this behaviour. A BlueFS export of a problematic OSD's block device reveals a 1.5 GB RocksDB (L0 - 63.80 KB, L1 - 62.39 MB, L2 - 116.46 MB, L3 - 1.38 GB), which seems excessive for an empty OSD, but this is also the first time I've looked into this, so it may be normal?
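
For anyone wanting to check their own OSDs, the export was along these lines; the OSD id and output path are placeholders, and I pulled the per-level sizes out of the compaction stats in the exported RocksDB LOG, which may well not be the canonical way to get them:

    # osd.12 and /tmp/osd.12 are placeholders
    systemctl stop ceph-osd@12
    ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-12 --out-dir /tmp/osd.12
    du -sh /tmp/osd.12/db
    grep -A 10 'Compaction Stats' /tmp/osd.12/db/LOG | tail -n 15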



Destroying and recreating an OSD resolves the issue for that OSD, which is acceptable for this cluster, but I'm a little concerned that a similar thing could happen on a production cluster. Ideally, I would like to understand what has happened before recreating the problematic OSDs.
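
(By "destroying and recreating" I mean the usual Luminous procedure, roughly as follows, with the OSD id and device as placeholders.)

    ceph osd purge 12 --yes-i-really-mean-it
    ceph-volume lvm zap /dev/sdb --destroy
    ceph-volume lvm create --data /dev/sdb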



Has anyone got any thoughts on what might have happened, or tips on how to dig 
further into this?


Cheers,
Tom

Tom Byrne
Storage System Administrator
Scientific Computing Department
Science and Technology Facilities Council
Rutherford Appleton Laboratory