I have issues with very slow requests on a simple three-node cluster here, with four WDC enterprise disks and an Intel Optane NVMe journal on identical high-memory nodes, and 10 Gb networking.

It was all working well with Ceph Hammer on Debian Wheezy, but I wanted to upgrade to a supported version and test out bluestore as well. So I upgraded to Luminous on Debian Stretch and used ceph-volume to create bluestore OSDs. Everything went downhill from there.

I went back to filestore on all nodes, but I still have slow requests and I cannot pinpoint the reason. I tried to debug and gathered information to look at:


First I thought it was the balancing that was making things slow. Then I thought it might be the LVM layer, so I recreated the nodes without LVM by switching from ceph-volume to ceph-disk: no difference, still slow requests. Then I changed back from bluestore to filestore, but the cluster is still very slow. Then I thought it was a CPU scheduling issue and downgraded from the 5.x kernel, and CPU performance is at full speed again. I thought maybe there was something weird with one OSD and took them out one by one, but slow requests are still showing up and client performance from VMs is really poor.
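
As an alternative to taking OSDs out one by one, here is a minimal Python sketch for spotting an OSD whose commit latency stands out from the rest. It assumes the JSON layout printed by `ceph osd perf -f json` (a list `osd_perf_infos` with `id` and `perf_stats.commit_latency_ms`, possibly nested under `osdstats` on newer releases); that layout, the function name `slow_osds`, and the factor of 3 are assumptions for illustration, and the sample numbers are made up, not measurements from this cluster.

```python
import json
import statistics


def slow_osds(perf_json, factor=3.0):
    """Return sorted ids of OSDs whose commit latency exceeds factor x the median."""
    data = json.loads(perf_json)
    # Assumption: some releases nest the list under "osdstats", others do not.
    infos = data.get("osdstats", data).get("osd_perf_infos", [])
    latencies = {i["id"]: i["perf_stats"]["commit_latency_ms"] for i in infos}
    if not latencies:
        return []
    median = statistics.median(latencies.values())
    # Floor of 1 ms so an all-zero (idle) cluster does not flag everything.
    threshold = max(median * factor, 1.0)
    return sorted(osd for osd, ms in latencies.items() if ms > threshold)


# Made-up sample in the assumed layout, for illustration only.
sample = json.dumps({"osd_perf_infos": [
    {"id": 0, "perf_stats": {"commit_latency_ms": 12, "apply_latency_ms": 12}},
    {"id": 1, "perf_stats": {"commit_latency_ms": 10, "apply_latency_ms": 10}},
    {"id": 2, "perf_stats": {"commit_latency_ms": 250, "apply_latency_ms": 250}},
    {"id": 3, "perf_stats": {"commit_latency_ms": 11, "apply_latency_ms": 11}},
]})

print(slow_osds(sample))
```

On a live cluster the input would come from the output of `ceph osd perf -f json`, sampled a few times while the slow requests are occurring, since a single snapshot can miss a short burst.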

It feels as if a burst of small requests blocks for a while and then recovers again.

Many thanks for helping out and looking at the URL.

If there are options I should tune for an HDD with NVMe journal setup, please share.

