[ceph-users] slow request and unresponsive kvm guests after upgrading ceph cluster and os, please help debugging

Jelle de Jong Mon, 06 Jan 2020 11:45:50 -0800

Hello everybody,

I have issues with very slow requests on a simple three-node cluster here: four WDC enterprise disks and an Intel Optane NVMe journal on identical high-memory nodes, with 10GB networking.

It was all working well with Ceph Hammer on Debian Wheezy, but I wanted to upgrade to a supported version and test out bluestore as well. So I upgraded to Luminous on Debian Stretch and used ceph-volume to create bluestore OSDs; everything went downhill from there.

I went back to filestore on all nodes, but I still have slow requests and I cannot pinpoint a good reason. I tried to debug and gathered information to look at:


https://paste.debian.net/hidden/acc5d204/

First I thought it was the balancing that was making things slow. Then I thought it might be the LVM layer, so I recreated the nodes without LVM by switching from ceph-volume to ceph-disk: no difference, still slow requests. Then I changed back from bluestore to filestore, but it is still a very slow cluster. Then I thought it was a CPU scheduling issue and downgraded the 5.x kernel, and CPU performance is at full speed again. I thought maybe there was something weird with an OSD and tried taking them out one by one, but slow requests are still showing up and client performance from VMs is really poor.

It feels as if a burst of small requests keeps blocking for a while and then recovers again.


Many thanks for helping out and looking at the URL.

If there are options I should tune for an HDD with NVMe journal setup, please share.


Jelle
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
