Hi,
I have a fairly big cluster that, among other things, has a radosgw
configured with an erasure-coded data pool (k=12, m=4). The cluster is
currently running Jewel (10.2.7).
That pool spans 244 HDDs and has 2048 PGs.
From ceph df detail:

    NAME             ID  CATEGORY  QUOTA OBJECTS  QUOTA BYTES  USED    %USED  MAX AVAIL  OBJECTS   DIRTY   READ    WRITE  RAW USED
    .rgw.buckets.ec  26  -         N/A            N/A          76360G  28.66  185T       97908947  95614k  73271k  185M   101813G
    ct-radosgw       37  -         N/A            N/A          4708G   70.69  1952G      5226185   2071k   591M    1518M  9416G
The ct-radosgw pool (the cache tier in front of the EC pool) should be
size 3, but is currently size 2 due to an unrelated issue (a PDU failure).
Whenever I flush data from the cache tier to the base tier, the OSDs
start updating their local leveldb databases, consuming 100% of their IO,
until they either a) get marked down for not responding, and/or b) hit
the suicide timeout.
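For reference, one thing I'm considering trying (assuming the culprit is leveldb compaction falling behind during the flush; these are stock Jewel OSD options, and this is a workaround sketch, not a verified fix):

```ini
[osd]
# compact the OSD's leveldb store at startup, before it takes client IO
leveldb_compact_on_mount = true
# raise the suicide timeouts so slow compaction doesn't kill the OSD
# (defaults are 150s / 180s)
osd_op_thread_suicide_timeout = 300
filestore_op_thread_suicide_timeout = 300
```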
I have other pools targeting those same OSDs, but so far nothing bad has
happened when IO goes to those other pools.
Any ideas on how to proceed?
thanks,
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com