We hit an OSD_FULL last week on our cluster - with an average utilization
of less than 50% .. thus hugely imbalanced. This has driven us to
adjust pg_num upwards and reweight the OSDs more aggressively.
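(In case it helps others: one way to do the reweighting is Ceph's built-in
utilization-based reweighter. The parameters below - oversubscription
threshold 110%, max weight change 0.05, at most 10 OSDs per run - are just
illustrative values, not a recommendation:

  # dry run: report which OSDs would be reweighted and by how much
  ceph osd test-reweight-by-utilization 110 0.05 10
  # apply the same change for real
  ceph osd reweight-by-utilization 110 0.05 10
)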
Question: What do people see as an "acceptable" variance across OSDs?
x <stdin>  (per-OSD utilization, %)
    N        Min        Max     Median         Avg      Stddev
x  72      45.49      56.25      52.35   51.878889   2.1764343
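(If you want to reproduce those stats yourself: I believe recent releases
expose a per-OSD "utilization" field in `ceph osd df -f json`; if so,
something like this sketch works, assuming jq is installed:

  ceph osd df -f json | jq -r '.nodes[].utilization' \
    | awk 'NR == 1 { min = max = $1 }
           { s += $1; ss += $1 * $1; n++
             if ($1 < min) min = $1
             if ($1 > max) max = $1 }
           END { avg = s / n
                 # sample stddev (n-1 divisor), matching ministat; median omitted
                 sd = sqrt((ss - s * s / n) / (n - 1))
                 printf "N=%d min=%.2f max=%.2f avg=%f stddev=%f\n",
                        n, min, max, avg, sd }'
)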
72 x 10TB drives. It seems hard to get the spread further down -- so churn
will most likely make it hard for us to stay even at this level.
Currently we have ~158 PGs / OSD .. which by my math gives ~63GB/PG if the
disks were fully utilized - which leads me to think that somewhat
smaller PGs would give the balancing an easier job. Would it be OK to
go closer to 300 PGs/OSD? Would it be sane?
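(For completeness, here's how I'd pull the PGs-per-OSD number straight out
of the cluster - a sketch, assuming the usual `ceph osd dump` pool-line
format where each pool carries `size` and `pg_num`. And the other half of
the math: 10TB / 300 ≈ 33GB per PG, versus ~63GB today:

  # PG replicas per OSD = sum(size * pg_num) over pools / number of OSDs
  NUM_OSDS=$(ceph osd ls | wc -l)
  ceph osd dump | awk -v osds="$NUM_OSDS" '/^pool/ {
      for (i = 1; i <= NF; i++) {
          if ($i == "size")   size = $(i+1)
          if ($i == "pg_num") pgs  = $(i+1)
      }
      total += size * pgs
  } END { printf "~%.0f PG replicas per OSD\n", total / osds }'
)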
I can see that the default max is 300, but I have a hard time finding out
whether that is a recommendation or just a tunable.
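(If I remember right, the 300 default is `mon_pg_warn_max_per_osd`, and
Luminous added a hard `mon_max_pg_per_osd` on top - I'm not sure which
applies to a given release, but asking a running mon shows what's in
effect. Run on the mon host; mon.a is a placeholder for your mon's ID:

  ceph daemon mon.a config show | grep -E 'pg_warn_max_per_osd|max_pg_per_osd'
)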
* We've now seen OSD_FULL trigger unrecoverable kernel bugs in the
CephFS kernel client on our 4.15 kernels - multiple times - and a forced
reboot is the only way out. We're on the Ubuntu kernels .. I haven't done
the diff against upstream (yet), and I don't intend to run our production
cluster disk-full anywhere in the near future to test it.
Please paste your `ceph osd df tree` and `ceph osd dump | head -n 12`.
k