This may be less of an issue now. The most traumatic experience for us, back around Hammer, was memory usage under recovery + load: OSDs ended up getting OOM-killed, which triggered more recovery, which drove memory usage even higher, a pretty vicious cycle.
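If you do end up in that spot, the usual levers are throttling recovery/backfill and capping per-OSD memory. A rough sketch of what that looks like on a Mimic-era cluster via the config database (values are illustrative only, not recommendations, and osd_memory_target requires BlueStore on a recent enough release):

    # throttle recovery/backfill so it doesn't pile onto client load
    ceph config set osd osd_max_backfills 1
    ceph config set osd osd_recovery_max_active 1
    ceph config set osd osd_recovery_sleep 0.1

    # cap per-OSD memory (BlueStore); pick a value so all OSDs on a host fit in RAM
    ceph config set osd osd_memory_target 4294967296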
-KJ

On Wed, Nov 14, 2018 at 11:45 AM Vladimir Brik <vladimir.b...@icecube.wisc.edu> wrote:
> Hello
>
> I have a ceph 13.2.2 cluster comprised of 5 hosts, each with 16 HDDs and
> 4 SSDs. HDD OSDs have about 50 PGs each, while SSD OSDs have about 400
> PGs each (a lot more pools use SSDs than HDDs). Servers are fairly
> powerful: 48 HT cores, 192GB of RAM, and 2x25Gbps Ethernet.
>
> The impression I got from the docs is that having more than 200 PGs per
> OSD is not a good thing, but justifications were vague (no concrete
> numbers), like increased peering time, increased resource consumption,
> and possibly decreased recovery performance. None of these appeared to
> be a significant problem in my testing, but the tests were very basic
> and done on a pretty empty cluster under minimal load, so I worry I'll
> run into trouble down the road.
>
> Here are the questions I have:
> - In practice, is it a big deal that some OSDs have ~400 PGs?
> - In what situations would our cluster most likely fare significantly
>   better if I went through the trouble of re-creating pools so that no
>   OSD would have more than, say, ~100 PGs?
> - What performance metrics could I monitor to detect possible issues
>   due to having too many PGs?
>
> Thanks,
>
> Vlad

--
Kjetil Joergensen <kje...@medallia.com>
SRE, Medallia Inc
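For the monitoring question: per-OSD PG counts show up in the PGS column of "ceph osd df tree", and per-OSD memory can be pulled from the admin socket, e.g.:

    ceph osd df tree                  # PGS column = PGs per OSD
    ceph daemon osd.12 dump_mempools  # memory breakdown for one OSD (run on its host)

(osd.12 is just an example id.)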
_______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com