There was a time in Ceph's history when a CRUSH weight of 0.0 didn't always behave the way you'd expect, and people reported better results with tiny non-zero weights like 0.0001. This is just a vague memory of threads I read on the mailing list years back.
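If memory serves, the workaround looked something like this (a rough sketch only; osd.610 is taken from the thread below, the 1.81 figure assumes the usual convention of CRUSH weight ~= disk size in TiB, and the exact behavior varies by release):

    # Check per-OSD CRUSH weight and utilization
    ceph osd df tree

    # Start the new OSD at a tiny non-zero weight instead of 0.0...
    ceph osd crush reweight osd.610 0.0001

    # ...then step it up toward the disk's full weight (2 TB ~= 1.81 TiB)
    ceph osd crush reweight osd.610 1.81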
On Fri, May 11, 2018 at 1:26 PM Bryan Stillwell <bstillw...@godaddy.com> wrote:
>
> > We have a large 1PB ceph cluster. We recently added 6 nodes with 16 2TB
> > disks each to the cluster. The first five nodes rebalanced well without
> > any issues, but the OSDs on the sixth/last node started acting weird: as
> > I increase the weight of one OSD its utilization doesn't change, but the
> > utilization of a different OSD on the same node increases instead. The
> > rebalance completes fine, but utilization is not right.
> >
> > I increased the weight of OSD 610 from 0.0 to 0.2, but the utilization
> > of OSD 611 started increasing even though its weight is still 0.0. If I
> > increase the weight of OSD 611 to 0.2, its utilization grows as if its
> > weight were 0.4. When I increased the weights of OSDs 610 and 615 to
> > their full values, utilization on OSD 610 stayed at 1% while OSD 611
> > crept toward 100%, and I had to stop and set the CRUSH weights back to
> > 0.0 to avoid any impact on the cluster. It's not just one OSD, but
> > different OSDs on that one node. The only correlation I've found is that
> > the journal partitions of OSDs 610 and 611 are on the same SSD; all the
> > OSDs are SAS drives. Any help on how to debug or resolve this would be
> > appreciated.
>
> You didn't say which version of Ceph you're using, but based on the output
> of 'ceph osd df' I'm guessing it's a pre-Jewel (maybe Hammer?) cluster.
>
> I've found that data placement can be a little weird when one node has
> really low CRUSH weights (0.2) while the other nodes have large CRUSH
> weights (2.0). I've had a single OSD in a node get almost all the data,
> and it wasn't until I increased the weights to be more in line with the
> rest of the cluster that things evened back out.
>
> I believe this can also be caused by not having enough PGs in your
> cluster, or by the PGs you do have not being distributed correctly based
> on the data usage in each pool. Have you used https://ceph.com/pgcalc/ to
> determine the correct number of PGs you should have per pool?
>
> Since you are likely running a pre-Jewel cluster, it could also be that
> you haven't switched your tunables to the straw2 data placement algorithm:
>
> http://docs.ceph.com/docs/master/rados/operations/crush-map/#hammer-crush-v4
>
> That should help as well. Once that's enabled you can convert your
> existing buckets to straw2. Just be careful that you don't have any old
> clients connecting to your cluster that don't support that feature yet.
>
> Bryan
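For the PG-count point Bryan raises, the rule of thumb that pgcalc is built around is roughly (number of OSDs * 100) / replica count, rounded to a power of two. A minimal sketch of that arithmetic, assuming a hypothetical 500-OSD cluster with 3x replication and a pool named 'rbd' (substitute your own values):

    # Rough per-pool PG target: (num_osds * 100) / replicas,
    # rounded (down here, for simplicity) to a power of two.
    num_osds=500    # hypothetical; use your actual OSD count
    replicas=3
    raw=$(( num_osds * 100 / replicas ))          # 16666
    pgs=1
    while [ $(( pgs * 2 )) -le "$raw" ]; do pgs=$(( pgs * 2 )); done
    echo "suggested pg_num: $pgs"                 # 16384

    # Compare against what a pool actually has, and raise it if needed
    # (on older releases pgp_num must be bumped after pg_num):
    ceph osd pool get rbd pg_num
    ceph osd pool set rbd pg_num 16384
    ceph osd pool set rbd pgp_num 16384

If you have several pools, that target gets divided across them by each pool's expected share of the data, which is what the pgcalc tool works out for you.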
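And for the straw2 conversion Bryan mentions, the usual workflow on releases of that era is to decompile the CRUSH map, change the bucket algorithm by hand, and inject it back. A sketch under those assumptions (this triggers data movement, so test it on a non-production cluster first, and make sure every client supports CRUSH_V4):

    # Switch to the hammer tunables profile (enables CRUSH_V4 / straw2)
    ceph osd crush tunables hammer

    # Pull down and decompile the current CRUSH map
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

    # In crushmap.txt, change each bucket's "alg straw" to "alg straw2",
    # then recompile and inject the edited map:
    crushtool -c crushmap.txt -o crushmap-new.bin
    ceph osd setcrushmap -i crushmap-new.bin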