There was a time in Ceph's history when a CRUSH weight of 0.0 was not
always what you thought it was.  People had better experiences starting
with a tiny weight like 0.0001 instead.  This is just a memory tickling in
the back of my mind from things I read on the ML years back.
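
If memory serves, the workaround was to never leave a new OSD parked at 0.0
and instead walk it up from a tiny non-zero weight.  Roughly (osd.610 is just
the example id from this thread):

    ceph osd crush reweight osd.610 0.0001   # tiny non-zero starting weight
    ceph osd crush reweight osd.610 0.5      # then step it up gradually

Don't hold me to the exact numbers, but that's the shape of it.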

On Fri, May 11, 2018 at 1:26 PM Bryan Stillwell <bstillw...@godaddy.com>
wrote:

> > We have a large 1PB Ceph cluster.  We recently added six nodes with
> > sixteen 2TB disks each to the cluster.  The first five nodes rebalanced
> > without any issues, but the OSDs on the sixth/last node started acting
> > weird: as I increase the weight of one OSD, its utilization doesn't
> > change, but the utilization of a different OSD on the same node
> > increases.  The rebalance completes fine, but the utilization is not
> > right.
> >
> > I increased the weight of OSD 610 from 0.0 to 0.2, but the utilization
> > of OSD 611 started increasing even though its weight is still 0.0.  If
> > I increase the weight of OSD 611 to 0.2, its utilization grows to what
> > it would be at a weight of 0.4.  If I increase the weights of OSDs 610
> > and 615 to their full values, utilization on OSD 610 stays at 1% while
> > OSD 611 inches towards 100%, at which point I had to stop and drop the
> > OSDs' CRUSH weights back to 0.0 to avoid any impact on the cluster.
> > It's not just one OSD, but different OSDs on that one node.  The only
> > correlation I've found is that the journal partitions for OSDs 610 and
> > 611 are on the same SSD; all the OSDs are SAS drives.  Any help on how
> > to debug or resolve this would be appreciated.
>
> You didn't say which version of Ceph you're running, but based on the
> output of 'ceph osd df' I'm guessing it's a pre-Jewel (maybe Hammer?)
> cluster.
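
(If the version is in doubt, the daemons will tell you directly, for example:

    ceph --version             # version of the local ceph CLI
    ceph tell osd.610 version  # version a specific OSD is actually running

osd.610 is just the example id from this thread.)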
>
> I've found that data placement can be a little weird when one node has
> really low CRUSH weights (0.2) while the other nodes have large CRUSH
> weights (2.0).  I've seen a single OSD in a node end up with almost all
> the data, and it wasn't until I increased the weights to be more in line
> with the rest of the cluster that things evened back out.
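
(Concretely, I read that as bringing the new OSDs up to roughly the same
CRUSH weight as the rest of the 2TB drives rather than leaving them at 0.2,
e.g. something like:

    ceph osd crush reweight osd.610 1.81929

where 1.81929 is just the weight a 2TB drive typically gets; check what the
existing OSDs show in 'ceph osd tree' and match that.)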
>
> I believe this can also be caused by not having enough PGs in your
> cluster, or by the PGs you do have not being distributed correctly based
> on the data usage in each pool.  Have you used https://ceph.com/pgcalc/
> to determine the correct number of PGs you should have per pool?
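
(The current numbers are easy to pull before plugging anything into pgcalc;
'rbd' below is just a placeholder pool name:

    ceph df                             # per-pool usage
    ceph osd pool get rbd pg_num        # current PG count for a pool
    ceph osd pool set rbd pg_num 2048   # raise it if pgcalc says so
    ceph osd pool set rbd pgp_num 2048  # and bump pgp_num to match

Keep in mind pg_num can only be increased on these releases, not decreased,
so it's worth getting the target right.)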
>
> Since you are likely running a pre-Jewel cluster, it could also be that
> you haven't switched your tunables to use the straw2 data placement
> algorithm:
>
> http://docs.ceph.com/docs/master/rados/operations/crush-map/#hammer-crush-v4
>
> That should help as well.  Once that's enabled you can convert your
> existing buckets to straw2.  Just be careful you don't have any old
> clients connecting to your cluster that don't support that feature yet.
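
(Roughly, the knobs involved look like this; since a pre-Jewel cluster won't
have the newer one-shot conversion command, the bucket change is done by
editing the CRUSH map by hand:

    ceph osd crush show-tunables     # see which profile is active now
    ceph osd crush tunables hammer   # enable the hammer profile / straw2 support

    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    #   ...change 'alg straw' to 'alg straw2' in the bucket definitions...
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new

Both steps can trigger a fair amount of data movement, and as Bryan says,
make sure no old clients are still connecting first.)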
>
> Bryan
>