When you lose 2 OSDs, the remaining 30 OSDs share the work of accepting
the degraded data and performing the backfill. When the 2 OSDs are added
back in, those 2 OSDs alone receive the majority of the backfilled data.
Two OSDs have far fewer IOPS and far less spindle throughput available
than the 30 that absorbed the data during the loss, so roughly the same
amount of data has to be written by 2 disks instead of being spread
across 30. That is your bottleneck.
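
You can actually watch this while the rejoin is running: nearly all of
the PGs in backfill states will list the two re-added OSDs as their
targets. A rough way to check (the grep pattern is just illustrative):

    # overall recovery/backfill progress and throughput
    ceph -s
    ceph osd pool stats

    # which PGs are backfilling, and which OSDs they map to
    ceph pg dump pgs_brief | grep -i backfill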

This is why adding OSDs is generally a slower operation than losing them,
even when the additions are brand-new nodes that grow the cluster.
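
If the rejoin is too slow for your taste, you can usually trade some
client I/O for recovery speed by loosening the per-OSD backfill throttles
on the OSDs that are receiving the data. A rough sketch, assuming osd.30
and osd.31 are the re-added OSDs (substitute your own IDs and values
suited to your hardware); injected settings only last until the OSDs
restart:

    # allow more concurrent backfills and recovery ops on the rejoined OSDs
    ceph tell osd.30 injectargs '--osd-max-backfills 4 --osd-recovery-max-active 8'
    ceph tell osd.31 injectargs '--osd-max-backfills 4 --osd-recovery-max-active 8'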

On Wed, Sep 27, 2017, 8:43 AM Jonas Jaszkowic <[email protected]>
wrote:

> Hello all,
>
> I have set up a Ceph cluster consisting of one monitor, 32 OSD hosts (1 OSD
> of size 320GB per host) and 16 clients which are reading from
> and writing to the cluster. I have one erasure-coded pool (shec plugin)
> with k=8, m=4, c=3 and pg_num=256. Failure domain is host.
> I am able to reach a HEALTH_OK state and everything is working as
> expected. The pool was populated with
> 114048 files of different sizes ranging from 1kB to 4GB. Total amount of
> data in the pool was around 3TB. The capacity of the
> pool was around 10TB.
>
> I want to evaluate how Ceph is rebalancing data when
>
> 1) I take out two OSDs and
> 2) when I rejoin these two OSDS.
>
> For scenario 1) I am "killing" two OSDs via *ceph osd out <osd-id>*. Ceph
> notices this failure and starts to rebalance data until I
> reach HEALTH_OK again.
>
> For scenario 2) I am rejoining the previously killed OSDs via *ceph osd
> in <osd-id>*. Again, Ceph notices this change and starts to
> rebalance data until it reaches the HEALTH_OK state.
>
> I repeated this whole scenario four times. *What I am noticing is that
> the rebalancing process in the event of two OSDs joining the
> cluster takes more than 3 times longer than in the event of the loss of
> two OSDs. This was consistent over the four runs.*
>
> I expected both recovery times to be roughly equal since in both
> scenarios the number of degraded objects was around 8% and the
> number of missing objects was around 2%. I attached a visualization of the
> recovery process in terms of degraded and missing objects:
> the first part is the scenario where two OSDs "failed", the second is the
> rejoining of these two OSDs. Note how it takes significantly longer
> to recover in the second case.
>
> Now I want to understand why it takes longer! I appreciate all hints.
>
> Thanks!
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
