You can set a lower weight on the full OSDs, or try raising the near-full
ratio in your cluster from 85 to, for example, 89. But I don't know what
might go wrong when you do that.
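Roughly something like this (only a sketch, not tested on your cluster; the
OSD id 7, the 0.8 weight and the 0.89 ratio below are just example values, and
the exact option names can differ between releases, so please verify them for
your version first):

  # see which OSDs are near full
  ceph health detail

  # lower the weight of the fullest OSD a bit so data moves off it
  ceph osd reweight 7 0.8

  # temporarily raise the backfill threshold so the stuck backfills can proceed
  ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.89'

  # and/or raise the cluster-wide near-full ratio
  ceph pg set_nearfull_ratio 0.89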

2014-10-20 17:12 GMT+02:00 Wido den Hollander <[email protected]>:

> On 10/20/2014 05:10 PM, Harald Rößler wrote:
> > Yes, tomorrow I will get the replacement for the failed disk; getting a
> > new node with many disks will take a few days.
> > No other ideas?
> >
>
> If the disks are all full, then no.
>
> Sorry to say this, but it came down to poor capacity management. Never
> let any disk in your cluster fill over 80% to prevent these situations.
>
> Wido
>
> > Harald Rößler
> >
> >
> >> On 20.10.2014 at 16:45, Wido den Hollander <[email protected]> wrote:
> >>
> >> On 10/20/2014 04:43 PM, Harald Rößler wrote:
> >>> Yes, I had some OSDs which were near full. I tried to fix the problem
> >>> with "ceph osd reweight-by-utilization", but that did not help. After
> >>> that I set the near-full ratio to 88% with the idea that the remapping
> >>> would fix the issue. A restart of the OSDs didn't help either. At the
> >>> same time I had a hardware failure of one disk. :-( After that failure
> >>> the recovery process started at "degraded ~ 13%" and stopped at 7%.
> >>> Honestly, I am scared at the moment that I am doing the wrong operation.
> >>>
> >>
> >> Any chance of adding a new node with some fresh disks? It seems like you
> >> are operating at the storage capacity limit of the nodes and that your
> >> only remedy would be adding more spindles.
> >>
> >> Wido
> >>
> >>> Regards
> >>> Harald Rößler
> >>>
> >>>
> >>>
> >>>> On 20.10.2014 at 14:51, Wido den Hollander <[email protected]> wrote:
> >>>>
> >>>> On 10/20/2014 02:45 PM, Harald Rößler wrote:
> >>>>> Dear All
> >>>>>
> >>>>> I have at the moment an issue with my cluster. The recovery process
> >>>>> stops.
> >>>>>
> >>>>
> >>>> See this: 2 active+degraded+remapped+backfill_toofull
> >>>>
> >>>> 156 pgs backfill_toofull
> >>>>
> >>>> You have one or more OSDs which are too full, and that causes recovery
> >>>> to stop.
> >>>>
> >>>> If you add more capacity to the cluster, recovery will continue and
> >>>> finish.
> >>>>
> >>>>> ceph -s
> >>>>>  health HEALTH_WARN 188 pgs backfill; 156 pgs backfill_toofull; 4
> pgs backfilling; 55 pgs degraded; 49 pgs recovery_wait; 297 pgs stuck
> unclean; recovery 111487/1488290 degraded (7.491%)
> >>>>>  monmap e2: 3 mons at {0=
> 10.99.10.10:6789/0,12=10.99.10.22:6789/0,6=10.99.10.16:6789/0}, election
> epoch 332, quorum 0,1,2 0,12,6
> >>>>>  osdmap e6748: 24 osds: 23 up, 23 in
> >>>>>   pgmap v43314672: 3328 pgs: 3031 active+clean, 43
> active+remapped+wait_backfill, 3 active+degraded+wait_backfill, 96
> active+remapped+wait_backfill+backfill_toofull, 31 active+recovery_wait, 19
> active+degraded+wait_backfill+backfill_toofull, 36 active+remapped, 3
> active+remapped+backfilling, 18 active+remapped+backfill_toofull, 6
> active+degraded+remapped+wait_backfill, 15 active+recovery_wait+remapped,
> 21 active+degraded+remapped+wait_backfill+backfill_toofull, 1
> active+recovery_wait+degraded, 1 active+degraded+remapped+backfilling, 2
> active+degraded+remapped+backfill_toofull, 2
> active+recovery_wait+degraded+remapped; 1698 GB data, 5206 GB used, 971 GB
> / 6178 GB avail; 24382B/s rd, 12411KB/s wr, 320op/s; 111487/1488290
> degraded (7.491%)
> >>>>>
> >>>>>
> >>>>> I have tried to restart all OSDs in the cluster, but that does not
> >>>>> help to finish the recovery.
> >>>>>
> >>>>> Does anyone have any idea?
> >>>>>
> >>>>> Kind Regards
> >>>>> Harald Rößler
> >>>>>
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> ceph-users mailing list
> >>>>> [email protected]
> >>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Wido den Hollander
> >>>> Ceph consultant and trainer
> >>>> 42on B.V.
> >>>>
> >>>> Phone: +31 (0)20 700 9902
> >>>> Skype: contact42on
> >>>> _______________________________________________
> >>>> ceph-users mailing list
> >>>> [email protected]
> >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>
> >>
> >>
> >> --
> >> Wido den Hollander
> >> Ceph consultant and trainer
> >> 42on B.V.
> >>
> >> Phone: +31 (0)20 700 9902
> >> Skype: contact42on
> >
>
>
> --
> Wido den Hollander
> Ceph consultant and trainer
> 42on B.V.
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
