You can set a lower weight on the full OSDs, or try raising the osd_near_full_ratio parameter in your cluster from 85 to, for example, 89. But I don't know what might go wrong when you do that.
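For example, a rough sketch (not tested against your cluster: the OSD id and the ratios below are placeholders, and the exact option names can differ between Ceph releases, so double-check them against your version first):

    # lower the override reweight of a too-full OSD (value between 0.0 and 1.0)
    ceph osd reweight 12 0.85

    # raise the cluster-wide nearfull ratio to 89%
    ceph pg set_nearfull_ratio 0.89

    # backfill_toofull is checked against osd_backfill_full_ratio (default 0.85),
    # so raising that on the OSDs may be what actually lets backfill continue
    ceph tell osd.\* injectargs '--osd_backfill_full_ratio 0.89'

Whatever headroom you gain that way should only be temporary; as Wido wrote below, the real fix is still adding capacity.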
2014-10-20 17:12 GMT+02:00 Wido den Hollander <[email protected]>:
> On 10/20/2014 05:10 PM, Harald Rößler wrote:
> > Yes, tomorrow I will get the replacement for the failed disk; getting a
> > new node with many disks will take a few days.
> > No other idea?
>
> If the disks are all full, then, no.
>
> Sorry to say this, but it came down to poor capacity management. Never
> let any disk in your cluster fill over 80% to prevent these situations.
>
> Wido
>
> > Harald Rößler
> >
> >> On 20.10.2014 at 16:45, Wido den Hollander <[email protected]> wrote:
> >>
> >> On 10/20/2014 04:43 PM, Harald Rößler wrote:
> >>> Yes, I had some OSDs which were near full. I first tried to fix the
> >>> problem with "ceph osd reweight-by-utilization", but this did not help.
> >>> After that I set the near-full ratio to 88% with the idea that the
> >>> remapping would fix the issue. A restart of the OSDs doesn't help
> >>> either. At the same time I had a hardware failure of one disk. :-(
> >>> After that failure the recovery process started at "degraded ~ 13%"
> >>> and stops at 7%.
> >>> Honestly, I am scared at the moment that I am doing the wrong operation.
> >>
> >> Any chance of adding a new node with some fresh disks? Seems like you
> >> are operating at the storage capacity limit of the nodes and your
> >> only remedy would be adding more spindles.
> >>
> >> Wido
> >>
> >>> Regards
> >>> Harald Rößler
> >>>
> >>>> On 20.10.2014 at 14:51, Wido den Hollander <[email protected]> wrote:
> >>>>
> >>>> On 10/20/2014 02:45 PM, Harald Rößler wrote:
> >>>>> Dear All,
> >>>>>
> >>>>> I have an issue with my cluster at the moment: the recovery process
> >>>>> stops.
> >>>>
> >>>> See this: 2 active+degraded+remapped+backfill_toofull
> >>>>
> >>>> 156 pgs backfill_toofull
> >>>>
> >>>> You have one or more OSDs which are too full, and that causes
> >>>> recovery to stop.
> >>>>
> >>>> If you add more capacity to the cluster, recovery will continue and
> >>>> finish.
> >>>>
> >>>>> ceph -s
> >>>>>   health HEALTH_WARN 188 pgs backfill; 156 pgs backfill_toofull;
> >>>>>     4 pgs backfilling; 55 pgs degraded; 49 pgs recovery_wait;
> >>>>>     297 pgs stuck unclean; recovery 111487/1488290 degraded (7.491%)
> >>>>>   monmap e2: 3 mons at
> >>>>>     {0=10.99.10.10:6789/0,12=10.99.10.22:6789/0,6=10.99.10.16:6789/0},
> >>>>>     election epoch 332, quorum 0,1,2 0,12,6
> >>>>>   osdmap e6748: 24 osds: 23 up, 23 in
> >>>>>   pgmap v43314672: 3328 pgs: 3031 active+clean,
> >>>>>     43 active+remapped+wait_backfill, 3 active+degraded+wait_backfill,
> >>>>>     96 active+remapped+wait_backfill+backfill_toofull,
> >>>>>     31 active+recovery_wait,
> >>>>>     19 active+degraded+wait_backfill+backfill_toofull,
> >>>>>     36 active+remapped, 3 active+remapped+backfilling,
> >>>>>     18 active+remapped+backfill_toofull,
> >>>>>     6 active+degraded+remapped+wait_backfill,
> >>>>>     15 active+recovery_wait+remapped,
> >>>>>     21 active+degraded+remapped+wait_backfill+backfill_toofull,
> >>>>>     1 active+recovery_wait+degraded,
> >>>>>     1 active+degraded+remapped+backfilling,
> >>>>>     2 active+degraded+remapped+backfill_toofull,
> >>>>>     2 active+recovery_wait+degraded+remapped;
> >>>>>     1698 GB data, 5206 GB used, 971 GB / 6178 GB avail;
> >>>>>     24382 B/s rd, 12411 KB/s wr, 320 op/s;
> >>>>>     111487/1488290 degraded (7.491%)
> >>>>>
> >>>>> I have tried to restart all OSDs in the cluster, but that does not
> >>>>> help to finish the recovery of the cluster.
> >>>>>
> >>>>> Does anyone have an idea?
> >>>>>
> >>>>> Kind Regards
> >>>>> Harald Rößler
>
> --
> Wido den Hollander
> Ceph consultant and trainer
> 42on B.V.
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on
