On 10/23/2014 05:33 PM, Harald Rößler wrote:
> Hi all,
>
> the procedure does not work for me; I still have 47 active+remapped PGs.
> Does anyone have an idea how to fix this issue?
If you look at those PGs using "ceph osd pg dump", what is their prefix?
They should start with a number, and that number corresponds to a pool ID
which you can see with "ceph osd dump | grep pool" (a sketch of this check
is at the end of this mail). Could it be that that specific pool is using a
special CRUSH rule?

Wido

> @Wido: my cluster is now below 80% usage - thanks for your advice.
>
> Harry
>
>
> On 21.10.2014 at 22:38, Craig Lewis <cle...@centraldesktop.com> wrote:
>
> In that case, take a look at ceph pg dump | grep remapped. In the up or
> acting column, there should be one or two OSDs that the stuck PGs have in
> common.
>
> Try restarting those OSD daemons. I've had a few OSDs get stuck scheduling
> recovery, particularly around toofull situations.
>
> I've also had Robert's experience of stuck operations becoming unstuck
> overnight.
>
>
> On Tue, Oct 21, 2014 at 12:02 PM, Harald Rößler <harald.roess...@btd.de> wrote:
> After more than 10 hours it is still the same situation; I don't think it
> will fix itself over time. How can I find out what the problem is?
>
>
> On 21.10.2014 at 17:28, Craig Lewis <cle...@centraldesktop.com> wrote:
>
> That will fix itself over time. remapped just means that Ceph is moving
> the data around. It's normal to see PGs in the remapped and/or backfilling
> state after OSD restarts.
>
> They should go down steadily over time. How long depends on how much data
> is in the PGs, how fast your hardware is, how many OSDs are affected, and
> how much you allow recovery to impact cluster performance. Mine currently
> take about 20 minutes per PG. If all 47 are on the same OSD, it'll be a
> while. If they're evenly split between multiple OSDs, parallelism will
> speed that up.
>
> On Tue, Oct 21, 2014 at 1:22 AM, Harald Rößler <harald.roess...@btd.de> wrote:
> Hi all,
>
> thank you for your support; the file system is no longer degraded. Now I
> have negative degradation :-)
>
> 2014-10-21 10:15:22.303139 mon.0 [INF] pgmap v43376478: 3328 pgs: 3281
> active+clean, 47 active+remapped; 1609 GB data, 5022 GB used, 1155 GB /
> 6178 GB avail; 8034B/s rd, 3548KB/s wr, 161op/s; -1638/1329293 degraded
> (-0.123%)
>
> but ceph reports a health warning: HEALTH_WARN 47 pgs stuck unclean;
> recovery -1638/1329293 degraded (-0.123%)
>
> I think this warning is reported because of the 47 active+remapped PGs.
> Any ideas how to fix that now?
>
> Kind Regards
> Harald Roessler
>
>
> On 21.10.2014 at 01:03, Craig Lewis <cle...@centraldesktop.com> wrote:
>
> I've been in a state where reweight-by-utilization was deadlocked (not the
> daemons, but the remap scheduling). After successive osd reweight
> commands, two OSDs wanted to swap PGs, but they were both toofull. I ended
> up temporarily increasing mon_osd_nearfull_ratio to 0.87. That removed the
> impediment, and everything finished remapping. Everything went smoothly,
> and I changed it back when all the remapping finished.
>
> Just be careful if you need to get close to mon_osd_full_ratio. Ceph does
> greater-than on these percentages, not greater-than-or-equal. You really
> don't want the disks to get greater than mon_osd_full_ratio, because all
> external IO will stop until you resolve that.
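Craig's temporary bump of the nearfull ratio can be done at runtime with
injectargs. This is only a rough sketch: the 0.87 value is his example,
mon.0 is just one of the monitor IDs in this cluster, and the option name
and its effect should be double-checked against the Ceph version in use.

    # on a monitor host: check the value currently in effect
    ceph daemon mon.0 config show | grep nearfull
    # temporarily raise the nearfull ratio on all monitors
    ceph tell mon.* injectargs '--mon-osd-nearfull-ratio 0.87'
    # once backfilling has finished, put it back to the default
    ceph tell mon.* injectargs '--mon-osd-nearfull-ratio 0.85'

injectargs changes are not persistent across daemon restarts, which is
convenient here since the change is meant to be temporary anyway, but do
set it back explicitly once the remapping is done, as Craig describes.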
>
>
> On Mon, Oct 20, 2014 at 10:18 AM, Leszek Master <keks...@gmail.com> wrote:
> You can set a lower weight on the full OSDs, or try changing the
> osd_near_full_ratio parameter in your cluster from 85 to, for example, 89.
> But I don't know what can go wrong when you do that.
>
>
> 2014-10-20 17:12 GMT+02:00 Wido den Hollander <w...@42on.com>:
> On 10/20/2014 05:10 PM, Harald Rößler wrote:
>> yes, tomorrow I will get the replacement for the failed disk; getting a
>> new node with many disks will take a few days.
>> No other idea?
>>
>
> If the disks are all full, then, no.
>
> Sorry to say this, but it comes down to poor capacity management. Never
> let any disk in your cluster fill over 80% to prevent these situations.
>
> Wido
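To make the advice above concrete: per-OSD fill levels can be read from the
pg dump, and an over-full OSD can temporarily be given a lower weight so
data moves off it. This is only a sketch; osd.7 and the 0.9 weight are
made-up placeholders, the df path assumes the default OSD mount location,
and every reweight triggers extra data movement on an already full cluster.

    # per-OSD utilisation: look at the kb_used / kb_avail columns
    ceph pg dump osds
    # or check the mounted OSD filesystems directly on each host
    df -h /var/lib/ceph/osd/*
    # temporarily lower the weight of an over-full OSD
    ceph osd reweight 7 0.9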
>
>> Harald Rößler
>>
>
>>> On 20.10.2014 at 16:45, Wido den Hollander <w...@42on.com> wrote:
>>>
>>> On 10/20/2014 04:43 PM, Harald Rößler wrote:
>>>> Yes, I had some OSDs which were near full; I tried to fix the problem
>>>> with "ceph osd reweight-by-utilization", but that did not help. After
>>>> that I set the near full ratio to 88%, with the idea that the
>>>> remapping would fix the issue. A restart of the OSDs didn't help
>>>> either. At the same time I had a hardware failure of one disk. :-(
>>>> After that failure the recovery process started at "degraded ~ 13%"
>>>> and stopped at 7%.
>>>> Honestly, I am scared at the moment that I am doing the wrong
>>>> operation.
>>>>
>>>
>>> Any chance of adding a new node with some fresh disks? It seems like
>>> you are operating at the storage capacity limit of the nodes and your
>>> only remedy would be adding more spindles.
>>>
>>> Wido
>>>
>>>> Regards
>>>> Harald Rößler
>>>>
>>>>
>>>>
>>>>> On 20.10.2014 at 14:51, Wido den Hollander <w...@42on.com> wrote:
>>>>>
>>>>> On 10/20/2014 02:45 PM, Harald Rößler wrote:
>>>>>> Dear All,
>>>>>>
>>>>>> I have an issue with my cluster at the moment: the recovery process
>>>>>> has stopped.
>>>>>>
>>>>>
>>>>> See this: 2 active+degraded+remapped+backfill_toofull
>>>>>
>>>>> 156 pgs backfill_toofull
>>>>>
>>>>> You have one or more OSDs which are too full, and that causes
>>>>> recovery to stop.
>>>>>
>>>>> If you add more capacity to the cluster, recovery will continue and
>>>>> finish.
>>>>>
>>>>>> ceph -s
>>>>>>     health HEALTH_WARN 188 pgs backfill; 156 pgs backfill_toofull;
>>>>>> 4 pgs backfilling; 55 pgs degraded; 49 pgs recovery_wait; 297 pgs
>>>>>> stuck unclean; recovery 111487/1488290 degraded (7.491%)
>>>>>>     monmap e2: 3 mons at
>>>>>> {0=10.99.10.10:6789/0,12=10.99.10.22:6789/0,6=10.99.10.16:6789/0},
>>>>>> election epoch 332, quorum 0,1,2 0,12,6
>>>>>>     osdmap e6748: 24 osds: 23 up, 23 in
>>>>>>     pgmap v43314672: 3328 pgs: 3031 active+clean, 43
>>>>>> active+remapped+wait_backfill, 3 active+degraded+wait_backfill, 96
>>>>>> active+remapped+wait_backfill+backfill_toofull, 31
>>>>>> active+recovery_wait, 19
>>>>>> active+degraded+wait_backfill+backfill_toofull, 36 active+remapped,
>>>>>> 3 active+remapped+backfilling, 18 active+remapped+backfill_toofull,
>>>>>> 6 active+degraded+remapped+wait_backfill, 15
>>>>>> active+recovery_wait+remapped, 21
>>>>>> active+degraded+remapped+wait_backfill+backfill_toofull, 1
>>>>>> active+recovery_wait+degraded, 1
>>>>>> active+degraded+remapped+backfilling, 2
>>>>>> active+degraded+remapped+backfill_toofull, 2
>>>>>> active+recovery_wait+degraded+remapped; 1698 GB data, 5206 GB used,
>>>>>> 971 GB / 6178 GB avail; 24382B/s rd, 12411KB/s wr, 320op/s;
>>>>>> 111487/1488290 degraded (7.491%)
>>>>>>
>>>>>>
>>>>>> I have tried restarting all OSDs in the cluster, but that does not
>>>>>> help to finish the recovery of the cluster.
>>>>>>
>>>>>> Does someone have any idea?
>>>>>>
>>>>>> Kind Regards
>>>>>> Harald Rößler
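As mentioned at the top of this mail, the check of which pool the stuck PGs
belong to would look roughly like this; the grep/awk filtering is only a
sketch, and the part of the PG ID before the dot is the pool ID:

    # list the stuck PGs; the prefix before the dot is the pool ID
    ceph pg dump_stuck unclean
    ceph pg dump | grep remapped | awk '{print $1}'
    # map that pool ID back to a pool name and its crush_ruleset number
    ceph osd dump | grep pool
    # inspect the CRUSH rules to see whether that pool uses a special one
    ceph osd crush rule dump

If all 47 stuck PGs share the same prefix and that pool points at a
non-default crush_ruleset, that rule is the first thing to inspect.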
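Craig's earlier suggestion of restarting the OSD(s) that the stuck PGs have
in common then comes down to eyeballing the up/acting columns of that dump
and bouncing just those daemons. osd.7 is only an example, and the exact
restart command depends on the init system in use:

    # restart a single OSD daemon on its host
    service ceph restart osd.7
    # or, on Upstart-based installations:
    # restart ceph-osd id=7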
--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com