Does anyone have an idea how to solve the situation? Thanks for any advice.

Kind Regards
Harald Rößler
> On 23.10.2014 at 18:56, Harald Rößler <harald.roess...@btd.de> wrote:
>
> @Wido: sorry, I don't understand 100% what you mean, so I generated some
> output which may help.
>
> OK, the pool:
>
> pool 3 'bcf' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 832 pgp_num 832 last_change 8000 owner 0
>
> All remapped PGs have a pg_temp entry:
>
> pg_temp 3.1 [14,20,0]
> pg_temp 3.c [1,7,23]
> pg_temp 3.22 [15,21,23]
>
> 3.22 429 0 2 0 1654296576 0 0 active+remapped 2014-10-23 03:25:03.180505 8608'363836897 8608'377970131 [15,21] [15,21,23] 3578'354650024 2014-10-16 04:06:39.133104 3578'354650024 2014-10-16 04:06:39.133104
>
> The crush rules:
>
> # rules
> rule data {
>         ruleset 0
>         type replicated
>         min_size 1
>         max_size 10
>         step take default
>         step chooseleaf firstn 0 type host
>         step emit
> }
> rule metadata {
>         ruleset 1
>         type replicated
>         min_size 1
>         max_size 10
>         step take default
>         step chooseleaf firstn 0 type host
>         step emit
> }
> rule rbd {
>         ruleset 2
>         type replicated
>         min_size 1
>         max_size 10
>         step take default
>         step chooseleaf firstn 0 type host
>         step emit
> }
>
> ceph pg 3.22 query
>
> { "state": "active+remapped",
>   "epoch": 8608,
>   "up": [ 15, 21],
>   "acting": [ 15, 21, 23],
>   "info": { "pgid": "3.22",
>       "last_update": "8608'363845313",
>       "last_complete": "8608'363845313",
>       "log_tail": "8608'363842312",
>       "last_backfill": "MAX",
>       "purged_snaps": "[1~1,3~3,8~6,f~31,42~1,44~3,48~f,58~1,5a~2]",
>       "history": { "epoch_created": 140,
>           "last_epoch_started": 8576,
>           "last_epoch_clean": 8576,
>           "last_epoch_split": 0,
>           "same_up_since": 8340,
>           "same_interval_since": 8575,
>           "same_primary_since": 7446,
>           "last_scrub": "3578'354650024",
>           "last_scrub_stamp": "2014-10-16 04:06:39.133104",
>           "last_deep_scrub": "3578'354650024",
>           "last_deep_scrub_stamp": "2014-10-16 04:06:39.133104",
>           "last_clean_scrub_stamp": "2014-10-16 04:06:39.133104"},
>       "stats": { "version": "8608'363845313",
>           "reported": "8608'377978685",
>           "state": "active+remapped",
>           "last_fresh": "2014-10-23 18:55:07.582844",
>           "last_change": "2014-10-23 03:25:03.180505",
>           "last_active": "2014-10-23 18:55:07.582844",
>           "last_clean": "2014-10-20 07:51:21.330669",
>           "last_became_active": "2013-07-14 07:20:30.173508",
>           "last_unstale": "2014-10-23 18:55:07.582844",
>           "mapping_epoch": 8370,
>           "log_start": "8608'363842312",
>           "ondisk_log_start": "8608'363842312",
>           "created": 140,
>           "last_epoch_clean": 8576,
>           "parent": "0.0",
>           "parent_split_bits": 0,
>           "last_scrub": "3578'354650024",
>           "last_scrub_stamp": "2014-10-16 04:06:39.133104",
>           "last_deep_scrub": "3578'354650024",
>           "last_deep_scrub_stamp": "2014-10-16 04:06:39.133104",
>           "last_clean_scrub_stamp": "2014-10-16 04:06:39.133104",
>           "log_size": 0,
>           "ondisk_log_size": 0,
>           "stats_invalid": "0",
>           "stat_sum": { "num_bytes": 1654296576,
>               "num_objects": 429,
>               "num_object_clones": 28,
>               "num_object_copies": 0,
>               "num_objects_missing_on_primary": 0,
>               "num_objects_degraded": 0,
>               "num_objects_unfound": 0,
>               "num_read": 8053865,
>               "num_read_kb": 124022900,
>               "num_write": 363844886,
>               "num_write_kb": 2083536824,
>               "num_scrub_errors": 0,
>               "num_shallow_scrub_errors": 0,
>               "num_deep_scrub_errors": 0,
>               "num_objects_recovered": 2777,
>               "num_bytes_recovered": 11138282496,
>               "num_keys_recovered": 0},
>           "stat_cat_sum": {},
>           "up": [ 15, 21],
>           "acting": [ 15, 21, 23]},
>       "empty": 0,
>       "dne": 0,
>       "incomplete": 0,
>       "last_epoch_started": 8576},
>   "recovery_state": [
>       { "name": "Started\/Primary\/Active",
>         "enter_time": "2014-10-23 03:25:03.179759",
>         "might_have_unfound": [],
>         "recovery_progress": { "backfill_target": -1,
>             "waiting_on_backfill": 0,
>             "backfill_pos": "0\/\/0\/\/-1",
>             "backfill_info": { "begin": "0\/\/0\/\/-1",
>                 "end": "0\/\/0\/\/-1",
>                 "objects": []},
>             "peer_backfill_info": { "begin": "0\/\/0\/\/-1",
>                 "end": "0\/\/0\/\/-1",
>                 "objects": []},
>             "backfills_in_flight": [],
>             "pull_from_peer": [],
>             "pushing": []},
>         "scrub": { "scrubber.epoch_start": "0",
>             "scrubber.active": 0,
>             "scrubber.block_writes": 0,
>             "scrubber.finalizing": 0,
>             "scrubber.waiting_on": 0,
>             "scrubber.waiting_on_whom": []}},
>       { "name": "Started",
>         "enter_time": "2014-10-23 03:25:02.174216"}]}
>
>> On 23.10.2014 at 17:36, Wido den Hollander <w...@42on.com> wrote:
>>
>> On 10/23/2014 05:33 PM, Harald Rößler wrote:
>>> Hi all,
>>>
>>> the procedure does not work for me; I still have 47 active+remapped PGs.
>>> Does anyone have an idea how to fix this issue?
>>
>> If you look at those PGs using "ceph osd pg dump", what is their prefix?
>>
>> They should start with a number, and that number corresponds back to a
>> pool ID which you can see with "ceph osd dump | grep pool".
>>
>> Could it be that that specific pool is using a special crush rule?
>>
>> Wido
>>
>>> @Wido: now my cluster has a usage of less than 80% - thanks for your advice.
>>>
>>> Harry
>>>
>>> On 21.10.2014 at 22:38, Craig Lewis <cle...@centraldesktop.com> wrote:
>>>
>>> In that case, take a look at ceph pg dump | grep remapped. In the up or
>>> acting column, there should be one or two common OSDs between the stuck PGs.
>>>
>>> Try restarting those OSD daemons. I've had a few OSDs get stuck scheduling
>>> recovery, particularly around toofull situations.
>>>
>>> I've also had Robert's experience of stuck operations becoming unstuck
>>> overnight.
>>>
>>> On Tue, Oct 21, 2014 at 12:02 PM, Harald Rößler <harald.roess...@btd.de> wrote:
>>> After more than 10 hours it is the same situation; I don't think it will
>>> fix itself over time. How can I find out what the problem is?
>>>
>>> On 21.10.2014 at 17:28, Craig Lewis <cle...@centraldesktop.com> wrote:
>>>
>>> That will fix itself over time. remapped just means that Ceph is moving
>>> the data around. It's normal to see PGs in the remapped and/or
>>> backfilling state after OSD restarts.
>>>
>>> They should go down steadily over time. How long depends on how much data
>>> is in the PGs, how fast your hardware is, how many OSDs are affected, and
>>> how much you allow recovery to impact cluster performance. Mine currently
>>> take about 20 minutes per PG. If all 47 are on the same OSD, it'll be a
>>> while. If they're evenly split between multiple OSDs, parallelism will
>>> speed that up.
>>> On Tue, Oct 21, 2014 at 1:22 AM, Harald Rößler <harald.roess...@btd.de> wrote:
>>> Hi all,
>>>
>>> thank you for your support, the file system is no longer degraded.
>>> Now I have a negative degraded count :-)
>>>
>>> 2014-10-21 10:15:22.303139 mon.0 [INF] pgmap v43376478: 3328 pgs: 3281 active+clean, 47 active+remapped; 1609 GB data, 5022 GB used, 1155 GB / 6178 GB avail; 8034B/s rd, 3548KB/s wr, 161op/s; -1638/1329293 degraded (-0.123%)
>>>
>>> but ceph reports HEALTH_WARN: 47 pgs stuck unclean; recovery -1638/1329293 degraded (-0.123%)
>>>
>>> I think this warning is reported because of the 47 active+remapped PGs.
>>> Any ideas how to fix that now?
>>>
>>> Kind Regards
>>> Harald Roessler
>>>
>>> On 21.10.2014 at 01:03, Craig Lewis <cle...@centraldesktop.com> wrote:
>>>
>>> I've been in a state where reweight-by-utilization was deadlocked (not the
>>> daemons, but the remap scheduling). After successive osd reweight
>>> commands, two OSDs wanted to swap PGs, but they were both toofull. I ended
>>> up temporarily increasing mon_osd_nearfull_ratio to 0.87. That removed the
>>> impediment, and everything finished remapping. Everything went smoothly,
>>> and I changed it back when all the remapping finished.
>>>
>>> Just be careful if you need to get close to mon_osd_full_ratio. Ceph does
>>> greater-than on these percentages, not greater-than-or-equal. You really
>>> don't want the disks to get greater than mon_osd_full_ratio, because all
>>> external IO will stop until you resolve that.
>>>
>>> On Mon, Oct 20, 2014 at 10:18 AM, Leszek Master <keks...@gmail.com> wrote:
>>> You can set a lower weight on the full OSDs, or try changing the
>>> osd_near_full_ratio parameter in your cluster from 85 to, for example, 89.
>>> But I don't know what can go wrong when you do that.
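To make Craig's and Leszek's suggestions concrete, the runtime knobs on the releases of that era looked roughly like the sketch below. The exact command forms (in particular the injectargs line) are from memory, so verify them against your release before running anything:

    # lower the override weight (0.0-1.0) of a single over-full OSD, e.g. osd.14
    ceph osd reweight 14 0.90

    # or let ceph choose which OSDs to reweight
    ceph osd reweight-by-utilization

    # temporarily raise the nearfull threshold (remember to set it back afterwards)
    ceph pg set_nearfull_ratio 0.87

    # alternatively, inject the option into the monitors at runtime
    ceph tell mon.* injectargs '--mon-osd-nearfull-ratio 0.87'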
>>> 2014-10-20 17:12 GMT+02:00 Wido den Hollander <w...@42on.com>:
>>> On 10/20/2014 05:10 PM, Harald Rößler wrote:
>>>> yes, tomorrow I will get the replacement for the failed disk; getting a
>>>> new node with many disks will take a few days.
>>>> No other idea?
>>>>
>>>
>>> If the disks are all full, then, no.
>>>
>>> Sorry to say this, but it came down to poor capacity management. Never
>>> let any disk in your cluster fill over 80% to prevent these situations.
>>>
>>> Wido
>>>
>>>> Harald Rößler
>>>>
>>>>> On 20.10.2014 at 16:45, Wido den Hollander <w...@42on.com> wrote:
>>>>>
>>>>> On 10/20/2014 04:43 PM, Harald Rößler wrote:
>>>>>> Yes, I had some OSDs which were near full; I tried to fix the problem
>>>>>> with "ceph osd reweight-by-utilization", but that did not help. After
>>>>>> that I set the near-full ratio to 88% with the idea that the remapping
>>>>>> would fix the issue. A restart of the OSDs doesn't help either. At the
>>>>>> same time I had a hardware failure of one disk :-(. After that failure
>>>>>> the recovery process started at "degraded ~ 13%" and stops at 7%.
>>>>>> Honestly, I am scared at the moment that I am doing the wrong operation.
>>>>>>
>>>>>
>>>>> Any chance of adding a new node with some fresh disks? It seems like you
>>>>> are operating at the storage capacity limit of the nodes and your only
>>>>> remedy would be adding more spindles.
>>>>>
>>>>> Wido
>>>>>
>>>>>> Regards
>>>>>> Harald Rößler
>>>>>>
>>>>>>> On 20.10.2014 at 14:51, Wido den Hollander <w...@42on.com> wrote:
>>>>>>>
>>>>>>> On 10/20/2014 02:45 PM, Harald Rößler wrote:
>>>>>>>> Dear All,
>>>>>>>>
>>>>>>>> at the moment I have an issue with my cluster. The recovery process
>>>>>>>> stops.
>>>>>>>>
>>>>>>>
>>>>>>> See this: 2 active+degraded+remapped+backfill_toofull
>>>>>>>
>>>>>>> 156 pgs backfill_toofull
>>>>>>>
>>>>>>> You have one or more OSDs which are too full, and that causes recovery
>>>>>>> to stop.
>>>>>>>
>>>>>>> If you add more capacity to the cluster, recovery will continue and
>>>>>>> finish.
>>>>>>>
>>>>>>>> ceph -s
>>>>>>>> health HEALTH_WARN 188 pgs backfill; 156 pgs backfill_toofull; 4 pgs backfilling; 55 pgs degraded; 49 pgs recovery_wait; 297 pgs stuck unclean; recovery 111487/1488290 degraded (7.491%)
>>>>>>>> monmap e2: 3 mons at {0=10.99.10.10:6789/0,12=10.99.10.22:6789/0,6=10.99.10.16:6789/0}, election epoch 332, quorum 0,1,2 0,12,6
>>>>>>>> osdmap e6748: 24 osds: 23 up, 23 in
>>>>>>>> pgmap v43314672: 3328 pgs: 3031 active+clean, 43 active+remapped+wait_backfill, 3 active+degraded+wait_backfill, 96 active+remapped+wait_backfill+backfill_toofull, 31 active+recovery_wait, 19 active+degraded+wait_backfill+backfill_toofull, 36 active+remapped, 3 active+remapped+backfilling, 18 active+remapped+backfill_toofull, 6 active+degraded+remapped+wait_backfill, 15 active+recovery_wait+remapped, 21 active+degraded+remapped+wait_backfill+backfill_toofull, 1 active+recovery_wait+degraded, 1 active+degraded+remapped+backfilling, 2 active+degraded+remapped+backfill_toofull, 2 active+recovery_wait+degraded+remapped; 1698 GB data, 5206 GB used, 971 GB / 6178 GB avail; 24382B/s rd, 12411KB/s wr, 320op/s; 111487/1488290 degraded (7.491%)
>>>>>>>>
>>>>>>>> I have tried restarting all OSDs in the cluster, but that does not
>>>>>>>> help to finish the recovery.
>>>>>>>>
>>>>>>>> Does anyone have an idea?
>>>>>>>>
>>>>>>>> Kind Regards
>>>>>>>> Harald Rößler
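For anyone who lands in the same backfill_toofull situation, a quick way to see which OSDs are actually over the threshold, and, as a last resort, to let backfill proceed a little further, looks roughly like this. osd_backfill_full_ratio is the option that gates backfill_toofull; the injectargs line and the data path are assumptions for a default installation, so check them against your release first:

    # which OSDs are currently flagged near full / full
    ceph health detail | grep -i full

    # per-OSD disk utilization, run on each OSD host (assumes the default data path)
    df -h /var/lib/ceph/osd/ceph-*

    # which PGs are stuck, and in which state
    ceph pg dump_stuck unclean

    # last resort: temporarily allow backfill onto fuller OSDs (default is 0.85)
    ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.90'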
>> --
>> Wido den Hollander
>> 42on B.V.
>> Ceph trainer and consultant
>>
>> Phone: +31 (0)20 700 9902
>> Skype: contact42on

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com