Hi Vivien,

Great to hear that all PGs are now active+clean. Just so you know, the PG export/import procedure Eugen mentioned should have worked to restore them without dropping their data.
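For future reference, that export/import procedure can be sketched roughly as follows. This is a hedged sketch, not a verbatim procedure: the OSD ids, PG id, `<fsid>` placeholder, and file paths are illustrative, and the export file must live on a path visible inside both OSD containers.

```shell
# Sketch of the PG export/import procedure. OSD ids, the PG id, <fsid>,
# and file paths are illustrative placeholders.
ceph osd set noout
systemctl stop "ceph-<fsid>@osd.1.service"   # stop the source OSD first

# Export the PG from the stopped (but intact) source OSD's container:
cephadm shell --name osd.1 -- \
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 \
  --pgid 11.4 --op export --file /var/log/ceph/pg.11.4.export

# Import it on a (likewise stopped) destination OSD:
cephadm shell --name osd.4 -- \
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4 \
  --op import --file /var/log/ceph/pg.11.4.export

systemctl start "ceph-<fsid>@osd.1.service"
ceph osd unset noout
```

Make sure there is enough free disk space for the export file before starting.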
Regarding the PGs scrubbing for 5 days: it may simply be that many PGs are now being scrubbed concurrently, since they could not be scrubbed while they were not active+clean. Alternatively, you might be hitting this bug [1] with mClock. If that's the case, switching osd_op_queue to WPQ or setting osd_scrub_disable_reservation_queuing = true with mClock could help.

Regards,
Frédéric.

[1] https://tracker.ceph.com/issues/69078

--
Frédéric Nass
Ceph Ambassador France | Senior Ceph Engineer @ CLYSO
Try our Ceph Analyzer -- https://analyzer.clyso.com/
https://clyso.com | frederic.n...@clyso.com

On Mon, Aug 4, 2025 at 09:39, GLE, Vivien <vivien....@inist.fr> wrote: > Hi, > > I got 3 incomplete PGs that I marked complete (mark-complete) because they were empty > (I think I lost data from them) > > 1 was recovery_unfound; I ran mark_unfound_lost revert on this one > > > but I have between 5 and 25 deep-scrubbing PGs, and I believe this is not normal? > (it's been like this for 5 days) > > > Vivien > > ________________________________ > From: Eugen Block <ebl...@nde.ag> > Sent: Friday, August 1, 2025 15:58:22 > To: GLE, Vivien > Cc: ceph-users@ceph.io > Subject: Re: [ceph-users] Re: Pgs troubleshooting > > Don't worry, I just wanted to point out that careful reading is crucial. :-) > So you got the OSDs back up, but were you also able to recover the PG? > > Quoting "GLE, Vivien" <vivien....@inist.fr>: > > > I lost all perspective and didn't read this message carefully... > > Sorry for that > > > > > > Thanks for your help, I'm very grateful > > > > > > Vivien > > > > ________________________________ > > From: Eugen Block <ebl...@nde.ag> > > Sent: Friday, August 1, 2025 15:27:56 > > To: GLE, Vivien > > Cc: ceph-users@ceph.io > > Subject: Re: [ceph-users] Re: Pgs troubleshooting > > > > That’s why I mentioned this two days ago: > > > > cephadm shell -- ceph-objectstore-tool --op list … > > > > That’s how you can execute commands directly with cephadm shell; this > > is useful for batch operations like a for loop or similar.
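As an aside, the batch pattern Eugen describes above can be sketched like this (a hedged illustration only — the OSD ids and the PG id are placeholders, not values from this cluster):

```shell
# Sketch: run ceph-objectstore-tool non-interactively against several OSDs
# via `cephadm shell --name osd.N -- <command>`. OSD ids and the PG id are
# illustrative placeholders; each OSD must be stopped before running the tool.
for osd in 1 4 5; do
  cephadm shell --name "osd.${osd}" -- \
    ceph-objectstore-tool --data-path "/var/lib/ceph/osd/ceph-${osd}" \
    --op list --pgid 11.4 --no-mon-config
done
```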
Of course, > > first entering the shell and then executing commands works just as well. > > > > Quoting "GLE, Vivien" <vivien....@inist.fr>: > > > >> I was using ceph-objectstore-tool the wrong way, by running it on the host > >> instead of inside the container via cephadm shell --name osd.x > >> > >> > >> ________________________________ > >> From: GLE, Vivien <vivien....@inist.fr> > >> Sent: Friday, August 1, 2025 09:02:59 > >> To: Eugen Block > >> Cc: ceph-users@ceph.io > >> Subject: [ceph-users] Re: Pgs troubleshooting > >> > >> Hi, > >> > >> > >> What is the correct way of using the objectstore tool? > >> > >> > >> My OSDs are up! I purged ceph-* on my host following this thread: > >> > https://www.reddit.com/r/ceph/comments/1me3kvd/containerized_ceph_base_os_experience/ > >> > >> > >> " Make sure that the base OS does not have any ceph packages > >> installed, with Ubuntu in the past had issues with ceph-common being > >> installed on the host OS and it trying to take ownership of the > >> containerized ceph deployment. If you run into any issues check the > >> base OS for ceph-* packages and uninstall. " > >> > >> > >> I believe the only correct way to run ceph commands is inside cephadm > >> > >> > >> Thanks for your help! > >> > >> ________________________________ > >> From: Eugen Block <ebl...@nde.ag> > >> Sent: Thursday, July 31, 2025 19:42:21 > >> To: GLE, Vivien > >> Cc: ceph-users@ceph.io > >> Subject: Re: [ceph-users] Re: Pgs troubleshooting > >> > >> To use the objectstore tool within the container you don’t have to > >> specify the cluster’s FSID because it’s mapped into the container. By > >> using the objectstore tool you might have changed the ownership of the > >> directory; change it back to the previous state. Other OSDs will show > >> you which uid/user and/or gid/group that is. > >> > >> Quoting "GLE, Vivien" <vivien....@inist.fr>: > >> > >>> I'm sorry for the confusion! > >>> > >>> I pasted the wrong output.
> >>> > >>> > >>> ceph-objectstore-tool --data-path /var/lib/ceph/Id/osd.1 --op list > >>> --pgid 11.4 --no-mon-config > >>> > >>> OSD.1 log > >>> > >>> 2025-07-31T12:06:56.273+0000 7a9c2bf47680 0 set uid:gid to 167:167 > >>> (ceph:ceph) > >>> 2025-07-31T12:06:56.273+0000 7a9c2bf47680 0 ceph version 19.2.2 > >>> (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable), process > >>> ceph-osd, pid 7 > >>> 2025-07-31T12:06:56.273+0000 7a9c2bf47680 0 pidfile_write: ignore > >>> empty --pid-file > >>> 2025-07-31T12:06:56.274+0000 7a9c2bf47680 1 bdev(0x57bd64210e00 > >>> /var/lib/ceph/osd/ceph-1/block) open path > >>> /var/lib/ceph/osd/ceph-1/block > >>> 2025-07-31T12:06:56.274+0000 7a9c2bf47680 -1 bdev(0x57bd64210e00 > >>> /var/lib/ceph/osd/ceph-1/block) open open got: (13) Permission denied > >>> 2025-07-31T12:06:56.274+0000 7a9c2bf47680 -1 ** ERROR: unable to > >>> open OSD superblock on /var/lib/ceph/osd/ceph-1: (2) No such file or > >>> directory > >>> > >>> ---------------------- > >>> > >>> I retried on OSD.2 with PG 2.1, to see whether disabling (instead of just > >>> stopping) OSD.2 before the objectstore-tool operation would change > >>> anything, but the same error occurred > >>> > >>> > >>> > >>> ________________________________ > >>> From: Eugen Block <ebl...@nde.ag> > >>> Sent: Thursday, July 31, 2025 13:27:51 > >>> To: GLE, Vivien > >>> Cc: ceph-users@ceph.io > >>> Subject: Re: [ceph-users] Re: Pgs troubleshooting > >>> > >>> Why did you look at OSD.2? According to the query output you provided > >>> I would have looked at OSD.1 (acting set). And you pasted the output > >>> of PG 11.4, now you’re trying to list PG 2.1, that is quite confusing.
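A hedged note on the "Permission denied" in the log above: running ceph-objectstore-tool as root can leave the OSD's files owned by root, so the OSD (which runs as ceph:ceph, uid/gid 167:167 inside the container, as the "set uid:gid to 167:167" log line shows) can no longer open its data. Restoring ownership might look like the sketch below; the `<fsid>` placeholder and OSD ids are illustrative, and you should compare against a healthy OSD's ownership first, as Eugen suggests.

```shell
# Sketch: restore OSD data-dir ownership after a root-owned
# ceph-objectstore-tool run changed it. On a cephadm host the directory
# lives under /var/lib/ceph/<fsid>/osd.N; <fsid> and OSD ids are placeholders.
ls -ln /var/lib/ceph/<fsid>/osd.4                     # healthy OSD: note the uid/gid
chown -R 167:167 /var/lib/ceph/<fsid>/osd.1           # 167:167 = ceph:ceph in-container
chown -h 167:167 /var/lib/ceph/<fsid>/osd.1/block     # the block symlink itself
chown 167:167 "$(readlink -f /var/lib/ceph/<fsid>/osd.1/block)"  # LV device node, if needed
systemctl restart "ceph-<fsid>@osd.1.service"
```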
> >>> > >>> Quoting "GLE, Vivien" <vivien....@inist.fr>: > >>> > >>>> I don't get why it is searching in this path, because there is nothing there, > >>>> and this is the command I used to check bluestore > >>>> > >>>> > >>>> ceph-objectstore-tool --data-path /var/lib/ceph/"ID"/osd.2 --op list > >>>> --pgid 2.1 --no-mon-config > >>>> > >>>> ________________________________ > >>>> From: GLE, Vivien > >>>> Sent: Thursday, July 31, 2025 09:38:25 > >>>> To: Eugen Block > >>>> Cc: ceph-users@ceph.io > >>>> Subject: RE: [ceph-users] Re: Pgs troubleshooting > >>>> > >>>> > >>>> Hi, > >>>> > >>>> > >>>>> Or could reducing min_size to 1 help here (Thanks, Anthony)? I’m not > >>>>> entirely sure and am on vacation. 😅 It could be worth a try. But don’t > >>>>> forget to reset min_size back to 2 afterwards. > >>>> > >>>> > >>>> I did, but nothing really changed; how long should I wait to > >>>> see if it does anything? > >>>> > >>>> > >>>>> No, you use the ceph-objectstore-tool to export the PG from the intact > >>>>> OSD (you need to stop it though, and set the noout flag), make sure you have > >>>>> enough disk space. > >>>> > >>>> > >>>> I stopped my OSD and set noout to check if my PG is stored in bluestore > >>>> (it is not), but when I tried to restart my OSD, the OSD superblock was > >>>> gone > >>>> > >>>> > >>>> 2025-07-31T08:33:14.696+0000 7f0c7c889680 1 bdev(0x60945520ae00 > >>>> /var/lib/ceph/osd/ceph-2/block) open path > >>>> /var/lib/ceph/osd/ceph-2/block > >>>> 2025-07-31T08:33:14.697+0000 7f0c7c889680 -1 bdev(0x60945520ae00 > >>>> /var/lib/ceph/osd/ceph-2/block) open open got: (13) Permission denied > >>>> 2025-07-31T08:33:14.697+0000 7f0c7c889680 -1 ** ERROR: unable to > >>>> open OSD superblock on /var/lib/ceph/osd/ceph-2: (2) No such file or > >>>> directory > >>>> > >>>> Did I miss something?
> >>>> > >>>> Thanks > >>>> Vivien > >>>> > >>>> > >>>> > >>>> > >>>> ________________________________ > >>>> From: Eugen Block <ebl...@nde.ag> > >>>> Sent: Wednesday, July 30, 2025 16:56:50 > >>>> To: GLE, Vivien > >>>> Cc: ceph-users@ceph.io > >>>> Subject: [ceph-users] Re: Pgs troubleshooting > >>>> > >>>> Or could reducing min_size to 1 help here (Thanks, Anthony)? I’m not > >>>> entirely sure and am on vacation. 😅 It could be worth a try. But don’t > >>>> forget to reset min_size back to 2 afterwards. > >>>> > >>>> Quoting "GLE, Vivien" <vivien....@inist.fr>: > >>>> > >>>>> Hi, > >>>>> > >>>>> > >>>>>> did the two replaced OSDs fail at the same time (before they were > >>>>>> completely drained)? This would most likely mean that both those > >>>>>> failed OSDs contained the other two replicas of this PG > >>>>> > >>>>> > >>>>> Unfortunately yes > >>>>> > >>>>> > >>>>>> This would most likely mean that both those > >>>>>> failed OSDs contained the other two replicas of this PG. A pg query > >>>>>> should show which OSDs are missing. > >>>>> > >>>>> > >>>>> If I understand correctly, I need to move my PG onto OSD 1?
> >>>>> > >>>>> > >>>>> ceph -w > >>>>> > >>>>> > >>>>> osd.1 [ERR] 11.4 has 2 objects unfound and apparently lost > >>>>> > >>>>> > >>>>> ceph pg query 11.4 > >>>>> > >>>>> > >>>>> > >>>>> "up": [ > >>>>> 1, > >>>>> 4, > >>>>> 5 > >>>>> ], > >>>>> "acting": [ > >>>>> 1, > >>>>> 4, > >>>>> 5 > >>>>> ], > >>>>> "avail_no_missing": [], > >>>>> "object_location_counts": [ > >>>>> { > >>>>> "shards": "3,4,5", > >>>>> "objects": 2 > >>>>> } > >>>>> ], > >>>>> "blocked_by": [], > >>>>> "up_primary": 1, > >>>>> "acting_primary": 1, > >>>>> "purged_snaps": [] > >>>>> }, > >>>>> > >>>>> > >>>>> > >>>>> Thanks > >>>>> > >>>>> > >>>>> Vivien > >>>>> > >>>>> ________________________________ > >>>>> From: Eugen Block <ebl...@nde.ag> > >>>>> Sent: Tuesday, July 29, 2025 16:48:41 > >>>>> To: ceph-users@ceph.io > >>>>> Subject: [ceph-users] Re: Pgs troubleshooting > >>>>> > >>>>> Hi, > >>>>> > >>>>> did the two replaced OSDs fail at the same time (before they were > >>>>> completely drained)? This would most likely mean that both those > >>>>> failed OSDs contained the other two replicas of this PG. A pg query > >>>>> should show which OSDs are missing. > >>>>> You could try with the objectstore tool to export the PG from the > >>>>> remaining OSD and import it on different OSDs. Or you mark the data as > >>>>> lost if you don't care about the data and want a healthy state quickly. > >>>>> > >>>>> Regards, > >>>>> Eugen > >>>>> > >>>>> Quoting "GLE, Vivien" <vivien....@inist.fr>: > >>>>> > >>>>>> Thanks for your help !
This is my new pg stat, with no more peering > >>>>>> PGs (after restarting some OSDs) > >>>>>> > >>>>>> ceph pg stat -> > >>>>>> > >>>>>> 498 pgs: 1 active+recovery_unfound+degraded, 3 > >>>>>> recovery_unfound+undersized+degraded+remapped+peered, 14 > >>>>>> active+clean+scrubbing+deep, 480 active+clean; > >>>>>> > >>>>>> 36 GiB data, 169 GiB used, 6.2 TiB / 6.4 TiB avail; 8.8 KiB/s rd, 0 > >>>>>> B/s wr, 12 op/s; 715/41838 objects degraded (1.709%); 5/13946 > >>>>>> objects unfound (0.036%) > >>>>>> > >>>>>> ceph pg ls recovery_unfound -> shows that the PGs are replica 3; I tried to > >>>>>> repair them but nothing happened > >>>>>> > >>>>>> > >>>>>> ceph -w -> > >>>>>> > >>>>>> osd.1 [ERR] 11.4 has 2 objects unfound and apparently lost > >>>>>> > >>>>>> > >>>>>> > >>>>>> ________________________________ > >>>>>> From: Frédéric Nass <frederic.n...@clyso.com> > >>>>>> Sent: Tuesday, July 29, 2025 14:03:37 > >>>>>> To: GLE, Vivien > >>>>>> Cc: ceph-users@ceph.io > >>>>>> Subject: Re: [ceph-users] Pgs troubleshooting > >>>>>> > >>>>>> Hi Vivien, > >>>>>> > >>>>>> Unless you ran the 'ceph pg stat' command while peering was occurring, the > >>>>>> 37 peering PGs might indicate a temporary peering issue with one or > >>>>>> more OSDs. If that's the case, then restarting the associated OSDs could > >>>>>> help with the peering. You could list those PGs and > >>>>>> associated OSDs with 'ceph pg ls peering' and trigger peering by > >>>>>> either restarting one common OSD or by using 'ceph pg repeer <pg_id>'. > >>>>>> > >>>>>> Regarding the unfound object and its associated backfill_unfound PG, > >>>>>> you could identify this PG with 'ceph pg ls backfill_unfound' and > >>>>>> investigate this PG with 'ceph pg <pg_id> query'. Depending on the > >>>>>> output, you could try running a 'ceph pg repair <pg_id>'. Could you > >>>>>> confirm that this PG is not part of a size=2 pool? > >>>>>> > >>>>>> Best regards, > >>>>>> Frédéric.
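Frédéric's list-then-repeer suggestion could be scripted along these lines. This is a sketch only: it assumes jq is installed and that the JSON layout of `ceph pg ls` (a `.pg_stats[].pgid` field) matches your Ceph release.

```shell
# Sketch: repeer every PG currently reported as peering.
# Assumes `jq` is available; field names may vary by Ceph release.
for pg in $(ceph pg ls peering -f json | jq -r '.pg_stats[].pgid'); do
  ceph pg repeer "${pg}"
done
```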
> >>>>>> > >>>>>> -- > >>>>>> Frédéric Nass > >>>>>> Ceph Ambassador France | Senior Ceph Engineer @ CLYSO > >>>>>> Try our Ceph Analyzer -- https://analyzer.clyso.com/ > >>>>>> https://clyso.com | > >>>>>> frederic.n...@clyso.com > >>>>>> > >>>>>> > >>>>>> On Tue, Jul 29, 2025 at 14:19, GLE, Vivien > >>>>>> <vivien....@inist.fr> wrote: > >>>>>> Hi, > >>>>>> > >>>>>> After replacing 2 OSDs (data corruption), these are the stats of my > >>>>>> testing ceph cluster > >>>>>> > >>>>>> ceph pg stat > >>>>>> > >>>>>> 498 pgs: 37 peering, 1 active+remapped+backfilling, 1 > >>>>>> active+clean+remapped, 1 active+recovery_wait+undersized+remapped, 1 > >>>>>> backfill_unfound+undersized+degraded+remapped+peered, 1 > >>>>>> remapped+peering, 12 active+clean+scrubbing+deep, 1 > >>>>>> active+undersized, 442 active+clean, 1 > >>>>>> active+recovering+undersized+remapped > >>>>>> > >>>>>> 34 GiB data, 175 GiB used, 6.2 TiB / 6.4 TiB avail; 1.7 KiB/s rd, 1 > >>>>>> op/s; 31/39768 objects degraded (0.078%); 6/39768 objects misplaced > >>>>>> (0.015%); 1/13256 objects unfound (0.008%) > >>>>>> > >>>>>> ceph osd stat > >>>>>> 7 osds: 7 up (since 20h), 7 in (since 20h); epoch: e427538; 4 > >>>>>> remapped pgs > >>>>>> > >>>>>> Does anyone have an idea of where to start to get back to a healthy cluster? > >>>>>> > >>>>>> Thanks !
> >>>>>> > >>>>>> Vivien
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io