I lost all perspective and didn't read this message carefully. Sorry about that.

Thanks for your help, I'm very grateful.

Vivien

________________________________
From: Eugen Block <ebl...@nde.ag>
Sent: Friday, August 1, 2025 15:27:56
To: GLE, Vivien
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Pgs troubleshooting

That's why I mentioned this two days ago:

cephadm shell -- ceph-objectstore-tool --op list …

That's how you can execute commands directly with cephadm shell; this is useful for batch operations like a for loop or similar. Of course, first entering the shell and then executing commands works just as well.
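For example, a minimal batch sketch (the PG ids and OSD name are taken from the mails below; the OSD daemon has to be stopped first, with noout set, before ceph-objectstore-tool can open its store):

    # run ceph-objectstore-tool inside the osd.1 container for several PGs,
    # without entering the shell interactively
    for pgid in 11.4 2.1; do
        cephadm shell --name osd.1 -- ceph-objectstore-tool \
            --data-path /var/lib/ceph/osd/ceph-1 \
            --op list --pgid "$pgid" --no-mon-config
    done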

Quoting "GLE, Vivien" <vivien....@inist.fr>:

> I was using ceph-objectstore-tool the wrong way, running it on the host
> instead of inside the container via cephadm shell --name osd.x
>
> ________________________________
> From: GLE, Vivien <vivien....@inist.fr>
> Sent: Friday, August 1, 2025 09:02:59
> To: Eugen Block
> Cc: ceph-users@ceph.io
> Subject: [ceph-users] Re: Pgs troubleshooting
>
> Hi,
>
> What is the correct way of using the objectstore tool?
>
> My OSDs are up! I purged ceph-* on my host following this thread:
> https://www.reddit.com/r/ceph/comments/1me3kvd/containerized_ceph_base_os_experience/
>
> "Make sure that the base OS does not have any ceph packages
> installed, with Ubuntu in the past had issues with ceph-common being
> installed on the host OS and it trying to take ownership of the
> containerized ceph deployment. If you run into any issues check the
> base OS for ceph-* packages and uninstall."
>
> I believe the only good way to use ceph commands is through cephadm.
>
> Thanks for your help!
>
> ________________________________
> From: Eugen Block <ebl...@nde.ag>
> Sent: Thursday, July 31, 2025 19:42:21
> To: GLE, Vivien
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: Pgs troubleshooting
>
> To use the objectstore tool within the container you don't have to
> specify the cluster's FSID, because it's mapped into the container. By
> using the objectstore tool you might have changed the ownership of the
> directory; change it back to its previous state. The other OSDs will
> show you which uid/user and/or gid/group that is.
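> For example, a sketch (uid:gid 167:167, i.e. ceph:ceph, is what the OSD
> log below shows; the FSID in the path is left as a placeholder):
>
>     # on the host: give the OSD directory back to the container's ceph user
>     chown -R 167:167 /var/lib/ceph/<fsid>/osd.1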
>
> Quoting "GLE, Vivien" <vivien....@inist.fr>:
>
>> I'm sorry for the confusion!
>>
>> I pasted the wrong output.
>>
>> ceph-objectstore-tool --data-path /var/lib/ceph/Id/osd.1 --op list
>> --pgid 11.4 --no-mon-config
>>
>> OSD.1 log:
>>
>> 2025-07-31T12:06:56.273+0000 7a9c2bf47680  0 set uid:gid to 167:167 (ceph:ceph)
>> 2025-07-31T12:06:56.273+0000 7a9c2bf47680  0 ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable), process ceph-osd, pid 7
>> 2025-07-31T12:06:56.273+0000 7a9c2bf47680  0 pidfile_write: ignore empty --pid-file
>> 2025-07-31T12:06:56.274+0000 7a9c2bf47680  1 bdev(0x57bd64210e00 /var/lib/ceph/osd/ceph-1/block) open path /var/lib/ceph/osd/ceph-1/block
>> 2025-07-31T12:06:56.274+0000 7a9c2bf47680 -1 bdev(0x57bd64210e00 /var/lib/ceph/osd/ceph-1/block) open open got: (13) Permission denied
>> 2025-07-31T12:06:56.274+0000 7a9c2bf47680 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-1: (2) No such file or directory
>>
>> ----------------------
>>
>> I retried on OSD.2 with PG 2.1, to see whether disabling (instead of
>> just stopping) OSD.2 before the objectstore-tool operation would
>> change something, but the same error occurred.
>>
>> ________________________________
>> From: Eugen Block <ebl...@nde.ag>
>> Sent: Thursday, July 31, 2025 13:27:51
>> To: GLE, Vivien
>> Cc: ceph-users@ceph.io
>> Subject: Re: [ceph-users] Re: Pgs troubleshooting
>>
>> Why did you look at OSD.2? According to the query output you provided,
>> I would have looked at OSD.1 (acting set). And you pasted the output
>> of PG 11.4, but now you're trying to list PG 2.1; that is quite confusing.
>>
>> Quoting "GLE, Vivien" <vivien....@inist.fr>:
>>
>>> I don't get why it is searching in this path, because there is
>>> nothing there. This is the command I used to check bluestore:
>>>
>>> ceph-objectstore-tool --data-path /var/lib/ceph/"ID"/osd.2 --op list
>>> --pgid 2.1 --no-mon-config
>>>
>>> ________________________________
>>> From: GLE, Vivien
>>> Sent: Thursday, July 31, 2025 09:38:25
>>> To: Eugen Block
>>> Cc: ceph-users@ceph.io
>>> Subject: RE: [ceph-users] Re: Pgs troubleshooting
>>>
>>> Hi,
>>>
>>>> Or could reducing min_size to 1 help here (Thanks, Anthony)? I'm not
>>>> entirely sure and am on vacation. 😅 It could be worth a try. But don't
>>>> forget to reset min_size back to 2 afterwards.
>>>
>>> I did, but nothing really changed. How long should I wait to see if
>>> it does something?
>>>
>>>> No, you use the ceph-objectstore-tool to export the PG from the intact
>>>> OSD (you need to stop it though, and set the noout flag); make sure
>>>> you have enough disk space.
>>>
>>> I stopped my OSD and set noout to check whether my PG is stored in
>>> bluestore (it is not), but when I tried to restart the OSD, the OSD
>>> superblock was gone:
>>>
>>> 2025-07-31T08:33:14.696+0000 7f0c7c889680  1 bdev(0x60945520ae00 /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
>>> 2025-07-31T08:33:14.697+0000 7f0c7c889680 -1 bdev(0x60945520ae00 /var/lib/ceph/osd/ceph-2/block) open open got: (13) Permission denied
>>> 2025-07-31T08:33:14.697+0000 7f0c7c889680 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-2: (2) No such file or directory
>>>
>>> Did I miss something?
>>>
>>> Thanks,
>>> Vivien
>>>
>>> ________________________________
>>> From: Eugen Block <ebl...@nde.ag>
>>> Sent: Wednesday, July 30, 2025 16:56:50
>>> To: GLE, Vivien
>>> Cc: ceph-users@ceph.io
>>> Subject: [ceph-users] Re: Pgs troubleshooting
>>>
>>> Or could reducing min_size to 1 help here (Thanks, Anthony)? I'm not
>>> entirely sure and am on vacation. 😅 It could be worth a try. But don't
>>> forget to reset min_size back to 2 afterwards.
>>>
>>> Quoting "GLE, Vivien" <vivien....@inist.fr>:
>>>
>>>> Hi,
>>>>
>>>>> did the two replaced OSDs fail at the same time (before they were
>>>>> completely drained)? This would most likely mean that both those
>>>>> failed OSDs contained the other two replicas of this PG
>>>>
>>>> Unfortunately yes.
>>>>
>>>>> This would most likely mean that both those
>>>>> failed OSDs contained the other two replicas of this PG. A pg query
>>>>> should show which OSDs are missing.
>>>>
>>>> If I understand correctly, I need to move my PG onto OSD 1?
>>>>
>>>> ceph -w
>>>>
>>>> osd.1 [ERR] 11.4 has 2 objects unfound and apparently lost
>>>>
>>>> ceph pg query 11.4
>>>>
>>>>     "up": [
>>>>         1,
>>>>         4,
>>>>         5
>>>>     ],
>>>>     "acting": [
>>>>         1,
>>>>         4,
>>>>         5
>>>>     ],
>>>>     "avail_no_missing": [],
>>>>     "object_location_counts": [
>>>>         {
>>>>             "shards": "3,4,5",
>>>>             "objects": 2
>>>>         }
>>>>     ],
>>>>     "blocked_by": [],
>>>>     "up_primary": 1,
>>>>     "acting_primary": 1,
>>>>     "purged_snaps": []
>>>> },
>>>>
>>>> Thanks,
>>>> Vivien
>>>>
>>>> ________________________________
>>>> From: Eugen Block <ebl...@nde.ag>
>>>> Sent: Tuesday, July 29, 2025 16:48:41
>>>> To: ceph-users@ceph.io
>>>> Subject: [ceph-users] Re: Pgs troubleshooting
>>>>
>>>> Hi,
>>>>
>>>> did the two replaced OSDs fail at the same time (before they were
>>>> completely drained)? This would most likely mean that both those
>>>> failed OSDs contained the other two replicas of this PG. A pg query
>>>> should show which OSDs are missing.
>>>> You could try with the objectstore-tool to export the PG from the
>>>> remaining OSD and import it on different OSDs. Or you mark the data as
>>>> lost if you don't care about the data and want a healthy state quickly.
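>>>> A rough sketch of that approach (the target OSD and the file path are
>>>> placeholders; both OSDs must be stopped, with noout set, and the
>>>> target should not already hold the PG):
>>>>
>>>>     # export PG 11.4 from the intact OSD (osd.1 according to the query)
>>>>     ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 \
>>>>         --op export --pgid 11.4 --file /tmp/pg11.4.export
>>>>
>>>>     # import it on another, stopped OSD (replace X with its id)
>>>>     ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-X \
>>>>         --op import --file /tmp/pg11.4.export
>>>>
>>>> Or, to give the unfound objects up instead:
>>>>
>>>>     ceph pg 11.4 mark_unfound_lost revert   # or: mark_unfound_lost delete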
>>>>
>>>> Regards,
>>>> Eugen
>>>>
>>>> Quoting "GLE, Vivien" <vivien....@inist.fr>:
>>>>
>>>>> Thanks for your help! This is my new pg stat, with no more peering
>>>>> PGs (after rebooting some OSDs):
>>>>>
>>>>> ceph pg stat ->
>>>>>
>>>>> 498 pgs: 1 active+recovery_unfound+degraded, 3
>>>>> recovery_unfound+undersized+degraded+remapped+peered, 14
>>>>> active+clean+scrubbing+deep, 480 active+clean;
>>>>>
>>>>> 36 GiB data, 169 GiB used, 6.2 TiB / 6.4 TiB avail; 8.8 KiB/s rd, 0
>>>>> B/s wr, 12 op/s; 715/41838 objects degraded (1.709%); 5/13946
>>>>> objects unfound (0.036%)
>>>>>
>>>>> ceph pg ls recovery_unfound -> shows that the PGs are replica 3; I
>>>>> tried to repair but nothing happened.
>>>>>
>>>>> ceph -w ->
>>>>>
>>>>> osd.1 [ERR] 11.4 has 2 objects unfound and apparently lost
>>>>>
>>>>> ________________________________
>>>>> From: Frédéric Nass <frederic.n...@clyso.com>
>>>>> Sent: Tuesday, July 29, 2025 14:03:37
>>>>> To: GLE, Vivien
>>>>> Cc: ceph-users@ceph.io
>>>>> Subject: Re: [ceph-users] Pgs troubleshooting
>>>>>
>>>>> Hi Vivien,
>>>>>
>>>>> Unless you ran the 'ceph pg stat' command while peering was
>>>>> occurring, the 37 peering PGs might indicate a temporary peering
>>>>> issue with one or more OSDs. If that's the case, then restarting the
>>>>> associated OSDs could help with the peering. You could list those
>>>>> PGs and associated OSDs with 'ceph pg ls peering' and trigger
>>>>> peering by either restarting one common OSD or by using
>>>>> 'ceph pg repeer <pg_id>'.
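>>>>> For instance, a small sketch (assuming jq is available and relying
>>>>> on the JSON output of 'ceph pg ls'):
>>>>>
>>>>>     # repeer every PG currently stuck in peering
>>>>>     for pg in $(ceph pg ls peering -f json | jq -r '.pg_stats[].pgid'); do
>>>>>         ceph pg repeer "$pg"
>>>>>     done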
>>>>>
>>>>> Regarding the unfound object and its associated backfill_unfound PG,
>>>>> you could identify this PG with 'ceph pg ls backfill_unfound' and
>>>>> investigate it with 'ceph pg <pg_id> query'. Depending on the
>>>>> output, you could try running 'ceph pg repair <pg_id>'. Could you
>>>>> confirm that this PG is not part of a size=2 pool?
>>>>>
>>>>> Best regards,
>>>>> Frédéric.
>>>>>
>>>>> --
>>>>> Frédéric Nass
>>>>> Ceph Ambassador France | Senior Ceph Engineer @ CLYSO
>>>>> Try our Ceph Analyzer -- https://analyzer.clyso.com/
>>>>> https://clyso.com | frederic.n...@clyso.com
>>>>>
>>>>> On Tue, Jul 29, 2025 at 14:19, GLE, Vivien <vivien....@inist.fr> wrote:
>>>>> Hi,
>>>>>
>>>>> After replacing 2 OSDs (data corruption), these are the stats of my
>>>>> test Ceph cluster:
>>>>>
>>>>> ceph pg stat
>>>>>
>>>>> 498 pgs: 37 peering, 1 active+remapped+backfilling, 1
>>>>> active+clean+remapped, 1 active+recovery_wait+undersized+remapped, 1
>>>>> backfill_unfound+undersized+degraded+remapped+peered, 1
>>>>> remapped+peering, 12 active+clean+scrubbing+deep, 1
>>>>> active+undersized, 442 active+clean, 1
>>>>> active+recovering+undersized+remapped
>>>>>
>>>>> 34 GiB data, 175 GiB used, 6.2 TiB / 6.4 TiB avail; 1.7 KiB/s rd, 1
>>>>> op/s; 31/39768 objects degraded (0.078%); 6/39768 objects misplaced
>>>>> (0.015%); 1/13256 objects unfound (0.008%)
>>>>>
>>>>> ceph osd stat
>>>>>
>>>>> 7 osds: 7 up (since 20h), 7 in (since 20h); epoch: e427538; 4
>>>>> remapped pgs
>>>>>
>>>>> Does anyone have an idea of where to start to get back to a healthy
>>>>> cluster?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Vivien

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io