Hi,
I had 3 incomplete PGs that I marked with mark-complete because they were empty
(I think I lost the data from them).
One of them was recovery_unfound; I ran mark_unfound_lost revert on it.
But I have between 5 and 25 PGs deep scrubbing, I believe this is not normal?
(It has been like this for 5 days.)
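For reference, these are roughly the commands I used (PG 11.4 and osd.1 are just
examples taken from earlier in this thread, the other PGs were handled the same way):

  # with the OSD stopped, from inside its container:
  cephadm shell --name osd.1 -- ceph-objectstore-tool \
      --data-path /var/lib/ceph/osd/ceph-1 --op mark-complete --pgid 11.4
  # revert the unfound objects on the recovery_unfound PG:
  ceph pg 11.4 mark_unfound_lost revert
  # count the PGs currently in deep scrub:
  ceph pg dump pgs_brief 2>/dev/null | grep -c 'scrubbing+deep'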
Vivien
________________________________
From: Eugen Block <ebl...@nde.ag>
Sent: Friday, 1 August 2025 15:58:22
To: GLE, Vivien
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Pgs troubleshooting
Don’t worry, I just wanted to point out that careful reading is crucial. :-)
So you got the OSDs back up, but were you also able to recover the PG?
Quoting "GLE, Vivien" <vivien....@inist.fr>:
> I lost all perspective and didn't read this message carefully...
> Sorry about that.
>
>
> Thanks for your help I'm very grateful
>
>
> Vivien
>
> ________________________________
> From: Eugen Block <ebl...@nde.ag>
> Sent: Friday, 1 August 2025 15:27:56
> To: GLE, Vivien
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: Pgs troubleshooting
>
> That’s why I mentioned this two days ago:
>
> cephadm shell -- ceph-objectstore-tool --op list …
>
> That’s how you can execute commands directly with cephadm shell; this
> is useful for batch operations like a for loop or similar. Of course,
> first entering the shell and then executing commands works just as well.
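>
> For example, something like this (osd.1 and the PG ids here are
> placeholders, adjust them to your cluster):
>
>   # list objects of several PGs on one stopped OSD in a single loop
>   for pg in 11.4 2.1; do
>     cephadm shell --name osd.1 -- ceph-objectstore-tool \
>       --data-path /var/lib/ceph/osd/ceph-1 --op list --pgid "$pg" --no-mon-config
>   done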
>
> Quoting "GLE, Vivien" <vivien....@inist.fr>:
>
>> I was using ceph-objectstore-tool the wrong way, running it on the host
>> instead of inside the container via cephadm shell --name osd.x
>>
>>
>> ________________________________
>> From: GLE, Vivien <vivien....@inist.fr>
>> Sent: Friday, 1 August 2025 09:02:59
>> To: Eugen Block
>> Cc: ceph-users@ceph.io
>> Subject: [ceph-users] Re: Pgs troubleshooting
>>
>> Hi,
>>
>>
>> What is the correct way to use the objectstore tool?
>>
>>
>> My OSDs are up! I purged the ceph-* packages on my host following this thread:
>>
>> https://www.reddit.com/r/ceph/comments/1me3kvd/containerized_ceph_base_os_experience/
>>
>>
>> " Make sure that the base OS does not have any ceph packages
>> installed, with Ubuntu in the past had issues with ceph-common being
>> installed on the host OS and it trying to take ownership of the
>> containerized ceph deployment. If you run into any issues check the
>> base OS for ceph-* packages and uninstall. "
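>>
>> On my host that meant something like this (Debian/Ubuntu; the package
>> tooling differs on other distros):
>>
>>   dpkg -l | grep -i ceph     # list leftover ceph packages on the host
>>   apt-get purge 'ceph-*'     # remove them; the containers are unaffected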
>>
>>
>> I believe the only proper way to use ceph commands is through cephadm.
>>
>>
>> Thanks for your help !
>>
>> ________________________________
>> From: Eugen Block <ebl...@nde.ag>
>> Sent: Thursday, 31 July 2025 19:42:21
>> To: GLE, Vivien
>> Cc: ceph-users@ceph.io
>> Subject: Re: [ceph-users] Re: Pgs troubleshooting
>>
>> To use the objectstore tool within the container you don’t have to
>> specify the cluster’s FSID because it’s mapped into the container. By
>> using the objectstore tool you might have changed the ownership of the
>> directory; change it back to the previous state. The other OSDs will show
>> you which uid/user and/or gid/group that is.
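>>
>> For example (a sketch; 167:167 is the usual ceph uid/gid, as seen in your
>> OSD log, and <fsid> stands for your cluster FSID):
>>
>>   ls -ln /var/lib/ceph/<fsid>/osd.0     # check a healthy OSD for the right owner
>>   chown -R 167:167 /var/lib/ceph/<fsid>/osd.1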
>>
>> Quoting "GLE, Vivien" <vivien....@inist.fr>:
>>
>>> I'm sorry for the confusion!
>>>
>>> I pasted the wrong output.
>>>
>>>
>>> ceph-objectstore-tool --data-path /var/lib/ceph/Id/osd.1 --op list
>>> --pgid 11.4 --no-mon-config
>>>
>>> OSD.1 log
>>>
>>> 2025-07-31T12:06:56.273+0000 7a9c2bf47680 0 set uid:gid to 167:167
>>> (ceph:ceph)
>>> 2025-07-31T12:06:56.273+0000 7a9c2bf47680 0 ceph version 19.2.2
>>> (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable), process
>>> ceph-osd, pid 7
>>> 2025-07-31T12:06:56.273+0000 7a9c2bf47680 0 pidfile_write: ignore
>>> empty --pid-file
>>> 2025-07-31T12:06:56.274+0000 7a9c2bf47680 1 bdev(0x57bd64210e00
>>> /var/lib/ceph/osd/ceph-1/block) open path
>>> /var/lib/ceph/osd/ceph-1/block
>>> 2025-07-31T12:06:56.274+0000 7a9c2bf47680 -1 bdev(0x57bd64210e00
>>> /var/lib/ceph/osd/ceph-1/block) open open got: (13) Permission denied
>>> 2025-07-31T12:06:56.274+0000 7a9c2bf47680 -1 ** ERROR: unable to
>>> open OSD superblock on /var/lib/ceph/osd/ceph-1: (2) No such file or
>>> directory
>>>
>>> ----------------------
>>>
>>> I retried on OSD.2 with PG 2.1 to see whether disabling OSD.2
>>> (instead of just stopping it) before the objectstore-tool operation
>>> would change anything, but the same error occurred.
>>>
>>>
>>>
>>> ________________________________
>>> From: Eugen Block <ebl...@nde.ag>
>>> Sent: Thursday, 31 July 2025 13:27:51
>>> To: GLE, Vivien
>>> Cc: ceph-users@ceph.io
>>> Subject: Re: [ceph-users] Re: Pgs troubleshooting
>>>
>>> Why did you look at OSD.2? According to the query output you provided,
>>> I would have looked at OSD.1 (acting set). And you pasted the output
>>> of PG 11.4, but now you’re trying to list PG 2.1; that is quite confusing.
>>>
>>>
>>> Quoting "GLE, Vivien" <vivien....@inist.fr>:
>>>
>>>> I don't get why it is searching in this path, because there is nothing
>>>> there. This is the command I used to check BlueStore:
>>>>
>>>>
>>>> ceph-objectstore-tool --data-path /var/lib/ceph/"ID"/osd.2 --op list
>>>> --pgid 2.1 --no-mon-config
>>>>
>>>> ________________________________
>>>> From: GLE, Vivien
>>>> Sent: Thursday, 31 July 2025 09:38:25
>>>> To: Eugen Block
>>>> Cc: ceph-users@ceph.io
>>>> Subject: RE: [ceph-users] Re: Pgs troubleshooting
>>>>
>>>>
>>>> Hi,
>>>>
>>>>
>>>>> Or could reducing min_size to 1 help here (Thanks, Anthony)? I’m not
>>>>> entirely sure and am on vacation. 😅 it could be worth a try. But don’t
>>>>> forget to reset min_size back to 2 afterwards.
>>>>
>>>>
>>>> I did it, but nothing really changed. How long should I wait to
>>>> see if it does something?
>>>>
>>>>
>>>>> No, you use the ceph-objectstore-tool to export the PG from the intact
>>>>> OSD (you need to stop it though, set noout flag), make sure you have
>>>>> enough disk space.
>>>>
>>>>
>>>> I stopped my OSD and set noout to check whether my PG is stored in
>>>> BlueStore (it is not), but when I tried to restart my OSD, the OSD
>>>> superblock was gone:
>>>>
>>>>
>>>> 2025-07-31T08:33:14.696+0000 7f0c7c889680 1 bdev(0x60945520ae00
>>>> /var/lib/ceph/osd/ceph-2/block) open path
>>>> /var/lib/ceph/osd/ceph-2/block
>>>> 2025-07-31T08:33:14.697+0000 7f0c7c889680 -1 bdev(0x60945520ae00
>>>> /var/lib/ceph/osd/ceph-2/block) open open got: (13) Permission denied
>>>> 2025-07-31T08:33:14.697+0000 7f0c7c889680 -1 ** ERROR: unable to
>>>> open OSD superblock on /var/lib/ceph/osd/ceph-2: (2) No such file or
>>>> directory
>>>>
>>>> Did I miss something?
>>>>
>>>> Thanks
>>>> Vivien
>>>>
>>>>
>>>>
>>>>
>>>> ________________________________
>>>> From: Eugen Block <ebl...@nde.ag>
>>>> Sent: Wednesday, 30 July 2025 16:56:50
>>>> To: GLE, Vivien
>>>> Cc: ceph-users@ceph.io
>>>> Subject: [ceph-users] Re: Pgs troubleshooting
>>>>
>>>> Or could reducing min_size to 1 help here (Thanks, Anthony)? I’m not
>>>> entirely sure and am on vacation. 😅 it could be worth a try. But don’t
>>>> forget to reset min_size back to 2 afterwards.
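>>>>
>>>> Something like this, assuming the affected pool is named "mypool":
>>>>
>>>>   ceph osd pool set mypool min_size 1
>>>>   # ...and once recovery has finished:
>>>>   ceph osd pool set mypool min_size 2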
>>>>
>>>> Quoting "GLE, Vivien" <vivien....@inist.fr>:
>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>>> did the two replaced OSDs fail at the same time (before they were
>>>>>> completely drained)? This would most likely mean that both those
>>>>>> failed OSDs contained the other two replicas of this PG
>>>>>
>>>>>
>>>>> Unfortunately yes
>>>>>
>>>>>
>>>>>> This would most likely mean that both those
>>>>>> failed OSDs contained the other two replicas of this PG. A pg query
>>>>>> should show which OSDs are missing.
>>>>>
>>>>>
>>>>> If I understand correctly, I need to move my PG onto OSD 1?
>>>>>
>>>>>
>>>>> ceph -w
>>>>>
>>>>>
>>>>> osd.1 [ERR] 11.4 has 2 objects unfound and apparently lost
>>>>>
>>>>>
>>>>> ceph pg query 11.4
>>>>>
>>>>>
>>>>>
>>>>> "up": [
>>>>>     1,
>>>>>     4,
>>>>>     5
>>>>> ],
>>>>> "acting": [
>>>>>     1,
>>>>>     4,
>>>>>     5
>>>>> ],
>>>>> "avail_no_missing": [],
>>>>> "object_location_counts": [
>>>>>     {
>>>>>         "shards": "3,4,5",
>>>>>         "objects": 2
>>>>>     }
>>>>> ],
>>>>> "blocked_by": [],
>>>>> "up_primary": 1,
>>>>> "acting_primary": 1,
>>>>> "purged_snaps": []
>>>>> },
>>>>>
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> Vivien
>>>>>
>>>>> ________________________________
>>>>> From: Eugen Block <ebl...@nde.ag>
>>>>> Sent: Tuesday, 29 July 2025 16:48:41
>>>>> To: ceph-users@ceph.io
>>>>> Subject: [ceph-users] Re: Pgs troubleshooting
>>>>>
>>>>> Hi,
>>>>>
>>>>> did the two replaced OSDs fail at the same time (before they were
>>>>> completely drained)? This would most likely mean that both those
>>>>> failed OSDs contained the other two replicas of this PG. A pg query
>>>>> should show which OSDs are missing.
>>>>> You could try with objectstore-tool to export the PG from the
>>>>> remaining OSD and import it on different OSDs. Or you mark the data as
>>>>> lost if you don't care about the data and want a healthy state quickly.
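>>>>>
>>>>> A rough sketch (OSD ids, the PG id and the file path are only examples;
>>>>> stop the OSDs involved and set noout first):
>>>>>
>>>>>   # on the node holding the intact copy:
>>>>>   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 \
>>>>>     --op export --pgid 11.4 --file /tmp/pg11.4.export
>>>>>   # on the target node:
>>>>>   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4 \
>>>>>     --op import --file /tmp/pg11.4.export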
>>>>>
>>>>> Regards,
>>>>> Eugen
>>>>>
>>>>> Quoting "GLE, Vivien" <vivien....@inist.fr>:
>>>>>
>>>>>> Thanks for your help! This is my new pg stat, with no more peering
>>>>>> PGs (after restarting some OSDs):
>>>>>>
>>>>>> ceph pg stat ->
>>>>>>
>>>>>> 498 pgs: 1 active+recovery_unfound+degraded, 3
>>>>>> recovery_unfound+undersized+degraded+remapped+peered, 14
>>>>>> active+clean+scrubbing+deep, 480 active+clean;
>>>>>>
>>>>>> 36 GiB data, 169 GiB used, 6.2 TiB / 6.4 TiB avail; 8.8 KiB/s rd, 0
>>>>>> B/s wr, 12 op/s; 715/41838 objects degraded (1.709%); 5/13946
>>>>>> objects unfound (0.036%)
>>>>>>
>>>>>> ceph pg ls recovery_unfound -> shows that the PGs are replica 3; I
>>>>>> tried to repair them but nothing happened
>>>>>>
>>>>>>
>>>>>> ceph -w ->
>>>>>>
>>>>>> osd.1 [ERR] 11.4 has 2 objects unfound and apparently lost
>>>>>>
>>>>>>
>>>>>>
>>>>>> ________________________________
>>>>>> From: Frédéric Nass <frederic.n...@clyso.com>
>>>>>> Sent: Tuesday, 29 July 2025 14:03:37
>>>>>> To: GLE, Vivien
>>>>>> Cc: ceph-users@ceph.io
>>>>>> Subject: Re: [ceph-users] Pgs troubleshooting
>>>>>>
>>>>>> Hi Vivien,
>>>>>>
>>>>>> Unless you ran the 'ceph pg stat' command while peering was occurring,
>>>>>> the 37 peering PGs might indicate a temporary peering issue with one or
>>>>>> more OSDs. If that's the case then restarting the associated OSDs could
>>>>>> help with the peering. You could list those PGs and the associated
>>>>>> OSDs with 'ceph pg ls peering' and trigger peering by either
>>>>>> restarting one common OSD or by using 'ceph pg repeer <pg_id>'.
>>>>>>
>>>>>> Regarding the unfound object and its associated backfill_unfound PG,
>>>>>> you could identify this PG with 'ceph pg ls backfill_unfound' and
>>>>>> investigate it with 'ceph pg <pg_id> query'. Depending on the
>>>>>> output, you could try running 'ceph pg repair <pg_id>'. Could you
>>>>>> confirm that this PG is not part of a size=2 pool?
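>>>>>>
>>>>>> You can check with, for instance (the pool name is a placeholder):
>>>>>>
>>>>>>   ceph osd pool get mypool size
>>>>>>   ceph osd pool get mypool min_size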
>>>>>>
>>>>>> Best regards,
>>>>>> Frédéric.
>>>>>>
>>>>>> --
>>>>>> Frédéric Nass
>>>>>> Ceph Ambassador France | Senior Ceph Engineer @ CLYSO
>>>>>> Try our Ceph Analyzer -- https://analyzer.clyso.com/
>>>>>> https://clyso.com | frederic.n...@clyso.com
>>>>>>
>>>>>>
>>>>>> On Tue, 29 Jul 2025 at 14:19, GLE, Vivien
>>>>>> <vivien....@inist.fr> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> After replacing 2 OSDs (data corruption), these are the stats of my
>>>>>> test Ceph cluster:
>>>>>>
>>>>>> ceph pg stat
>>>>>>
>>>>>> 498 pgs: 37 peering, 1 active+remapped+backfilling, 1
>>>>>> active+clean+remapped, 1 active+recovery_wait+undersized+remapped, 1
>>>>>> backfill_unfound+undersized+degraded+remapped+peered, 1
>>>>>> remapped+peering, 12 active+clean+scrubbing+deep, 1
>>>>>> active+undersized, 442 active+clean, 1
>>>>>> active+recovering+undersized+remapped
>>>>>>
>>>>>> 34 GiB data, 175 GiB used, 6.2 TiB / 6.4 TiB avail; 1.7 KiB/s rd, 1
>>>>>> op/s; 31/39768 objects degraded (0.078%); 6/39768 objects misplaced
>>>>>> (0.015%); 1/13256 objects unfound (0.008%)
>>>>>>
>>>>>> ceph osd stat
>>>>>> 7 osds: 7 up (since 20h), 7 in (since 20h); epoch: e427538; 4
>>>>>> remapped pgs
>>>>>>
>>>>>> Does anyone have an idea of where to start to get back to a healthy cluster?
>>>>>>
>>>>>> Thanks !
>>>>>>
>>>>>> Vivien
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>
>>
>>
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io