I lost all perspective and didn't read this message carefully. Sorry about that.

Thanks for your help, I'm very grateful.

Vivien

________________________________
From: Eugen Block <ebl...@nde.ag>
Sent: Friday, August 1, 2025 15:27:56
To: GLE, Vivien
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Pgs troubleshooting

That's why I mentioned this two days ago:

cephadm shell -- ceph-objectstore-tool --op list …

That's how you can execute commands directly with cephadm shell; this is useful for batch operations like a for loop or similar. Of course, first entering the shell and then executing commands works just as well.
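For example, a minimal batch sketch (the PG ids and OSD name are taken from the mails below; the OSD daemon has to be stopped first, with noout set, before ceph-objectstore-tool can open its store):

    # run ceph-objectstore-tool inside the osd.1 container for several PGs,
    # without entering the shell interactively
    for pgid in 11.4 2.1; do
        cephadm shell --name osd.1 -- ceph-objectstore-tool \
            --data-path /var/lib/ceph/osd/ceph-1 \
            --op list --pgid "$pgid" --no-mon-config
    done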

Quoting "GLE, Vivien" <vivien....@inist.fr>:

> I was using ceph-objectstore-tool the wrong way, running it on the host
> instead of inside the container via cephadm shell --name osd.x
>
> ________________________________
> From: GLE, Vivien <vivien....@inist.fr>
> Sent: Friday, August 1, 2025 09:02:59
> To: Eugen Block
> Cc: ceph-users@ceph.io
> Subject: [ceph-users] Re: Pgs troubleshooting
>
> Hi,
>
> What is the correct way of using the objectstore tool?
>
> My OSDs are up! I purged ceph-* on my host following this thread:
> https://www.reddit.com/r/ceph/comments/1me3kvd/containerized_ceph_base_os_experience/
>
> "Make sure that the base OS does not have any ceph packages
> installed, with Ubuntu in the past had issues with ceph-common being
> installed on the host OS and it trying to take ownership of the
> containerized ceph deployment. If you run into any issues check the
> base OS for ceph-* packages and uninstall."
>
> I believe the only good way to use ceph commands is through cephadm.
>
> Thanks for your help!
>
> ________________________________
> From: Eugen Block <ebl...@nde.ag>
> Sent: Thursday, July 31, 2025 19:42:21
> To: GLE, Vivien
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: Pgs troubleshooting
>
> To use the objectstore tool within the container you don't have to
> specify the cluster's FSID, because it's mapped into the container. By
> using the objectstore tool you might have changed the ownership of the
> directory; change it back to its previous state. The other OSDs will
> show you which uid/user and/or gid/group that is.
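> For example, a sketch (uid:gid 167:167, i.e. ceph:ceph, is what the OSD
> log below shows; the FSID in the path is left as a placeholder):
>
>     # on the host: give the OSD directory back to the container's ceph user
>     chown -R 167:167 /var/lib/ceph/<fsid>/osd.1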
>
> Quoting "GLE, Vivien" <vivien....@inist.fr>:
>
>> I'm sorry for the confusion!
>>
>> I pasted the wrong output.
>>
>> ceph-objectstore-tool --data-path /var/lib/ceph/Id/osd.1 --op list
>> --pgid 11.4 --no-mon-config
>>
>> OSD.1 log:
>>
>> 2025-07-31T12:06:56.273+0000 7a9c2bf47680  0 set uid:gid to 167:167 (ceph:ceph)
>> 2025-07-31T12:06:56.273+0000 7a9c2bf47680  0 ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable), process ceph-osd, pid 7
>> 2025-07-31T12:06:56.273+0000 7a9c2bf47680  0 pidfile_write: ignore empty --pid-file
>> 2025-07-31T12:06:56.274+0000 7a9c2bf47680  1 bdev(0x57bd64210e00 /var/lib/ceph/osd/ceph-1/block) open path /var/lib/ceph/osd/ceph-1/block
>> 2025-07-31T12:06:56.274+0000 7a9c2bf47680 -1 bdev(0x57bd64210e00 /var/lib/ceph/osd/ceph-1/block) open open got: (13) Permission denied
>> 2025-07-31T12:06:56.274+0000 7a9c2bf47680 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-1: (2) No such file or directory
>>
>> ----------------------
>>
>> I retried on OSD.2 with PG 2.1, to see whether disabling (instead of
>> just stopping) OSD.2 before the objectstore-tool operation would
>> change something, but the same error occurred.
>>
>> ________________________________
>> From: Eugen Block <ebl...@nde.ag>
>> Sent: Thursday, July 31, 2025 13:27:51
>> To: GLE, Vivien
>> Cc: ceph-users@ceph.io
>> Subject: Re: [ceph-users] Re: Pgs troubleshooting
>>
>> Why did you look at OSD.2? According to the query output you provided,
>> I would have looked at OSD.1 (acting set). And you pasted the output
>> of PG 11.4, but now you're trying to list PG 2.1; that is quite confusing.
>>
>> Quoting "GLE, Vivien" <vivien....@inist.fr>:
>>
>>> I don't get why it is searching in this path, because there is
>>> nothing there. This is the command I used to check bluestore:
>>>
>>> ceph-objectstore-tool --data-path /var/lib/ceph/"ID"/osd.2 --op list
>>> --pgid 2.1 --no-mon-config
>>>
>>> ________________________________
>>> From: GLE, Vivien
>>> Sent: Thursday, July 31, 2025 09:38:25
>>> To: Eugen Block
>>> Cc: ceph-users@ceph.io
>>> Subject: RE: [ceph-users] Re: Pgs troubleshooting
>>>
>>> Hi,
>>>
>>>> Or could reducing min_size to 1 help here (Thanks, Anthony)? I'm not
>>>> entirely sure and am on vacation. 😅 It could be worth a try. But don't
>>>> forget to reset min_size back to 2 afterwards.
>>>
>>> I did, but nothing really changed. How long should I wait to see if
>>> it does something?
>>>
>>>> No, you use the ceph-objectstore-tool to export the PG from the intact
>>>> OSD (you need to stop it though, and set the noout flag); make sure
>>>> you have enough disk space.
>>>
>>> I stopped my OSD and set noout to check whether my PG is stored in
>>> bluestore (it is not), but when I tried to restart the OSD, the OSD
>>> superblock was gone:
>>>
>>> 2025-07-31T08:33:14.696+0000 7f0c7c889680  1 bdev(0x60945520ae00 /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
>>> 2025-07-31T08:33:14.697+0000 7f0c7c889680 -1 bdev(0x60945520ae00 /var/lib/ceph/osd/ceph-2/block) open open got: (13) Permission denied
>>> 2025-07-31T08:33:14.697+0000 7f0c7c889680 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-2: (2) No such file or directory
>>>
>>> Did I miss something?
>>>
>>> Thanks,
>>> Vivien
>>>
>>> ________________________________
>>> From: Eugen Block <ebl...@nde.ag>
>>> Sent: Wednesday, July 30, 2025 16:56:50
>>> To: GLE, Vivien
>>> Cc: ceph-users@ceph.io
>>> Subject: [ceph-users] Re: Pgs troubleshooting
>>>
>>> Or could reducing min_size to 1 help here (Thanks, Anthony)? I'm not
>>> entirely sure and am on vacation. 😅 It could be worth a try. But don't
>>> forget to reset min_size back to 2 afterwards.
>>>
>>> Quoting "GLE, Vivien" <vivien....@inist.fr>:
>>>
>>>> Hi,
>>>>
>>>>> did the two replaced OSDs fail at the same time (before they were
>>>>> completely drained)? This would most likely mean that both those
>>>>> failed OSDs contained the other two replicas of this PG
>>>>
>>>> Unfortunately yes.
>>>>
>>>>> This would most likely mean that both those
>>>>> failed OSDs contained the other two replicas of this PG. A pg query
>>>>> should show which OSDs are missing.
>>>>
>>>> If I understand correctly, I need to move my PG onto OSD 1?
>>>>
>>>> ceph -w
>>>>
>>>> osd.1 [ERR] 11.4 has 2 objects unfound and apparently lost
>>>>
>>>> ceph pg query 11.4
>>>>
>>>>     "up": [
>>>>         1,
>>>>         4,
>>>>         5
>>>>     ],
>>>>     "acting": [
>>>>         1,
>>>>         4,
>>>>         5
>>>>     ],
>>>>     "avail_no_missing": [],
>>>>     "object_location_counts": [
>>>>         {
>>>>             "shards": "3,4,5",
>>>>             "objects": 2
>>>>         }
>>>>     ],
>>>>     "blocked_by": [],
>>>>     "up_primary": 1,
>>>>     "acting_primary": 1,
>>>>     "purged_snaps": []
>>>> },
>>>>
>>>> Thanks,
>>>> Vivien
>>>>
>>>> ________________________________
>>>> From: Eugen Block <ebl...@nde.ag>
>>>> Sent: Tuesday, July 29, 2025 16:48:41
>>>> To: ceph-users@ceph.io
>>>> Subject: [ceph-users] Re: Pgs troubleshooting
>>>>
>>>> Hi,
>>>>
>>>> did the two replaced OSDs fail at the same time (before they were
>>>> completely drained)? This would most likely mean that both those
>>>> failed OSDs contained the other two replicas of this PG. A pg query
>>>> should show which OSDs are missing.
>>>> You could try with the objectstore-tool to export the PG from the
>>>> remaining OSD and import it on different OSDs. Or you mark the data as
>>>> lost if you don't care about the data and want a healthy state quickly.
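>>>> A rough sketch of that approach (the target OSD and the file path are
>>>> placeholders; both OSDs must be stopped, with noout set, and the
>>>> target should not already hold the PG):
>>>>
>>>>     # export PG 11.4 from the intact OSD (osd.1 according to the query)
>>>>     ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 \
>>>>         --op export --pgid 11.4 --file /tmp/pg11.4.export
>>>>
>>>>     # import it on another, stopped OSD (replace X with its id)
>>>>     ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-X \
>>>>         --op import --file /tmp/pg11.4.export
>>>>
>>>> Or, to give the unfound objects up instead:
>>>>
>>>>     ceph pg 11.4 mark_unfound_lost revert   # or: mark_unfound_lost delete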
>>>>
>>>> Regards,
>>>> Eugen
>>>>
>>>> Quoting "GLE, Vivien" <vivien....@inist.fr>:
>>>>
>>>>> Thanks for your help! This is my new pg stat, with no more peering
>>>>> PGs (after rebooting some OSDs):
>>>>>
>>>>> ceph pg stat ->
>>>>>
>>>>> 498 pgs: 1 active+recovery_unfound+degraded, 3
>>>>> recovery_unfound+undersized+degraded+remapped+peered, 14
>>>>> active+clean+scrubbing+deep, 480 active+clean;
>>>>>
>>>>> 36 GiB data, 169 GiB used, 6.2 TiB / 6.4 TiB avail; 8.8 KiB/s rd, 0
>>>>> B/s wr, 12 op/s; 715/41838 objects degraded (1.709%); 5/13946
>>>>> objects unfound (0.036%)
>>>>>
>>>>> ceph pg ls recovery_unfound -> shows that the PGs are replica 3; I
>>>>> tried to repair but nothing happened.
>>>>>
>>>>> ceph -w ->
>>>>>
>>>>> osd.1 [ERR] 11.4 has 2 objects unfound and apparently lost
>>>>>
>>>>> ________________________________
>>>>> From: Frédéric Nass <frederic.n...@clyso.com>
>>>>> Sent: Tuesday, July 29, 2025 14:03:37
>>>>> To: GLE, Vivien
>>>>> Cc: ceph-users@ceph.io
>>>>> Subject: Re: [ceph-users] Pgs troubleshooting
>>>>>
>>>>> Hi Vivien,
>>>>>
>>>>> Unless you ran the 'ceph pg stat' command while peering was
>>>>> occurring, the 37 peering PGs might indicate a temporary peering
>>>>> issue with one or more OSDs. If that's the case, then restarting the
>>>>> associated OSDs could help with the peering. You could list those
>>>>> PGs and associated OSDs with 'ceph pg ls peering' and trigger
>>>>> peering by either restarting one common OSD or by using
>>>>> 'ceph pg repeer <pg_id>'.
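>>>>> For instance, a small sketch (assuming jq is available and relying
>>>>> on the JSON output of 'ceph pg ls'):
>>>>>
>>>>>     # repeer every PG currently stuck in peering
>>>>>     for pg in $(ceph pg ls peering -f json | jq -r '.pg_stats[].pgid'); do
>>>>>         ceph pg repeer "$pg"
>>>>>     done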
>>>>>
>>>>> Regarding the unfound object and its associated backfill_unfound PG,
>>>>> you could identify this PG with 'ceph pg ls backfill_unfound' and
>>>>> investigate it with 'ceph pg <pg_id> query'. Depending on the
>>>>> output, you could try running 'ceph pg repair <pg_id>'. Could you
>>>>> confirm that this PG is not part of a size=2 pool?
>>>>>
>>>>> Best regards,
>>>>> Frédéric.
>>>>>
>>>>> --
>>>>> Frédéric Nass
>>>>> Ceph Ambassador France | Senior Ceph Engineer @ CLYSO
>>>>> Try our Ceph Analyzer -- https://analyzer.clyso.com/
>>>>> https://clyso.com | frederic.n...@clyso.com
>>>>>
>>>>> On Tue, Jul 29, 2025 at 14:19, GLE, Vivien <vivien....@inist.fr> wrote:
>>>>> Hi,
>>>>>
>>>>> After replacing 2 OSDs (data corruption), these are the stats of my
>>>>> test Ceph cluster:
>>>>>
>>>>> ceph pg stat
>>>>>
>>>>> 498 pgs: 37 peering, 1 active+remapped+backfilling, 1
>>>>> active+clean+remapped, 1 active+recovery_wait+undersized+remapped, 1
>>>>> backfill_unfound+undersized+degraded+remapped+peered, 1
>>>>> remapped+peering, 12 active+clean+scrubbing+deep, 1
>>>>> active+undersized, 442 active+clean, 1
>>>>> active+recovering+undersized+remapped
>>>>>
>>>>> 34 GiB data, 175 GiB used, 6.2 TiB / 6.4 TiB avail; 1.7 KiB/s rd, 1
>>>>> op/s; 31/39768 objects degraded (0.078%); 6/39768 objects misplaced
>>>>> (0.015%); 1/13256 objects unfound (0.008%)
>>>>>
>>>>> ceph osd stat
>>>>>
>>>>> 7 osds: 7 up (since 20h), 7 in (since 20h); epoch: e427538; 4
>>>>> remapped pgs
>>>>>
>>>>> Does anyone have an idea of where to start to get back to a healthy
>>>>> cluster?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Vivien

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io