[ceph-users] Re: Pgs troubleshooting

GLE, Vivien Fri, 01 Aug 2025 05:42:54 -0700

I was using ceph-objectstore-tool the wrong way: I ran it on the host instead of
inside the container via cephadm shell --name osd.x
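
For readers following along, the container-based invocation can be sketched as below. This is only a sketch under assumptions: the OSD id 1 and PG 11.4 are taken from later in this thread, the helper name objectstore_list_cmd is hypothetical, and the in-container data path /var/lib/ceph/osd/ceph-<id> is inferred from the OSD log quoted further down. The helper prints the command for review instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: print the command that lists a PG's objects with
# ceph-objectstore-tool from inside the OSD's own container.
# Assumption: inside "cephadm shell --name osd.<id>" the OSD data directory
# is available at /var/lib/ceph/osd/ceph-<id>, so no FSID is in the path.
objectstore_list_cmd() {
    osd_id=$1
    pgid=$2
    printf 'cephadm shell --name osd.%s -- ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-%s --op list --pgid %s\n' \
        "$osd_id" "$osd_id" "$pgid"
}

# Example: print the command for osd.1 and PG 11.4 (values from this thread).
objectstore_list_cmd 1 11.4
```

The OSD has to be stopped before the tool is run against it, as Eugen points out later in the thread.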



________________________________
From: GLE, Vivien <vivien....@inist.fr>
Sent: Friday, 1 August 2025 09:02:59
To: Eugen Block
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: Pgs troubleshooting

Hi,


What is the right way to use the objectstore tool?


My OSDs are up! I purged the ceph-* packages on my host, following this thread:
https://www.reddit.com/r/ceph/comments/1me3kvd/containerized_ceph_base_os_experience/


" Make sure that the base OS does not have any ceph packages installed, with 
Ubuntu in the past had issues with ceph-common being installed on the host OS 
and it trying to take ownership of the containerized ceph deployment. If you 
run into any issues check the base OS for ceph-* packages and uninstall. "


I believe the only right way to run ceph commands is through cephadm.
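
A quick way to check a host for leftover ceph-* packages could look like the sketch below. It is an illustration only: the helper name list_ceph_host_packages is hypothetical, the sample package list is made up, and the dpkg-query call in the comment assumes a Debian or Ubuntu host.

```shell
#!/bin/sh
# Hypothetical helper: read package names on stdin (one per line) and print
# only those matching the ceph-* pattern from the quoted advice.
# Note that this pattern does not match "cephadm" itself.
list_ceph_host_packages() {
    while IFS= read -r pkg; do
        case $pkg in
            ceph-*) printf '%s\n' "$pkg" ;;
        esac
    done
}

# On a Debian/Ubuntu host the input could come from dpkg-query, e.g.:
#   dpkg-query -W -f='${Package}\n' | list_ceph_host_packages
# Made-up sample input, for illustration only:
printf '%s\n' bash ceph-common cephadm ceph-base | list_ceph_host_packages
```

Anything the helper prints is a candidate for removal from the base OS; what to uninstall remains a judgment call for the administrator.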


Thanks for your help!

________________________________
From: Eugen Block <ebl...@nde.ag>
Sent: Thursday, 31 July 2025 19:42:21
To: GLE, Vivien
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Pgs troubleshooting

To use the objectstore tool within the container, you don’t have to
specify the cluster’s FSID because it’s mapped into the container. By
using the objectstore tool you might have changed the ownership of the
directory; change it back to its previous state. Other OSDs will show
you which uid/user and/or gid/group that is.
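
The ownership comparison described above could be sketched as follows. This is a minimal sketch with assumptions: the helper names owner_of and same_owner are hypothetical, <fsid> stands for the cluster FSID, osd.4 is assumed to be a healthy OSD on the same host, and 167:167 is the uid:gid shown in the OSD log quoted below.

```shell
#!/bin/sh
# Print "uid:gid" of a path (GNU stat, as found on typical Linux hosts).
owner_of() {
    stat -c '%u:%g' "$1"
}

# Succeed when both paths have the same numeric owner and group.
same_owner() {
    [ "$(owner_of "$1")" = "$(owner_of "$2")" ]
}

# Hypothetical usage on the host; <fsid> stands for the cluster FSID and
# osd.4 is assumed to be a healthy OSD on the same host:
#   same_owner /var/lib/ceph/<fsid>/osd.1 /var/lib/ceph/<fsid>/osd.4 \
#       || echo "osd.1 ownership differs from osd.4"
# If it differs, ownership could be restored with chown, for example to
# 167:167 if that is what the healthy OSD shows:
#   chown -R 167:167 /var/lib/ceph/<fsid>/osd.1
```

Comparing against a working OSD first avoids guessing the uid and gid, which may differ between container images.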

Quoting "GLE, Vivien" <vivien....@inist.fr>:

> I'm sorry for the confusion!
>
> I pasted the wrong output.
>
>
> ceph-objectstore-tool --data-path /var/lib/ceph/Id/osd.1 --op list
> --pgid 11.4 --no-mon-config
>
> OSD.1 log
>
> 2025-07-31T12:06:56.273+0000 7a9c2bf47680  0 set uid:gid to 167:167
> (ceph:ceph)
> 2025-07-31T12:06:56.273+0000 7a9c2bf47680  0 ceph version 19.2.2
> (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable), process
> ceph-osd, pid 7
> 2025-07-31T12:06:56.273+0000 7a9c2bf47680  0 pidfile_write: ignore
> empty --pid-file
> 2025-07-31T12:06:56.274+0000 7a9c2bf47680  1 bdev(0x57bd64210e00
> /var/lib/ceph/osd/ceph-1/block) open path
> /var/lib/ceph/osd/ceph-1/block
> 2025-07-31T12:06:56.274+0000 7a9c2bf47680 -1 bdev(0x57bd64210e00
> /var/lib/ceph/osd/ceph-1/block) open open got: (13) Permission denied
> 2025-07-31T12:06:56.274+0000 7a9c2bf47680 -1  ** ERROR: unable to
> open OSD superblock on /var/lib/ceph/osd/ceph-1: (2) No such file or
> directory
>
> ----------------------
>
> I retried on OSD.2 with PG 2.1 to see whether disabling OSD.2, instead
> of just stopping it, before the objectstore-tool operation would change
> anything, but the same error occurred.
>
>
>
> ________________________________
> From: Eugen Block <ebl...@nde.ag>
> Sent: Thursday, 31 July 2025 13:27:51
> To: GLE, Vivien
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: Pgs troubleshooting
>
> Why did you look at OSD.2? According to the query output you provided
> I would have looked at OSD.1 (acting set). And you pasted the output
> of PG 11.4, now you’re trying to list PG 2.1, that is quite confusing.
>
>
> Quoting "GLE, Vivien" <vivien....@inist.fr>:
>
>> I don't get why it is searching in this path, because there is nothing
>> there. This is the command I used to check bluestore:
>>
>>
>> ceph-objectstore-tool --data-path /var/lib/ceph/"ID"/osd.2 --op list
>> --pgid 2.1 --no-mon-config
>>
>> ________________________________
>> From: GLE, Vivien
>> Sent: Thursday, 31 July 2025 09:38:25
>> To: Eugen Block
>> Cc: ceph-users@ceph.io
>> Subject: RE: [ceph-users] Re: Pgs troubleshooting
>>
>>
>> Hi,
>>
>>
>>> Or could reducing min_size to 1 help here (Thanks, Anthony)? I’m not
>>> entirely sure and am on vacation. 😅 it could be worth a try. But don’t
>>> forget to reset min_size back to 2 afterwards.
>>
>>
>> I did that, but nothing really changed. How long should I wait to
>> see whether it has any effect?
>>
>>
>>> No, you use the ceph-objectstore-tool to export the PG from the intact
>>> OSD (you need to stop it though, set noout flag), make sure you have
>>> enough disk space.
>>
>>
>> I stopped my OSD and set noout to check whether my PG is stored in
>> bluestore (it is not), but when I tried to restart my OSD, the OSD
>> superblock was gone:
>>
>>
>> 2025-07-31T08:33:14.696+0000 7f0c7c889680  1 bdev(0x60945520ae00
>> /var/lib/ceph/osd/ceph-2/block) open path
>> /var/lib/ceph/osd/ceph-2/block
>> 2025-07-31T08:33:14.697+0000 7f0c7c889680 -1 bdev(0x60945520ae00
>> /var/lib/ceph/osd/ceph-2/block) open open got: (13) Permission denied
>> 2025-07-31T08:33:14.697+0000 7f0c7c889680 -1  ** ERROR: unable to
>> open OSD superblock on /var/lib/ceph/osd/ceph-2: (2) No such file or
>> directory
>>
>> Did I miss something?
>>
>> Thanks
>> Vivien
>>
>>
>>
>>
>> ________________________________
>> From: Eugen Block <ebl...@nde.ag>
>> Sent: Wednesday, 30 July 2025 16:56:50
>> To: GLE, Vivien
>> Cc: ceph-users@ceph.io
>> Subject: [ceph-users] Re: Pgs troubleshooting
>>
>> Or could reducing min_size to 1 help here (Thanks, Anthony)? I’m not
>> entirely sure and am on vacation. 😅 it could be worth a try. But don’t
>> forget to reset min_size back to 2 afterwards.
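
The min_size change and its reset could be sketched as follows. This is only a sketch: the helper names pool_id_of_pg and min_size_cmds are hypothetical, "mypool" is a placeholder pool name that does not appear in this thread, and the commands are printed for review, not executed.

```shell
#!/bin/sh
# A PG id has the form <pool-id>.<pg-suffix>; return the pool id part.
pool_id_of_pg() {
    printf '%s\n' "${1%%.*}"
}

# Hypothetical helper: print (not run) the commands to lower min_size on a
# pool and to restore it afterwards. The pool name is supplied by the caller.
min_size_cmds() {
    pool=$1
    printf 'ceph osd pool set %s min_size 1\n' "$pool"
    printf 'ceph osd pool set %s min_size 2\n' "$pool"
}

# PG 11.4 from this thread belongs to the pool with id 11; the matching
# pool name can be looked up with: ceph osd pool ls detail
pool_id_of_pg 11.4
# "mypool" is a placeholder name.
min_size_cmds mypool
```

Printing both commands together keeps the reset step visible, which is the part the advice above warns not to forget.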
>>
>> Quoting "GLE, Vivien" <vivien....@inist.fr>:
>>
>>> Hi,
>>>
>>>
>>>> did the two replaced OSDs fail at the same time (before they were
>>>> completely drained)? This would most likely mean that both those
>>>> failed OSDs contained the other two replicas of this PG
>>>
>>>
>>> Unfortunately yes
>>>
>>>
>>>> This would most likely mean that both those
>>>> failed OSDs contained the other two replicas of this PG. A pg query
>>>> should show which OSDs are missing.
>>>
>>>
>>> If I understand correctly, I need to move my PG to OSD 1?
>>>
>>>
>>> ceph -w
>>>
>>>
>>>  osd.1 [ERR] 11.4 has 2 objects unfound and apparently lost
>>>
>>>
>>> ceph pg 11.4 query
>>>
>>>
>>>
>>>     "up": [
>>>         1,
>>>         4,
>>>         5
>>>     ],
>>>     "acting": [
>>>         1,
>>>         4,
>>>         5
>>>     ],
>>>     "avail_no_missing": [],
>>>     "object_location_counts": [
>>>         {
>>>             "shards": "3,4,5",
>>>             "objects": 2
>>>         }
>>>     ],
>>>     "blocked_by": [],
>>>     "up_primary": 1,
>>>     "acting_primary": 1,
>>>     "purged_snaps": []
>>> },
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>> Vivien
>>>
>>> ________________________________
>>> From: Eugen Block <ebl...@nde.ag>
>>> Sent: Tuesday, 29 July 2025 16:48:41
>>> To: ceph-users@ceph.io
>>> Subject: [ceph-users] Re: Pgs troubleshooting
>>>
>>> Hi,
>>>
>>> did the two replaced OSDs fail at the same time (before they were
>>> completely drained)? This would most likely mean that both those
>>> failed OSDs contained the other two replicas of this PG. A pg query
>>> should show which OSDs are missing.
>>> You could try with objectstore-tool to export the PG from the
>>> remaining OSD and import it on different OSDs. Or you mark the data as
>>> lost if you don't care about the data and want a healthy state quickly.
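
The export/import route and the mark-as-lost alternative could be sketched as follows. This is only a sketch: the helper name pg_export_import_cmds is hypothetical, the OSD ids and file path are placeholders chosen by the caller, the data path assumes the in-container layout /var/lib/ceph/osd/ceph-<id>, and the commands are printed for review, not executed.

```shell
#!/bin/sh
# Hypothetical helper: print an export command for the OSD that still holds
# the PG and an import command for the target OSD. Both OSDs must be stopped
# while ceph-objectstore-tool runs against them.
# Arguments: <pgid> <source-osd-id> <target-osd-id> <export-file>
pg_export_import_cmds() {
    pgid=$1
    src=$2
    dst=$3
    file=$4
    printf 'ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-%s --op export --pgid %s --file %s\n' \
        "$src" "$pgid" "$file"
    printf 'ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-%s --op import --file %s\n' \
        "$dst" "$file"
}

# Alternative when the data can be discarded (run from a ceph shell):
#   ceph pg <pgid> mark_unfound_lost revert
#   ceph pg <pgid> mark_unfound_lost delete
```

The export file has to be stored somewhere with enough free space that is reachable from both OSD containers; how to arrange that is left open here.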
>>>
>>> Regards,
>>> Eugen
>>>
>>> Quoting "GLE, Vivien" <vivien....@inist.fr>:
>>>
>>>> Thanks for your help! This is my new pg stat, with no more peering
>>>> PGs (after rebooting some OSDs):
>>>>
>>>> ceph pg stat ->
>>>>
>>>> 498 pgs: 1 active+recovery_unfound+degraded, 3
>>>> recovery_unfound+undersized+degraded+remapped+peered, 14
>>>> active+clean+scrubbing+deep, 480 active+clean;
>>>>
>>>> 36 GiB data, 169 GiB used, 6.2 TiB / 6.4 TiB avail; 8.8 KiB/s rd, 0
>>>> B/s wr, 12 op/s; 715/41838 objects degraded (1.709%); 5/13946
>>>> objects unfound (0.036%)
>>>>
>>>> ceph pg ls recovery_unfound -> shows that the PGs are replica 3; I
>>>> tried to repair them, but nothing happened
>>>>
>>>>
>>>> ceph -w ->
>>>>
>>>> osd.1 [ERR] 11.4 has 2 objects unfound and apparently lost
>>>>
>>>>
>>>>
>>>> ________________________________
>>>> From: Frédéric Nass <frederic.n...@clyso.com>
>>>> Sent: Tuesday, 29 July 2025 14:03:37
>>>> To: GLE, Vivien
>>>> Cc: ceph-users@ceph.io
>>>> Subject: Re: [ceph-users] Pgs troubleshooting
>>>>
>>>> Hi Vivien,
>>>>
>>>> Unless you ran the 'ceph pg stat' command while peering was occurring, the
>>>> 37 peering PGs might indicate a temporary peering issue with one or
>>>> more OSDs. If that's the case then restarting associated OSDs could
>>>> help with the peering or ceph pg. You could list those PGs and
>>>> associated OSDs with 'ceph pg ls peering' and trigger peering by
>>>> either restarting one common OSD or by using 'ceph pg repeer <pg_id>'.
>>>>
>>>> Regarding the unfound object and its associated backfill_unfound PG,
>>>> you could identify this PG with 'ceph pg ls backfill_unfound' and
>>>> investigate this PG with 'ceph pg <pg_id> query'. Depending on the
>>>> output, you could try running a 'ceph pg repair <pg_id>'. Could you
>>>> confirm that this PG is not part of a size=2 pool?
>>>>
>>>> Best regards,
>>>> Frédéric.
>>>>
>>>> --
>>>> Frédéric Nass
>>>> Ceph Ambassador France | Senior Ceph Engineer @ CLYSO
>>>> Try our Ceph Analyzer -- https://analyzer.clyso.com/
>>>> https://clyso.com | frederic.n...@clyso.com
>>>>
>>>>
>>>> On Tue, 29 Jul 2025 at 14:19, GLE, Vivien <vivien....@inist.fr> wrote:
>>>> Hi,
>>>>
>>>> After replacing 2 OSDs (data corruption), these are the stats of my
>>>> test Ceph cluster:
>>>>
>>>> ceph pg stat
>>>>
>>>> 498 pgs: 37 peering, 1 active+remapped+backfilling, 1
>>>> active+clean+remapped, 1 active+recovery_wait+undersized+remapped, 1
>>>> backfill_unfound+undersized+degraded+remapped+peered, 1
>>>> remapped+peering, 12 active+clean+scrubbing+deep, 1
>>>> active+undersized, 442 active+clean, 1
>>>> active+recovering+undersized+remapped
>>>>
>>>> 34 GiB data, 175 GiB used, 6.2 TiB / 6.4 TiB avail; 1.7 KiB/s rd, 1
>>>> op/s; 31/39768 objects degraded (0.078%); 6/39768 objects misplaced
>>>> (0.015%); 1/13256 objects unfound (0.008%)
>>>>
>>>> ceph osd stat
>>>> 7 osds: 7 up (since 20h), 7 in (since 20h); epoch: e427538; 4 remapped pgs
>>>>
>>>> Does anyone have an idea of where to start to get back to a healthy cluster?
>>>>
>>>> Thanks !
>>>>
>>>> Vivien
>>>>
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list -- ceph-users@ceph.io
>>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>>
>>>
>>
>>



_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
