I'm sorry for the confusion!

I pasted the wrong output.


ceph-objectstore-tool --data-path /var/lib/ceph/Id/osd.1 --op list --pgid 11.4 
--no-mon-config

OSD.1 log

2025-07-31T12:06:56.273+0000 7a9c2bf47680  0 set uid:gid to 167:167 (ceph:ceph)
2025-07-31T12:06:56.273+0000 7a9c2bf47680  0 ceph version 19.2.2 
(0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable), process ceph-osd, 
pid 7
2025-07-31T12:06:56.273+0000 7a9c2bf47680  0 pidfile_write: ignore empty 
--pid-file
2025-07-31T12:06:56.274+0000 7a9c2bf47680  1 bdev(0x57bd64210e00 
/var/lib/ceph/osd/ceph-1/block) open path /var/lib/ceph/osd/ceph-1/block
2025-07-31T12:06:56.274+0000 7a9c2bf47680 -1 bdev(0x57bd64210e00 
/var/lib/ceph/osd/ceph-1/block) open open got: (13) Permission denied
2025-07-31T12:06:56.274+0000 7a9c2bf47680 -1  ** ERROR: unable to open OSD 
superblock on /var/lib/ceph/osd/ceph-1: (2) No such file or directory
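
For reference, here is what I plan to check next. This is only a sketch, assuming
a cephadm deployment and that running the tool as root changed some ownership
("Id" is just my placeholder for the cluster fsid; the exact paths and the chown
step are my assumptions):

# ownership of the data dir and of the device the block symlink points to
ls -ln /var/lib/ceph/Id/osd.1/
ls -lnL /var/lib/ceph/Id/osd.1/block

# restore uid/gid 167:167 (ceph:ceph, as in the log above) if root took them over
chown -R 167:167 /var/lib/ceph/Id/osd.1/
chown 167:167 "$(readlink -f /var/lib/ceph/Id/osd.1/block)"

# run the tool inside the stopped OSD's own container so paths and permissions match
cephadm shell --name osd.1 -- ceph-objectstore-tool \
  --data-path /var/lib/ceph/osd/ceph-1 --op list --pgid 11.4 --no-mon-config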

----------------------

I retried on OSD.2 with PG 2.1 to see whether disabling OSD.2, instead of just
stopping it, before the objectstore-tool operation would change anything, but
the same error occurred.
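
To be explicit about what I am trying to do, this is the sequence as I understand
Eugen's suggestion, for the intact copy (OSD.1, PG 11.4). Only a sketch: the
export file location, the --mount option and the orch commands are assumptions
on my side, not something I have verified:

ceph osd set noout
ceph orch daemon stop osd.1

# export PG 11.4 from the stopped OSD; /root/pg-export on the host is
# made available as /mnt inside the tool container
cephadm shell --name osd.1 --mount /root/pg-export:/mnt -- ceph-objectstore-tool \
  --data-path /var/lib/ceph/osd/ceph-1 --op export --pgid 11.4 --file /mnt/pg11.4.export

ceph orch daemon start osd.1
ceph osd unset noout

# the counterpart on the target OSD would then be --op import with the same --file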



________________________________
From: Eugen Block <ebl...@nde.ag>
Sent: Thursday, July 31, 2025 13:27:51
To: GLE, Vivien
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Pgs troubleshooting

Why did you look at OSD.2? According to the query output you provided
I would have looked at OSD.1 (acting set). And you pasted the output
of PG 11.4, but now you're trying to list PG 2.1, which is quite confusing.


Quoting "GLE, Vivien" <vivien....@inist.fr>:

> I don't get why it is searching in this path, because there is nothing
> there, and this is the command I used to check BlueStore:
>
>
> ceph-objectstore-tool --data-path /var/lib/ceph/"ID"/osd.2 --op list
> --pgid 2.1 --no-mon-config
>
> ________________________________
> From: GLE, Vivien
> Sent: Thursday, July 31, 2025 09:38:25
> To: Eugen Block
> Cc: ceph-users@ceph.io
> Subject: RE: [ceph-users] Re: Pgs troubleshooting
>
>
> Hi,
>
>
>> Or could reducing min_size to 1 help here (Thanks, Anthony)? I’m not
>> entirely sure and am on vacation. 😅 it could be worth a try. But don’t
>> forget to reset min_size back to 2 afterwards.
>
>
> I did it, but nothing really changed. How long should I wait to
> see if it does something?
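
For reference, these are the commands I understand are meant here, with <pool>
standing for the pool the affected PG belongs to (that placeholder is mine):

ceph osd pool set <pool> min_size 1
# ... and once recovery has had a chance to make progress:
ceph osd pool set <pool> min_size 2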
>
>
>> No, you use the ceph-objectstore-tool to export the PG from the intact
>> OSD (you need to stop it though, set noout flag), make sure you have
>> enough disk space.
>
>
> I stopped my OSD and set noout to check whether my PG is stored in
> BlueStore (it is not), but when I tried to restart my OSD, the OSD
> superblock was gone:
>
>
> 2025-07-31T08:33:14.696+0000 7f0c7c889680  1 bdev(0x60945520ae00
> /var/lib/ceph/osd/ceph-2/block) open path
> /var/lib/ceph/osd/ceph-2/block
> 2025-07-31T08:33:14.697+0000 7f0c7c889680 -1 bdev(0x60945520ae00
> /var/lib/ceph/osd/ceph-2/block) open open got: (13) Permission denied
> 2025-07-31T08:33:14.697+0000 7f0c7c889680 -1  ** ERROR: unable to
> open OSD superblock on /var/lib/ceph/osd/ceph-2: (2) No such file or
> directory
>
> Did I miss something?
>
> Thanks
> Vivien
>
>
>
>
> ________________________________
> From: Eugen Block <ebl...@nde.ag>
> Sent: Wednesday, July 30, 2025 16:56:50
> To: GLE, Vivien
> Cc: ceph-users@ceph.io
> Subject: [ceph-users] Re: Pgs troubleshooting
>
> Or could reducing min_size to 1 help here (Thanks, Anthony)? I’m not
> entirely sure and am on vacation. 😅 it could be worth a try. But don’t
> forget to reset min_size back to 2 afterwards.
>
> Quoting "GLE, Vivien" <vivien....@inist.fr>:
>
>> Hi,
>>
>>
>> did the two replaced OSDs fail at the same time (before they were
>>> completely drained)? This would most likely mean that both those
>>> failed OSDs contained the other two replicas of this PG
>>
>>
>> Unfortunately yes
>>
>>
>>> This would most likely mean that both those
>>> failed OSDs contained the other two replicas of this PG. A pg query
>>> should show which OSDs are missing.
>>
>>
>> If I understand correctly, I need to move my PG onto OSD 1?
>>
>>
>> ceph -w
>>
>>
>>  osd.1 [ERR] 11.4 has 2 objects unfound and apparently lost
>>
>>
>> ceph pg query 11.4
>>
>>
>>
>>      "up": [
>>                     1,
>>                     4,
>>                     5
>>                 ],
>>                 "acting": [
>>                     1,
>>                     4,
>>                     5
>>                 ],
>>                 "avail_no_missing": [],
>>                 "object_location_counts": [
>>                     {
>>                         "shards": "3,4,5",
>>                         "objects": 2
>>                     }
>>                 ],
>>                 "blocked_by": [],
>>                 "up_primary": 1,
>>                 "acting_primary": 1,
>>                 "purged_snaps": []
>>             },
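
In case it is useful, my understanding is that the unfound objects themselves
can be listed with the following (PG 11.4 being the one reported above):

ceph pg 11.4 list_unfound
ceph health detail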
>>
>>
>>
>> Thanks
>>
>>
>> Vivien
>>
>> ________________________________
>> From: Eugen Block <ebl...@nde.ag>
>> Sent: Tuesday, July 29, 2025 16:48:41
>> To: ceph-users@ceph.io
>> Subject: [ceph-users] Re: Pgs troubleshooting
>>
>> Hi,
>>
>> did the two replaced OSDs fail at the same time (before they were
>> completely drained)? This would most likely mean that both those
>> failed OSDs contained the other two replicas of this PG. A pg query
>> should show which OSDs are missing.
>> You could try with objectstore-tool to export the PG from the
>> remaining OSD and import it on different OSDs. Or you mark the data as
>> lost if you don't care about the data and want a healthy state quickly.
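
For the record, if we do end up giving those objects up for lost, my reading is
that the command would be one of the following for PG 11.4 (revert rolls back to
a previous version where possible, delete forgets the objects entirely):

ceph pg 11.4 mark_unfound_lost revert
ceph pg 11.4 mark_unfound_lost delete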
>>
>> Regards,
>> Eugen
>>
>> Quoting "GLE, Vivien" <vivien....@inist.fr>:
>>
>>> Thanks for your help! This is my new pg stat, with no more peering
>>> PGs (after rebooting some OSDs):
>>>
>>> ceph pg stat ->
>>>
>>> 498 pgs: 1 active+recovery_unfound+degraded, 3
>>> recovery_unfound+undersized+degraded+remapped+peered, 14
>>> active+clean+scrubbing+deep, 480 active+clean;
>>>
>>> 36 GiB data, 169 GiB used, 6.2 TiB / 6.4 TiB avail; 8.8 KiB/s rd, 0
>>> B/s wr, 12 op/s; 715/41838 objects degraded (1.709%); 5/13946
>>> objects unfound (0.036%)
>>>
>>> ceph pg ls recovery_unfound -> shows that the PGs are replica 3; I tried to
>>> repair them but nothing happened
>>>
>>>
>>> ceph -w ->
>>>
>>> osd.1 [ERR] 11.4 has 2 objects unfound and apparently lost
>>>
>>>
>>>
>>> ________________________________
>>> From: Frédéric Nass <frederic.n...@clyso.com>
>>> Sent: Tuesday, July 29, 2025 14:03:37
>>> To: GLE, Vivien
>>> Cc: ceph-users@ceph.io
>>> Subject: Re: [ceph-users] Pgs troubleshooting
>>>
>>> Hi Vivien,
>>>
>>> Unless you ran the 'ceph pg stat' command while peering was occurring, the
>>> 37 peering PGs might indicate a temporary peering issue with one or
>>> more OSDs. If that's the case, then restarting the associated OSDs could
>>> help with the peering. You could list those PGs and their
>>> associated OSDs with 'ceph pg ls peering' and trigger peering by
>>> either restarting one common OSD or by using 'ceph pg repeer <pg_id>'.
>>>
>>> Regarding the unfound object and its associated backfill_unfound PG,
>>> you could identify this PG with 'ceph pg ls backfill_unfound' and
>>> investigate this PG with 'ceph pg <pg_id> query'. Depending on the
>>> output, you could try running a 'ceph pg repair <pg_id>'. Could you
>>> confirm that this PG is not part of a size=2 pool?
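
For reference, these are the commands as I understand them, with <pg_id> and
<pool> standing for the affected PG and its pool (those placeholders are mine):

ceph pg ls backfill_unfound
ceph pg <pg_id> query
ceph pg repair <pg_id>
ceph osd pool get <pool> size
ceph osd pool get <pool> min_size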
>>>
>>> Best regards,
>>> Frédéric.
>>>
>>> --
>>> Frédéric Nass
>>> Ceph Ambassador France | Senior Ceph Engineer @ CLYSO
>>> Try our Ceph Analyzer -- https://analyzer.clyso.com/
>>> https://clyso.com | frederic.n...@clyso.com
>>>
>>>
>>> On Tue, Jul 29, 2025 at 14:19, GLE, Vivien
>>> <vivien....@inist.fr> wrote:
>>> Hi,
>>>
>>> After replacing 2 OSD (data corruption), this is the stats of my
>>> testing ceph cluster
>>>
>>> ceph pg stat
>>>
>>> 498 pgs: 37 peering, 1 active+remapped+backfilling, 1
>>> active+clean+remapped, 1 active+recovery_wait+undersized+remapped, 1
>>> backfill_unfound+undersized+degraded+remapped+peered, 1
>>> remapped+peering, 12 active+clean+scrubbing+deep, 1
>>> active+undersized, 442 active+clean, 1
>>> active+recovering+undersized+remapped
>>>
>>> 34 GiB data, 175 GiB used, 6.2 TiB / 6.4 TiB avail; 1.7 KiB/s rd, 1
>>> op/s; 31/39768 objects degraded (0.078%); 6/39768 objects misplaced
>>> (0.015%); 1/13256 objects unfound (0.008%)
>>>
>>> ceph osd stat
>>> 7 osds: 7 up (since 20h), 7 in (since 20h); epoch: e427538; 4 remapped pgs
>>>
>>> Anyone had an idea of where to start to get a healthy cluster ?
>>>
>>> Thanks !
>>>
>>> Vivien
>>>
>>>



_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
