Hello Igor,

First of all, sorry about the late reply. It took me a while to export
all the shards that weren't otherwise available from osd.2 (osd.1 and
osd.3 were fine; osd.2 didn't start, but I could still use
`ceph-objectstore-tool ... --op list-pgs` on it, whereas on osd.0 I
couldn't even list the PGs: it threw an error right away; more about
that later in this email).

For two of the unavailable shards, ceph-objectstore-tool core dumped
during the export with the same RocksDB issue. I should have enough
chunks to not need them; I'm just mentioning it in case it's useful:

sh-5.1# ceph-objectstore-tool --data-path /var/lib/ceph/osd --pgid 11.19s2
--op export --file pg.11.19s2.dat
/ceph/rpmbuild/BUILD/ceph-20.2.0/src/kv/RocksDBStore.cc: In function
'virtual int RocksDBStore::get(const std::string&, const std::string&,
ceph::bufferlist*)' thread 7ff3be4ca800 time 2026-02-04T09:42:00.743877+0000
/ceph/rpmbuild/BUILD/ceph-20.2.0/src/kv/RocksDBStore.cc: 1961:
ceph_abort_msg("block checksum mismatch: stored = 246217859, computed =
2155741315, type = 4  in db/170027.sst offset 28264757 size 1417")
 ceph version 20.2.0 (69f84cc2651aa259a15bc192ddaabd3baba07489) tentacle
(stable - RelWithDebInfo)
 1: (ceph::__ceph_abort(char const*, int, char const*,
std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > const&)+0xc9) [0x7ff3bf5391fd]
 2: (RocksDBStore::get(std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const&,
std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > const&, ceph::buffer::v15_2_0::list*)+0x3bc)
[0x555667b340bc]
 3:
(BlueStore::omap_get_values(boost::intrusive_ptr<ObjectStore::CollectionImpl>&,
ghobject_t const&,
std::set<std::__cxx11::basic_string<char,std::char_traits<char>,
std::allocator<char> >, std::less<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > >,
std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > > > const&,
std::map<std::__cxx11::basic_string<char,std::char_traits<char>,
std::allocator<char> >, ceph::buffer::v15_2_0::list,
std::less<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > >,
std::allocator<std::pair<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const,
ceph::buffer::v15_2_0::list> > >*)+0x401) [0x555667a25fe1]
 4: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*)+0x361)
[0x5556675e0101]
 5: main()
 6: /lib64/libc.so.6(+0x2a610) [0x7ff3be930610]
 7: __libc_start_main()
 8: _start()
*** Caught signal (Aborted) **
 in thread 7ff3be4ca800 thread_name:ceph-objectstor
 ceph version 20.2.0 (69f84cc2651aa259a15bc192ddaabd3baba07489) tentacle
(stable - RelWithDebInfo)
 1: /lib64/libc.so.6(+0x3fc30) [0x7ff3be945c30]
 2: /lib64/libc.so.6(+0x8d03c) [0x7ff3be99303c]
 3: raise()
 4: abort()
 5: (ceph::__ceph_abort(char const*, int, char const*,
std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > const&)+0x186) [0x7ff3bf5392ba]
 6: (RocksDBStore::get(std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const&,
std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > const&, ceph::buffer::v15_2_0::list*)+0x3bc)
[0x555667b340bc]
 7:
(BlueStore::omap_get_values(boost::intrusive_ptr<ObjectStore::CollectionImpl>&,
ghobject_t const&,
std::set<std::__cxx11::basic_string<char,std::char_traits<char>,
std::allocator<char> >, std::less<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > >,
std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > > > const&,
std::map<std::__cxx11::basic_string<char,std::char_traits<char>,
std::allocator<char> >, ceph::buffer::v15_2_0::list,
std::less<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > >,
std::allocator<std::pair<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const,
ceph::buffer::v15_2_0::list> > >*)+0x401) [0x555667a25fe1]
 8: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*)+0x361)
[0x5556675e0101]
 9: main()
 10: /lib64/libc.so.6(+0x2a610) [0x7ff3be930610]
 11: __libc_start_main()
 12: _start()
Aborted (core dumped)



After importing all the previously unavailable shards that I could
recover, I no longer have any "unknown" PGs. I still have lots of PGs
in the "down" state, and I assume I need to mark both "dead" OSDs as
lost to unstick them. Since that is an operation I cannot undo, I
would like to confirm it is indeed the correct next step to take.
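If marking them lost is indeed the step, this is what I plan to run,
based on the documented `ceph osd lost` syntax (shown here as a dry
run that only prints the commands, so nothing irreversible happens
yet):

```shell
# Dry run: print the commands instead of executing them, so I can
# review before committing to the irreversible step.
# osd.0 and osd.2 are the two dead OSDs in my cluster.
for osd in 0 2; do
  cmd="ceph osd lost $osd --yes-i-really-mean-it"
  echo "$cmd"
done
```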

I have a few questions to understand what happens in the next step
(marking the OSDs as lost):

Should I assume that once I mark an OSD as lost I won't be able to
"activate" it again, given that I used encryption when initializing
the BlueStore OSDs? Or does marking them as lost leave their unlocking
keys intact? (If the keys are destroyed, any hope of extracting
further data would be gone, mostly from osd.0, where I haven't been
able to use ceph-objectstore-tool at all since the power loss.)
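Before doing anything destructive I'll at least verify that the osd.0
clone still carries a valid dm-crypt header. A minimal sketch of the
check, assuming the OSDs use LUKS (demonstrated on a dummy file here;
on the real clone I'd read the first bytes of the block device
instead):

```shell
# LUKS headers start with the magic bytes "LUKS\xba\xbe"
# (\272\276 in octal). Create a dummy file carrying the magic to
# demonstrate the check; substitute the real device for the file.
printf 'LUKS\272\276' > /tmp/fake_luks_header
if head -c 4 /tmp/fake_luks_header | grep -q 'LUKS'; then
  echo "LUKS magic present"
else
  echo "no LUKS magic found"
fi
```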

I think I should have all the shards for the PGs, but just in case I
managed to clone osd.0 onto a different physical disk (the other
reason it took me so long to answer). ceph-objectstore-tool still
refuses to run against it, though:

# ceph-objectstore-tool --data-path /var/lib/ceph/osd --op list-pgs
Mount failed with '(5) Input/output error'

# ls -l /var/lib/ceph/osd
total 28
lrwxrwxrwx 1 ceph ceph  50 Feb  4 08:26 block ->
/dev/mapper/zNPZJR-i0TZ-6NtK-URto-tjfs-iJRb-GCAYEm
-rw------- 1 ceph ceph  37 Feb  4 08:26 ceph_fsid
-rw------- 1 ceph ceph  37 Feb  4 08:26 fsid
-rw------- 1 ceph ceph  55 Feb  4 08:26 keyring
-rw------- 1 ceph ceph 106 Jan 24 00:44 lockbox.keyring
-rw------- 1 ceph ceph   6 Feb  4 08:26 ready
-rw------- 1 ceph ceph  10 Feb  4 08:26 type
-rw------- 1 ceph ceph   2 Feb  4 08:26 whoami

For context, all but two pools in my cluster are replicated; pools 11
and 16 are erasure coded (2+1). If I understand correctly, as long as
I have two acting shards (i.e. at most one "NONE"), the data should be
available (at least read-only) once I mark the down OSDs as lost. Is
that understanding correct?
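The availability reasoning above can be sketched as a quick check:
count the non-NONE entries of an acting set and compare against k
(k=2 for my 2+1 pools). The acting-set string below is copied from the
`ceph health detail` output further down:

```shell
# A k=2,m=1 EC PG should be readable when at least k=2 shards of the
# acting set are present (i.e. at most one NONE).
acting='[1,NONE,3]'
k=2
present=$(echo "$acting" | tr -d '[]' | tr ',' '\n' | grep -vc '^NONE$')
if [ "$present" -ge "$k" ]; then
  echo "readable: $present of $k required shards present"
else
  echo "not readable: only $present shards present"
fi
```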

One more piece of information: pools 10 and 15 are the "replicated
root pools" that existed before the erasure-coded pools were created.

Ignoring osd.0 for now, here is the current state of my cluster (the
MDS is intentionally not started while I try to fix the PGs):
### ceph osd lspools
3 .rgw.root
4 default.rgw.log
5 default.rgw.control
6 default.rgw.meta
10 ark.data
11 ark.data_ec
12 ark.metadata
14 .mgr
15 limbo
16 limbo.data_ec
18 default.rgw.buckets.index
19 default.rgw.buckets.data
###

### ceph -s
# ceph -s
  cluster:
    id:     021f058f-dbf3-4a23-adb5-21d83f3f1bb6
    health: HEALTH_ERR
            1 filesystem is degraded
            1 filesystem has a failed mds daemon
            1 filesystem is offline
            insufficient standby MDS daemons available
            Reduced data availability: 143 pgs inactive, 143 pgs down
            Degraded data redundancy: 1303896/7149898 objects degraded
(18.237%), 218 pgs degraded, 316 pgs undersized
            144 pgs not deep-scrubbed in time
            459 pgs not scrubbed in time
            256 slow ops, oldest one blocked for 1507794 sec, osd.1 has
slow ops
            too many PGs per OSD (657 > max 500)

  services:
    mon: 2 daemons, quorum ceph-ymir-mon2,ceph-ymir-mon1 (age 2w)
    mgr: ceph-ymir-mgr1(active, since 2w)
    mds: 0/1 daemons up (1 failed)
    osd: 4 osds: 2 up (since 29m), 2 in (since 4w); 24 remapped pgs

  data:
    volumes: 0/1 healthy, 1 failed
    pools:   12 pools, 529 pgs
    objects: 2.46M objects, 7.4 TiB
    usage:   8.3 TiB used, 13 TiB / 22 TiB avail
    pgs:     27.032% pgs not active
             1303896/7149898 objects degraded (18.237%)
             306628/7149898 objects misplaced (4.289%)
             218 active+undersized+degraded
             143 down
             98  active+undersized
             45  active+clean
             19  active+clean+remapped
             4   active+clean+remapped+scrubbing+deep
             1   active+clean+remapped+scrubbing
             1   active+clean+scrubbing+deep
###

### ceph health detail
# ceph health detail
HEALTH_ERR 1 filesystem is degraded; 1 filesystem has a failed mds daemon;
1 filesystem is offline; insufficient standby
 MDS daemons available; Reduced data availability: 143 pgs inactive, 143
pgs down; Degraded data redundancy: 1303896/714
9898 objects degraded (18.237%), 218 pgs degraded, 316 pgs undersized; 144
pgs not deep-scrubbed in time; 459 pgs not sc
rubbed in time; 256 slow ops, oldest one blocked for 1508207 sec, osd.1 has
slow ops; too many PGs per OSD (657 > max 50
0)
[WRN] FS_DEGRADED: 1 filesystem is degraded
    fs ark is degraded
[WRN] FS_WITH_FAILED_MDS: 1 filesystem has a failed mds daemon
    fs ark has 1 failed mds
[ERR] MDS_ALL_DOWN: 1 filesystem is offline
    fs ark is offline because no MDS is active for it.
[WRN] MDS_INSUFFICIENT_STANDBY: insufficient standby MDS daemons available
    have 0; want 1 more
[WRN] PG_AVAILABILITY: Reduced data availability: 143 pgs inactive, 143 pgs
down
    pg 10.11 is down, acting [1,3]
    pg 10.18 is down, acting [3,1]
    pg 10.1d is down, acting [1,3]
    pg 10.1f is down, acting [1,3]
    pg 11.10 is down, acting [3,1,NONE]
    pg 11.12 is down, acting [1,NONE,3]
    pg 11.18 is stuck inactive for 4w, current state down, last acting
[1,3,NONE]
    pg 11.19 is down, acting [3,1,NONE]
    pg 11.1b is down, acting [1,NONE,3]
    pg 11.62 is down, acting [NONE,3,1]
    pg 11.63 is down, acting [3,NONE,1]
    pg 11.64 is down, acting [NONE,1,3]
    pg 11.66 is down, acting [NONE,3,1]
    pg 11.67 is down, acting [1,NONE,3]
    pg 11.68 is down, acting [3,NONE,1]
    pg 11.69 is down, acting [NONE,1,3]
    pg 11.6a is down, acting [1,NONE,3]
    pg 11.6b is down, acting [NONE,1,3]
    pg 11.6f is down, acting [NONE,3,1]
    pg 11.71 is down, acting [1,3,NONE]
    pg 11.72 is down, acting [1,3,NONE]
    pg 11.74 is down, acting [NONE,3,1]
    pg 11.76 is down, acting [1,NONE,3]
    pg 11.78 is down, acting [3,1,NONE]
    pg 11.7d is down, acting [NONE,3,1]
    pg 11.7e is down, acting [NONE,1,3]
    pg 15.15 is down, acting [1,3]
    pg 15.16 is down, acting [3,1]
    pg 15.17 is down, acting [1,3]
    pg 15.1a is down, acting [3,1]
    pg 16.1 is down, acting [1,3,NONE]
    pg 16.4 is down, acting [1,3,NONE]
    pg 16.b is down, acting [3,NONE,1]
    pg 16.60 is down, acting [3,1,NONE]
    pg 16.61 is down, acting [3,1,NONE]
    pg 16.62 is down, acting [3,NONE,1]
    pg 16.63 is down, acting [3,NONE,1]
    pg 16.65 is down, acting [NONE,3,1]
    pg 16.67 is down, acting [1,NONE,3]
    pg 16.68 is down, acting [1,NONE,3]
    pg 16.69 is down, acting [3,1,NONE]
    pg 16.6a is down, acting [1,3,NONE]
    pg 16.6c is down, acting [1,3,NONE]
    pg 16.70 is down, acting [3,NONE,1]
    pg 16.73 is down, acting [3,NONE,1]
    pg 16.74 is down, acting [1,3,NONE]
    pg 16.75 is down, acting [3,1,NONE]
    pg 16.79 is down, acting [3,NONE,1]
    pg 16.7a is down, acting [1,3,NONE]
    pg 16.7e is down, acting [1,3,NONE]
    pg 16.7f is down, acting [3,NONE,1]
[WRN] PG_DEGRADED: Degraded data redundancy: 1303896/7149898 objects
degraded (18.237%), 218 pgs degraded, 316 pgs under
sized
    pg 3.18 is stuck undersized for 36m, current state active+undersized,
last acting [1,3]
...<snipped for brevity>
###

Once again, I cannot thank you enough for looking into my issue. I
have the impression that recovering the data I need is just around the
corner. Although the croit.io blog post does mention marking the OSDs
as lost, I would like to double-check before doing anything that could
remove any remaining possibility of recovering the data.

If there's anything further I can check, or if you need the full
output of any of the commands, let me know.

Thanks in advance.

On Tue, 3 Feb 2026 at 10:26, Igor Fedotov <[email protected]> wrote:

> Hi Theo,
>
> you might want to try to use PG export/import using ceph-objectstore-tool.
>
> Please find more details here
> https://www.croit.io/blog/how-to-recover-inactive-pgs-using-ceph-objectstore-tool-on-ceph-clusters
>
>
> Thanks,
>
> Igor
> On 03/02/2026 02:38, Theo Cabrerizo Diem via ceph-users wrote:
>
> :12:18.895+0000 7f0c543eac00 -1 bluestore(/var/lib/ceph/osd)
> fsck error: free extent 0x1714c521000~978b26df000 intersects allocatedblocks
> fsck status: remaining 1 error(s) and warning(s)
>
>
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
