Hello all,
I've been running Ceph 19.2.2 on Alpine in a small cluster: 3 mons + 4 OSDs
(12 TB each). Due to a planning mistake the OSDs had ended up on only 2
nodes, and I was in the process of spreading them across 4 nodes. My luck
not being the best, I then had a power loss that left 2 OSDs (one on each
node) unable to start, which of course made 23% of the PGs unavailable.
One of the OSDs (id 0) seems to be gone for good due to a hardware fault, so
I'm focusing my efforts on fixing the other OSD (id 2).
When I attempt to run 'ceph-bluestore-tool fsck ...' on 19.2.2 it core dumps
almost immediately, so I've tried running it from the official v20 container
instead.
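For reference, this is roughly how I'm invoking the tool inside the
container (the image tag, bind mounts and entrypoint handling are from
memory and may not be exact; the OSD daemon is of course stopped while I do
this):

  # rough sketch of the container invocation, not the exact command line
  podman run --rm -it --privileged \
    -v /var/lib/ceph/osd:/var/lib/ceph/osd \
    -v /dev:/dev \
    quay.io/ceph/ceph:v20 \
    ceph-bluestore-tool fsck --path /var/lib/ceph/osd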
Running "ceph-bluestore-tool fsck --path /var/lib/ceph/osd" (where activate
is exposing the osd) outputs:
2026-02-02T18:12:18.895+0000 7f0c543eac00 -1 bluestore(/var/lib/ceph/osd)
fsck error: free extent 0x1714c521000~978b26df000 intersects allocated blocks
fsck status: remaining 1 error(s) and warning(s)
#
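The repair attempt is just the same invocation with the subcommand swapped,
i.e. roughly:

  ceph-bluestore-tool repair --path /var/lib/ceph/osd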
And running it with repair instead:
2026-02-02T22:55:53.404+0000 7f8203afdc00 -1 bluestore(/var/lib/ceph/osd)
fsck error: free extent 0x1714c521000~978b26df000 intersects allocated blocks
2026-02-02T22:55:54.799+0000 7f8203afdc00 -1 rocksdb: submit_common error:
Corruption: block checksum mismatch: stored = 0, computed = 293241624, type
= 4 in db/170024.sst offset 16265037 size 455 code = Rocksdb transaction:
MergeCF( prefix = b key = 0x000001714D100000 value size = 16)
MergeCF( prefix = b key = 0x000001714D100000 value size = 16)
MergeCF( prefix = b key = 0x000001714D100000 value size = 16)
<..... several times the same message...>
MergeCF( prefix = b key = 0x000001714D100000 value size = 16)
MergeCF( prefix = b key = 0x000001714D100000 value size = 16)
/ceph/rpmbuild/BUILD/ceph-20.2.0/src/os/bluestore/BlueStore.cc: In function
'unsigned int BlueStoreRepairer::apply(KeyValueDB*)' thread 7f8203afdc00
time 2026-02-02T22:55:55.024481+0000
/ceph/rpmbuild/BUILD/ceph-20.2.0/src/os/bluestore/BlueStore.cc: 19514:
FAILED ceph_assert(ok)
ceph version 20.2.0 (69f84cc2651aa259a15bc192ddaabd3baba07489) tentacle
(stable - RelWithDebInfo)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x12f) [0x7f8204b6b8f8]
2: ceph-bluestore-tool(+0x19cccd) [0x561c022a7ccd]
3: (BlueStore::_fsck_on_open(BlueStore::FSCKDepth, bool)+0x6277)
[0x561c02416d37]
4: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x1cf) [0x561c024104bf]
5: main()
6: /lib64/libc.so.6(+0x2a610) [0x7f8203f63610]
7: __libc_start_main()
8: _start()
2026-02-02T22:55:55.047+0000 7f8203afdc00 -1
/ceph/rpmbuild/BUILD/ceph-20.2.0/src/os/bluestore/BlueStore.cc: In function
'unsigned int BlueStoreRepairer::apply(KeyValueDB*)' thread 7f8203afdc00
time 2026-02-02T22:55:55.024481+0000
/ceph/rpmbuild/BUILD/ceph-20.2.0/src/os/bluestore/BlueStore.cc: 19514:
FAILED ceph_assert(ok)
ceph version 20.2.0 (69f84cc2651aa259a15bc192ddaabd3baba07489) tentacle
(stable - RelWithDebInfo)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x12f) [0x7f8204b6b8f8]
2: ceph-bluestore-tool(+0x19cccd) [0x561c022a7ccd]
3: (BlueStore::_fsck_on_open(BlueStore::FSCKDepth, bool)+0x6277)
[0x561c02416d37]
4: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x1cf) [0x561c024104bf]
5: main()
6: /lib64/libc.so.6(+0x2a610) [0x7f8203f63610]
7: __libc_start_main()
8: _start()
*** Caught signal (Aborted) **
in thread 7f8203afdc00 thread_name:ceph-bluestore-
ceph version 20.2.0 (69f84cc2651aa259a15bc192ddaabd3baba07489) tentacle
(stable - RelWithDebInfo)
1: /lib64/libc.so.6(+0x3fc30) [0x7f8203f78c30]
2: /lib64/libc.so.6(+0x8d03c) [0x7f8203fc603c]
3: raise()
4: abort()
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x18e) [0x7f8204b6b957]
6: ceph-bluestore-tool(+0x19cccd) [0x561c022a7ccd]
7: (BlueStore::_fsck_on_open(BlueStore::FSCKDepth, bool)+0x6277)
[0x561c02416d37]
8: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x1cf) [0x561c024104bf]
9: main()
10: /lib64/libc.so.6(+0x2a610) [0x7f8203f63610]
11: __libc_start_main()
12: _start()
2026-02-02T22:55:55.082+0000 7f8203afdc00 -1 *** Caught signal (Aborted) **
in thread 7f8203afdc00 thread_name:ceph-bluestore-
ceph version 20.2.0 (69f84cc2651aa259a15bc192ddaabd3baba07489) tentacle
(stable - RelWithDebInfo)
1: /lib64/libc.so.6(+0x3fc30) [0x7f8203f78c30]
2: /lib64/libc.so.6(+0x8d03c) [0x7f8203fc603c]
3: raise()
4: abort()
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x18e) [0x7f8204b6b957]
6: ceph-bluestore-tool(+0x19cccd) [0x561c022a7ccd]
7: (BlueStore::_fsck_on_open(BlueStore::FSCKDepth, bool)+0x6277)
[0x561c02416d37]
8: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x1cf) [0x561c024104bf]
9: main()
10: /lib64/libc.so.6(+0x2a610) [0x7f8203f63610]
11: __libc_start_main()
12: _start()
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
to interpret this.
-499> 2026-02-02T22:55:53.404+0000 7f8203afdc00 -1
bluestore(/var/lib/ceph/osd) fsck error: free extent
0x1714c521000~978b26df000 intersects allocated blocks
-498> 2026-02-02T22:55:54.799+0000 7f8203afdc00 -1 rocksdb: submit_common
error: Corruption: block checksum mismatch: stored = 0, computed =
293241624, type = 4 in db/170024.sst offset 16265037 size 455 code =
Rocksdb transaction:
Trying to spin up the OSD throws the output below.
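(I'm starting the daemon manually in the foreground for this, so something
along the lines of

  ceph-osd -f -i 2

plus whatever extra arguments the init script would normally pass.)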
<... snipped for brevity ..>
-27> 2026-01-24T01:50:57.764+0000 7eff73af5ed0 0 osd.2 2691 crush map
has features 432629308056666112 was 8705, adjusting msgr requires for mons
-26> 2026-01-24T01:50:57.764+0000 7eff73af5ed0 0 osd.2 2691 crush map
has features 3314933069573799936, adjusting msgr requires for osds
-25> 2026-01-24T01:50:57.764+0000 7eff73af5ed0 1 osd.2 2691
check_osdmap_features require_osd_release unknown -> squid
-24> 2026-01-24T01:51:02.717+0000 7eff72019b30 5 rocksdb:
commit_cache_size High Pri Pool Ratio set to 0.133333
-23> 2026-01-24T01:51:02.717+0000 7eff72019b30 5 rocksdb:
commit_cache_size High Pri Pool Ratio set to 0.0240964
-22> 2026-01-24T01:51:02.717+0000 7eff72019b30 5
bluestore.MempoolThread(0x7eff7397ebe0) _resize_shards cache_size:
6496138035 kv_alloc: 2785017856 kv_used: 6271162 kv_onode_alloc: 503316480
kv_onode_used: 115387532 meta_alloc: 2717908992 meta_used: 57075
data_alloc: 436207616 data_used: 20480
-21> 2026-01-24T01:51:05.010+0000 7eff725b7b30 3 rocksdb:
[db/db_impl/db_impl_compaction_flush.cc:3496] Compaction error: Corruption:
block checksum mismatch: stored = 0, computed = 293241624, type = 4 in
db/170024.sst offset 16265037 size 455
-20> 2026-01-24T01:51:05.010+0000 7eff725b7b30 3 rocksdb:
[db/error_handler.cc:397] Background IO error Corruption: block checksum
mismatch: stored = 0, computed = 293241624, type = 4 in db/170024.sst
offset 16265037 size 455
-19> 2026-01-24T01:51:05.010+0000 7eff725b7b30 4 rocksdb:
[db/error_handler.cc:285] ErrorHandler: Set regular background error
-18> 2026-01-24T01:51:05.010+0000 7eff725b7b30 4 rocksdb: (Original Log
Time 2026/01/24-01:51:05.011167) [db/compaction/compaction_job.cc:865]
[default] compacted to: files[8 3 0 0 0 0 0] max score 0.01, MB/sec: 16.4
rd, 2.2 wr, level 1, files in(8, 1) out(1 +0 blob) MB in(0.0, 114.7 +0.0
blob) out(15.5 +0.0 blob), read-write-amplify(13694.2)
write-amplify(1631.5) Corruption: block checksum mismatch: stored = 0,
computed = 293241624, type = 4 in db/170024.sst offset 16265037 size 455,
records in: 20619364, records dropped: 17423737 output_compression: LZ4
-17> 2026-01-24T01:51:05.010+0000 7eff725b7b30 4 rocksdb: (Original Log
Time 2026/01/24-01:51:05.011226) EVENT_LOG_v1 {"time_micros":
1769219465011193, "job": 3, "event": "compaction_finished",
"compaction_time_micros": 7334898, "compaction_time_cpu_micros": 7009405,
"output_level": 1, "num_output_files": 1, "total_output_size": 16264458,
"num_input_records": 20619364, "num_output_records": 3195627,
"num_subcompactions": 1, "output_compression": "LZ4",
"num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0,
"lsm_state": [8, 3, 0, 0, 0, 0, 0]}
-16> 2026-01-24T01:51:05.010+0000 7eff725b7b30 2 rocksdb:
[db/db_impl/db_impl_compaction_flush.cc:2986] Waiting after background
compaction error: Corruption: block checksum mismatch: stored = 0, computed
= 293241624, type = 4 in db/170024.sst offset 16265037 size
455,Accumulated background error counts: 1
-15> 2026-01-24T01:51:06.032+0000 7eff725b7b30 4 rocksdb:
[file/delete_scheduler.cc:74] Deleted file db/170263.sst immediately,
rate_bytes_per_sec 0, total_trash_size 0 max_trash_db_ratio 0.250000
-14> 2026-01-24T01:51:06.032+0000 7eff725b7b30 4 rocksdb: EVENT_LOG_v1
{"time_micros": 1769219466032857, "job": 3, "event": "table_file_deletion",
"file_number": 170263}
-13> 2026-01-24T01:51:07.746+0000 7eff72019b30 5 rocksdb:
commit_cache_size High Pri Pool Ratio set to 0.117647
-12> 2026-01-24T01:51:07.746+0000 7eff72019b30 5 rocksdb:
commit_cache_size High Pri Pool Ratio set to 0.0243902
-11> 2026-01-24T01:51:07.746+0000 7eff72019b30 5
bluestore.MempoolThread(0x7eff7397ebe0) _resize_shards cache_size:
6496138035 kv_alloc: 2751463424 kv_used: 3155996 kv_onode_alloc: 570425344
kv_onode_used: 183341998 meta_alloc: 2684354560 meta_used: 57075
data_alloc: 436207616 data_used: 20480
-10> 2026-01-24T01:51:12.768+0000 7eff72019b30 5 rocksdb:
commit_cache_size High Pri Pool Ratio set to 0.111111
-9> 2026-01-24T01:51:12.768+0000 7eff72019b30 5 rocksdb:
commit_cache_size High Pri Pool Ratio set to 0.0246914
-8> 2026-01-24T01:51:12.768+0000 7eff72019b30 5
bluestore.MempoolThread(0x7eff7397ebe0) _resize_shards cache_size:
6496138035 kv_alloc: 2717908992 kv_used: 3155996 kv_onode_alloc: 603979776
kv_onode_used: 226502426 meta_alloc: 2650800128 meta_used: 57075
data_alloc: 436207616 data_used: 20480
-7> 2026-01-24T01:51:15.044+0000 7eff73af5ed0 0 osd.2 2691 load_pgs
-6> 2026-01-24T01:51:15.108+0000 7eff73af5ed0 5 osd.2 pg_epoch: 2691
pg[6.c(unlocked)] enter Initial
-5> 2026-01-24T01:51:15.108+0000 7eff73af5ed0 5 osd.2 pg_epoch: 2691
pg[6.c( empty local-lis/les=2690/2691 n=0 ec=196/142 lis/c=2690/2690
les/c/f=2691/2691/0 sis=2690) [1,3]/[1,3,2] r=2 lpr=0 crt=0'0 mlcod 0'0
unknown mbc={}] exit Initial 0.000424 0 0.000000
-4> 2026-01-24T01:51:15.108+0000 7eff73af5ed0 5 osd.2 pg_epoch: 2691
pg[6.c( empty local-lis/les=2690/2691 n=0 ec=196/142 lis/c=2690/2690
les/c/f=2691/2691/0 sis=2690) [1,3]/[1,3,2] r=2 lpr=0 crt=0'0 mlcod 0'0
unknown mbc={}] enter Reset
-3> 2026-01-24T01:51:15.108+0000 7eff72065b30 5
bluestore(/var/lib/ceph/osd) _kv_sync_thread utilization: idle
17.420808792s of 17.420808792s, submitted: 0
-2> 2026-01-24T01:51:15.108+0000 7eff72065b30 -1 rocksdb: submit_common
error: Corruption: block checksum mismatch: stored = 0, computed =
293241624, type = 4 in db/170024.sst offset 16265037 size 455 code =
Rocksdb transaction:
PutCF( prefix = O key =
0x7F800000000000000630000000'!!='0xFFFFFFFFFFFFFFFEFFFFFFFFFFFFFFFF6F value
size = 34)
PutCF( prefix = S key = 'nid_max' value size = 8)
PutCF( prefix = S key = 'blobid_max' value size = 8)
-1> 2026-01-24T01:51:15.108+0000 7eff72065b30 -1
/home/buildozer/aports/community/ceph19/src/ceph-19.2.2/src/os/bluestore/BlueStore.cc:
In function 'void BlueStore::_txc_apply_kv(TransContext*, bool)' thread
7eff72065b30 time 2026-01-24T01:51:15.109433+0000
/home/buildozer/aports/community/ceph19/src/ceph-19.2.2/src/os/bluestore/BlueStore.cc:
14045: FAILED ceph_assert(r == 0)
ceph version Development (no_version) squid (stable)
0> 2026-01-24T01:51:15.109+0000 7eff72065b30 -1 *** Caught signal
(Aborted) **
in thread 7eff72065b30 thread_name:
Any help getting the OSD back up and running would be greatly appreciated.
There are only a handful of folders on CephFS that I would like to recover
from the system, which weren't backed up. The cluster was far from full (the
OSDs were about 30% used).
Of course I can post full logs if needed; just let me know whether I should
attach them to an email on the mailing list, or whether it would be
preferable to use something like pastebin or another channel.
Many thanks,
Theo