Hi Igor!

I have plenty of OSDs to lose, as long as the recovery works well afterward, 
so I can go ahead with it :D

What debug flags should I activate? osd=10, bluefs=20, bluestore=20, 
rocksdb=10, ...?
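
For reference, I'd probably set them persistently via the mon config db before 
restarting the OSD, so the levels are already in effect when the post-upgrade 
fsck/repair runs; something like this (osd.0 being just a placeholder id):

    ceph config set osd.0 debug_osd       10/10
    ceph config set osd.0 debug_bluefs    20/20
    ceph config set osd.0 debug_bluestore 20/20
    ceph config set osd.0 debug_rocksdb   10/10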

I'm not sure it's really the transaction size, since the broken WriteBatch is 
dumped, and the command index is out of range (that's the WriteBatch tag). 
I don't see why the transaction size would result in such a corruption - from 
my naive look at the rocksdb sources, 14851 repairs shouldn't overflow the 
32-bit WriteBatch entry counter, but who knows.
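
For context on the overflow argument: the WriteBatch representation starts with 
a fixed 12-byte header (an 8-byte little-endian sequence number plus a 4-byte 
little-endian entry count), and every record after it begins with a one-byte 
tag; an unexpected tag byte is what rocksdb reports as corruption. A rough 
Python sketch of reading that header from such a dump (my own throwaway, not 
anything from ceph or rocksdb):

    import struct

    # rocksdb WriteBatch rep_: sequence (fixed64) + count (fixed32),
    # followed by the records; each record starts with a one-byte tag,
    # e.g. 0x00 = Deletion, 0x01 = Value, 0x02 = Merge.
    def writebatch_header(rep: bytes):
        seq, count = struct.unpack_from('<QI', rep, 0)
        first_tag = rep[12] if len(rep) > 12 else None
        return seq, count, first_tag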

Are rocksdb keys like this normal? If yes, what's the construction logic? The 
pool is called 'dumpsite'.

0x80800000000000000a194027'Rdumpsite!rbd_data.6.28423ad8f48ca1.0000000001b366ff!='0xfffffffffffffffeffffffffffffffff'o'
0x80800000000000000a1940f69264756d'psite!rbd_data.6.28423ad8f48ca1.00000000011bdd0c!='0xfffffffffffffffeffffffffffffffff'o'


-- Jonas

On 12/04/2021 16.54, Igor Fedotov wrote:
> Sorry for being too late to the party...
> 
> I think the root cause is related to the high amount of repairs made during 
> the first post-upgrade fsck run.
> 
> The check (and fix) for zombie spanning blobs was backported to v15.2.9 
> (here is the PR: https://github.com/ceph/ceph/pull/39256). And I presume it's 
> the one which causes BlueFS data corruption due to a huge transaction 
> happening during such a repair.
> 
> I haven't seen this exact issue (as having that many zombie blobs is a rarely 
> met bug by itself), but we had a somewhat similar issue with upgrading omap 
> names, see: https://github.com/ceph/ceph/pull/39377
> 
> The resulting huge transaction could cause too big a write to the WAL, which 
> in turn caused data corruption (see https://github.com/ceph/ceph/pull/39701).
> 
> Although the fix for the latter has been merged for 15.2.10, some additional 
> issues with huge transactions might still exist...
> 
> 
> If someone can afford another OSD loss it could be interesting to get an OSD 
> log for such a repair with debug-bluefs set to 20...
> 
> I'm planning to make a fix to cap the transaction size for repairs in the 
> near future anyway, though...
> 
> 
> Thanks,
> 
> Igor
> 