Hi,

I have two new-ish 14.2.4 clusters that began life on 14.2.0, all with HDD OSDs and SSD DB/WALs, but neither has experienced obvious problems yet.

What's the impact of this? Does "possible data corruption" mean possible silent data corruption? Or does the corruption only manifest as the OSD failures mentioned on the tracker, so you're basically OK if you either haven't had a failure or if you keep on top of failures the way you would with ordinary disk failures?

Thanks,
Simon

On 14/11/2019 16:10, Sage Weil wrote:
Hi everyone,

We've identified a data corruption bug[1], first introduced[2] (by yours
truly) in 14.2.3 and affecting both 14.2.3 and 14.2.4. The corruption
appears as an assertion that looks like

os/bluestore/fastbmap_allocator_impl.h: 750: FAILED ceph_assert(available >= allocated)

or in some cases a rocksdb checksum error.  It only affects BlueStore OSDs
that have a separate 'db' or 'wal' device.
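A quick way to see which OSDs this applies to is to look at the BlueFS
fields in the OSD metadata. A minimal sketch (assuming the 'ceph' CLI is
on the path and that 'ceph osd metadata' reports the bluefs_dedicated_db /
bluefs_dedicated_wal fields, as Nautilus BlueStore OSDs do):

#!/usr/bin/env python3
# Sketch: list OSDs that report a dedicated BlueFS db or wal device.
# Assumes the 'ceph' CLI is available and that the bluefs_dedicated_db /
# bluefs_dedicated_wal metadata fields are populated (string "0" or "1").
import json
import subprocess

metadata = json.loads(
    subprocess.check_output(["ceph", "osd", "metadata", "--format", "json"]))
for osd in metadata:
    if osd.get("bluefs_dedicated_db") == "1" or \
       osd.get("bluefs_dedicated_wal") == "1":
        print("osd.%s uses a separate db/wal device" % osd["id"])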

We have a fix[3] that is working its way through testing, and will
expedite the next Nautilus point release (14.2.5) once it is ready.

If you are running 14.2.2 or 14.2.1 and use BlueStore OSDs with
separate 'db' volumes, you should consider waiting to upgrade
until 14.2.5 is released.
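To check which versions your daemons are actually running before deciding,
a quick sketch (assuming the 'ceph' CLI and the JSON output of
'ceph versions'):

#!/usr/bin/env python3
# Sketch: summarize the versions each daemon type is running, so you can
# see whether any OSDs are already on 14.2.3 or 14.2.4.
import json
import subprocess

versions = json.loads(subprocess.check_output(["ceph", "versions"]))
for daemon_type, counts in versions.items():
    if daemon_type == "overall":
        continue
    for version_string, count in counts.items():
        print("%s: %d daemon(s) on %s" % (daemon_type, count, version_string))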

A big thank you to Igor Fedotov and several *extremely* helpful users who
managed to reproduce and track down this problem!

sage


[1] https://tracker.ceph.com/issues/42223
[2] 
https://github.com/ceph/ceph/commit/096033b9d931312c0688c2eea7e14626bfde0ad7#diff-618db1d3389289a9d25840a4500ef0b0
[3] https://github.com/ceph/ceph/pull/31621
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io