Hi,

you seem to have hit this issue [0] with the read_from_replica option; see the announcement in [1]. I haven't looked at it in detail, though, so I'm not sure whether there is a way to fix it or whether read_from_replica=balance is affected in the same way.
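As a first step you could check which options the kernel actually applied to your mounts; the grep pattern below is just a guess at how the option is displayed, so adjust it if your kernel shows something different:

  # show the effective options of all CephFS kernel mounts
  grep ceph /proc/mounts

  # or just the read_from_replica setting, if the kernel exposes it there
  grep -o 'read_from_replica=[a-z]*' /proc/mounts

If it shows up, remounting without the option (read_from_replica=no should be the default) might be a reasonable workaround until the tracker issue is resolved.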
Regards,
Eugen

[0] https://tracker.ceph.com/issues/73997
[1] https://lists.ceph.io/hyperkitty/list/[email protected]/thread/JI2ZRF7A3PW55BTH5TFMHNFCZUITYAJJ/

On Thu, 12 Feb 2026 at 10:46, dominik.baack via ceph-users <[email protected]> wrote:
> Hi,
>
> thanks for your reply.
>
> Mounting was done without the 'root_squash' option; here is the
> corresponding fstab entry:
>
> 192.168.251.2,192.168.251.3,192.168.251.4,192.168.251.5,192.168.251.6,192.168.251.7:/ /cephfs ceph name=gpu01,secretfile=/etc/ceph/gpu01.key,noatime,_netdev,recover_session=clean,read_from_replica=balance 0 0
>
> Dominik
>
> On 2026-02-12 09:51, goetze wrote:
> > Hi!
> >
> > Have you mounted CephFS with the 'root_squash' option set? If so,
> > remove that option. I may be wrong here, but as far as I know, this is
> > still considered unsafe and can lead to data corruption, since the
> > necessary code changes have not yet made it into the mainline Linux
> > kernel.
> >
> > Carsten
> > ------------------------------------------------------------------
> > Carsten Goetze
> > Computer Graphics      tel: +49 531 391-2109
> > TU Braunschweig        fax: +49 531 391-2103
> > Muehlenpfordtstr. 23   eMail: [email protected]
> > D-38106 Braunschweig   http://www.cg.cs.tu-bs.de/people/goetze
> >
> >> On 12.02.2026 at 07:15, Dominik Baack via ceph-users
> >> <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> I have now noticed that files are still actively being corrupted /
> >> replaced by empty files when they are opened and saved.
> >>
> >> Access was done via Ceph 19.2.1 from Ubuntu 24 with a kernel mount.
> >> The Ceph servers are running 20.2 Tentacle (deployed via cephadm)
> >> on Ubuntu 24.04 hosts.
> >>
> >> The flags currently set are noout, norebalance, noscrub and
> >> nodeep-scrub.
> >>
> >> I am currently setting up a read-only mount to copy all existing
> >> data for a backup.
> >>
> >> I have no clue what is going on; so far I have only been able to
> >> observe this behavior on the client nodes. Since those were upgraded
> >> as well (MLNX driver, nvidia-fs, ...), could it be the network?
> >>
> >> Any idea how to recover from this?
> >>
> >> Cheers
> >> Dominik
> >>
> >> On 11.02.2026 at 18:44, dominik.baack via ceph-users wrote:
> >>
> >>> Hi,
> >>>
> >>> after a controlled shutdown of the whole cluster due to external
> >>> circumstances, we decided to update from 19.2 to 20.2 after the
> >>> restart. The cluster was healthy before and after the update.
> >>> The nodes mounting the filesystem were not as lucky and were
> >>> partially shut down hard. Storage was kept running for an
> >>> additional ~30 min after the node shutdowns, so all in-flight
> >>> operations should have finished.
> >>>
> >>> Now we discover that some of the user files seem to have been
> >>> replaced with zeros. For example:
> >>>
> >>> stat .gitignore
> >>>   File: .gitignore
> >>>   Size: 4429        Blocks: 9       IO Block: 4194304   regular file
> >>> Device: 0,48        Inode: 1100241384598   Links: 1
> >>>
> >>> hexdump -C .gitignore
> >>> 00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> >>> *
> >>> 00001140  00 00 00 00 00 00 00 00  00 00 00 00 00           |.............|
> >>> 0000114d
> >>>
> >>> Scanning for files containing only zeros shows several cases of
> >>> files that were likely accessed before or during the shutdown of
> >>> the nodes.
> >>>
> >>> How should I proceed from here?
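Regarding the scan for zero-filled files: I don't know how you searched, but if you want to repeat it against the read-only mount, a rough (and slow, since it reads every file) sketch could look like this; /mnt/cephfs-ro is just a placeholder for your backup mount:

  # print every non-empty regular file whose content compares equal to /dev/zero (bash)
  find /mnt/cephfs-ro -type f -size +0c -print0 |
  while IFS= read -r -d '' f; do
      if cmp -s -n "$(stat -c %s "$f")" "$f" /dev/zero; then
          printf '%s\n' "$f"
      fi
  done

Cross-checking the resulting list against the mtimes around the shutdown window should at least tell you whether only files that were open at that time are affected.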
