Hi,
Thank you very much.
I will change the mounting options as sketched below and report back if we
still have any problems afterwards. So far it seems we were very lucky and
most files are reproducible.
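Concretely, the idea is to keep the fstab entry quoted below and only set
read_from_replica=no instead of =balance (a sketch only; whether that is
enough to avoid the linked issue is not confirmed):
192.168.251.2,192.168.251.3,192.168.251.4,192.168.251.5,192.168.251.6,192.168.251.7:/
/cephfs ceph
name=gpu01,secretfile=/etc/ceph/gpu01.key,noatime,_netdev,recover_session=clean,read_from_replica=no
0 0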
Cheers
Dominik
On 12.02.2026 at 11:53, Eugen Block wrote:
Hi,
you seem to be hitting this issue [0] with the read_from_replica option;
see the announcement in [1]. I haven't looked at it in detail, so I'm not
sure whether there's a way to fix it or whether read_from_replica=balance
has the same effect.
Regards,
Eugen
[0] https://tracker.ceph.com/issues/73997
[1]
https://lists.ceph.io/hyperkitty/list/[email protected]/thread/JI2ZRF7A3PW55BTH5TFMHNFCZUITYAJJ/
On Thu, 12 Feb 2026 at 10:46, dominik.baack via
ceph-users <[email protected]> wrote:
Hi,
thanks for your reply.
Mounting was done without the 'root_squash' option; here is the
corresponding fstab entry:
192.168.251.2,192.168.251.3,192.168.251.4,192.168.251.5,192.168.251.6,192.168.251.7:/
/cephfs ceph
name=gpu01,secretfile=/etc/ceph/gpu01.key,noatime,_netdev,recover_session=clean,read_from_replica=balance
0 0
Dominik
On 2026-02-12 09:51, goetze wrote:
> Hi !
>
> Have you mounted cephfs with the 'root_squash' option set? If so,
> remove that option. I may be wrong here, but as far as I know, this is
> still considered unsafe and can lead to data corruption, since the
> necessary code changes have not yet made it into the mainline Linux
> kernel.
>
> Carsten
> ------------------------------------------------------------------
> Carsten Goetze
> Computer Graphics tel: +49 531 391-2109
> TU Braunschweig fax: +49 531 391-2103
> Muehlenpfordtstr. 23 eMail: [email protected]
> D-38106 Braunschweig http://www.cg.cs.tu-bs.de/people/goetze
>
>> On 12.02.2026 at 07:15, Dominik Baack via ceph-users
>> <[email protected]> wrote:
>>
>> Hi,
>>
>> I have now noticed that files are still being actively corrupted /
>> replaced by empty files when they are opened and saved.
>>
>> Access was done via Ceph 19.2.1 from Ubuntu 24 using the kernel mount.
>> The Ceph servers are running 20.2 Tentacle (deployed via cephadm)
>> on an Ubuntu 24.04 host.
>>
>> The flags currently set are noout, norebalance, noscrub and
>> nodeep-scrub.
>>
>> I am currently setting up a read-only mount to copy all existing
>> data for a backup.
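>>
>> For reference, something along these lines should give the read-only
>> mount (a sketch; /mnt/cephfs-ro is a placeholder mount point, monitor,
>> name and secretfile taken from the fstab entry above):
>>
>> mount -t ceph 192.168.251.2:/ /mnt/cephfs-ro -o name=gpu01,secretfile=/etc/ceph/gpu01.key,ro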
>>
>> I currently have no clue what is going on; so far I have only been
>> able to observe this behavior on the nodes. Since those were upgraded
>> as well (MLNX driver, nvidia-fs, ...), could it be a network issue?
>>
>> Any idea how to recover from this?
>>
>> Cheers
>> Dominik
>>
>> On 11.02.2026 at 18:44, dominik.baack via ceph-users wrote:
>>
>>> Hi,
>>>
>>> after a controlled shutdown of the whole cluster due to external
>>> circumstances, we decided to update from 19.2 to 20.2 after the
>>> restart. The system was healthy before and after the update.
>>> The nodes mounting the filesystem were not equally lucky and were
>>> partially shut down hard. The storage was kept running for an
>>> additional ~30 min after the node shutdown, so all in-flight
>>> operations should have finished.
>>>
>>> Now we are discovering that some of the user files seem to have
>>> been replaced with zeros. For example:
>>>
>>> stat .gitignore
>>> File: .gitignore
>>> Size: 4429 Blocks: 9 IO Block: 4194304 regular file
>>> Device: 0,48 Inode: 1100241384598 Links: 1
>>>
>>> hexdump -C .gitignore
>>> 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> |................|
>>> *
>>> 00001140 00 00 00 00 00 00 00 00 00 00 00 00 00 |.............|
>>> 0000114d
>>>
>>> Scanning for files containing only zeros shows several affected
>>> files that were likely accessed before or during the shutdown of
>>> the nodes.
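>>>
>>> For reference, such a scan can be done along these lines (a sketch;
>>> /cephfs is the mount point from the fstab entry, GNU stat and cmp
>>> assumed):
>>>
>>> find /cephfs -type f -size +0c -print0 |
>>> while IFS= read -r -d '' f; do
>>>     # cmp -n limits the comparison to the file's size;
>>>     # /dev/zero supplies the all-zero reference
>>>     if cmp -s -n "$(stat -c %s "$f")" "$f" /dev/zero; then
>>>         printf 'all zeros: %s\n' "$f"
>>>     fi
>>> done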
>>>
>>> How should I proceed from here?
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]