Hi,

I have now noticed that files are still actively being corrupted / replaced by empty files when they are opened and saved.

Access was done via a Ceph 19.2.1 client on Ubuntu 24 with a kernel mount.
The Ceph servers are still running 20.2 Tentacle (deployed via cephadm) on Ubuntu 24.04 hosts.

The flags currently set are noout, norebalance, noscrub and nodeep-scrub.
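(These are the standard cluster flags, set and cleared with "ceph osd set <flag>" / "ceph osd unset <flag>", i.e.:

ceph osd set noout
ceph osd set norebalance
ceph osd set noscrub
ceph osd set nodeep-scrub

They are listed under "flags" in the output of "ceph status".)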

I am currently setting up a read-only mount to copy all existing data off as a backup.
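Roughly like the following (monitor addresses, client name, keyring path and mount point are placeholders for our actual values):

mount -t ceph 192.0.2.1:6789,192.0.2.2:6789:/ /mnt/cephfs-ro -o name=backup,secretfile=/etc/ceph/backup.secret,ro

and then copying the data off with rsync -a /mnt/cephfs-ro/ <backup target>/ or similar.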


I currently have no clue what is going on; so far I was only able to observe this behavior on the client nodes. As those were upgraded as well (MLNX driver, nvidia-fs, ...), could it be the network?

Any idea how to recover from this?

Cheers
Dominik


On 11.02.2026 at 18:44, dominik.baack via ceph-users wrote:
Hi,

After a controlled shutdown of the whole cluster due to external circumstances, we decided to update from 19.2 to 20.2 after the restart. The cluster was healthy before and after the update. The nodes mounting the filesystem were not equally lucky and were partially shut down hard. Storage was kept running for an additional ~30 minutes after the node shutdown, so all in-flight operations should have finished.

Now we discover that some of the user files seem to have been replaced with zeros. For example:

stat .gitignore
  File: .gitignore
  Size: 4429            Blocks: 9          IO Block: 4194304 regular file
Device: 0,48    Inode: 1100241384598  Links: 1


hexdump -C .gitignore
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |................|
*
00001140  00 00 00 00 00 00 00 00  00 00 00 00 00  |.............|
0000114d


Scanning for files containing only zeros shows several affected files that were likely accessed before or during the shutdown of the nodes.
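For reference, a scan of this kind can be done with something like the following (the path is a placeholder for the actual mount point); it compares each non-empty file against /dev/zero for its own length and prints it if it matches:

find /mnt/cephfs -type f -size +0c -print0 | while IFS= read -r -d '' f; do
    cmp -s -n "$(stat -c %s "$f")" "$f" /dev/zero && printf '%s\n' "$f"
done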

How should I proceed from here?