Hi,
I have now noticed that files are still being actively corrupted / replaced
by empty files when they are opened and saved.
Access was done via Ceph 19.2.1 from Ubuntu 24 with a kernel mount.
The Ceph servers are still running 20.2 Tentacle (deployed via cephadm) on
an Ubuntu 24.04 host.
Flags currently set are noout, norebalance, noscrub and nodeep-scrub.
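(For reference, these were set with the usual OSD flag commands; a minimal
sketch of setting and verifying them:
ceph osd set noout
ceph osd set norebalance
ceph osd set noscrub
ceph osd set nodeep-scrub
ceph osd dump | grep flags)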
I am currently setting up a read-only mount to copy all existing data
for a backup.
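A minimal sketch of the read-only kernel mount I have in mind, assuming
the legacy mon-address mount syntax; monitor address, client name and
secret file are placeholders:
mount -t ceph 10.0.0.1:6789:/ /mnt/cephfs-ro -o name=backup,secretfile=/etc/ceph/backup.secret,ro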
I currently have no clue what is going on; so far I have only been able to
observe this behavior on the nodes. As those were upgraded as well (MLNX
driver, nvidia-fs, ...), could it be the network?
Any idea how to recover from this?
Cheers
Dominik
On 11.02.2026 at 18:44, dominik.baack via ceph-users wrote:
Hi,
after a controlled shutdown of the whole cluster due to external
circumstances, we decided to update from 19.2 to 20.2 after the
restart. The system was healthy before and after the update.
The nodes mounting the filesystem were not as lucky and were
partially shut down hard. Storage was kept running for an additional
~30 min after the node shutdowns, so all in-flight operations should
have finished.
Now we discovered that some of the user files seem to have been replaced
with zeros. For example:
stat .gitignore
File: .gitignore
Size: 4429 Blocks: 9 IO Block: 4194304 regular file
Device: 0,48 Inode: 1100241384598 Links: 1
hexdump -C .gitignore
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001140  00 00 00 00 00 00 00 00  00 00 00 00 00           |.............|
0000114d
Scanning for files containing only zeros shows several affected files
that were likely accessed before or during the shutdown of the nodes.
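(For reference, such a scan can be done with something like the following;
a sketch assuming GNU find, stat and cmp, with the mount point as a
placeholder:
# list files that consist only of NUL bytes (bash; adjust the mount point)
find /mnt/cephfs -type f -size +0c -print0 |
while IFS= read -r -d '' f; do
    size=$(stat -c %s "$f")
    # compare the first $size bytes of the file against /dev/zero
    cmp -s -n "$size" "$f" /dev/zero && printf 'all-zero: %s\n' "$f"
done)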
How should I proceed from here?
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]