On Wed, 2024-01-31 at 09:27 -0500, Gary Dale wrote:
> On 2024-01-30 15:54, hw wrote:
> > On Mon, 2024-01-29 at 11:42 -0500, Gary Dale wrote:
> > > I'm running Debian/Trixie on an AMD64 workstation. I've lost the ability
> > > to see the root directory even when I am logged in as root (su -).
> > >
> > > This has been happening intermittently for several months. I initially
> > > thought it might be related to failing NVME drive that was part of a
> > > RAID1 array that is mounted as "/" but I replaced the device and the
> > > problem is still happening.
> > > [...]
> > What happens when you put the device you replaced back?
> >
> How could putting a known-failing device back in help? The problem
> existed before I replaced it and continues to exist after the replacement.

It sounded like you were able to list the root directory (at least
sometimes) before you did the replacement. Manually failing the device
(perhaps after adding it back first) could make a difference.

I've seen such indefinite hangs only when an NFS share has become
unreachable after it had been mounted. You could use clonezilla to
make a copy and then perhaps convert the file system to btrfs.

Do you still have the problem when you remove one of the NVME storage
things? Perhaps you have the equivalent of a bad SATA cable, or the
mainboard doesn't like it when you access two of those at the same
time, or something like that. Even simple network cables can behave
very strangely, and NVME may be a bit more complicated than that.

Running fsck on every boot to work around an issue like this is
certainly a bad idea. Doesn't fsck report anything? If it really makes
a difference in itself, rather than creating some side effect that
leads to the root directory being readable, it should report
something. Perhaps you need to increase its verbosity. If there's no
report, then it would look like a side effect, and that raises the
question of what side effect it might be.

Does fsck run before the RAID has been brought up or after? Is the
RAID up when booting is completed? What does mdadm say about the
device(s)? Can you still list the root directory when you manually
fail either drive? What exactly are the circumstances under which you
can and cannot list the root directory?

You need to do some investigating and ask questions like those ...
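
For the "is the RAID up" question, something like the following might
help to check the array state repeatedly (e.g. whenever the listing
hangs) without reading the output by eye. It is only a rough sketch in
Python: it assumes the usual /proc/mdstat layout, where a status line
ends in something like "[2/2] [UU]" and an underscore marks a missing
member. The function name degraded_arrays and the SAMPLE text,
including the device names in it, are made up for illustration and are
not taken from your system.

```python
import re


def degraded_arrays(mdstat_text):
    """Return {array_name: member_map} for arrays with a missing member.

    member_map is the "[UU]"-style string without the brackets; an
    underscore in it marks a member that is not active.
    """
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        # Array header line, e.g. "md0 : active raid1 ..."
        header = re.match(r"^(md\S*)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Status line, e.g. "  976630464 blocks super 1.2 [2/1] [_U]"
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if current and status:
            if "_" in status.group(3):
                result[current] = status.group(3)
            current = None
    return result


# Hypothetical example text, NOT real output from the affected machine.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 nvme1n1p2[1] nvme0n1p2[0](F)
      976630464 blocks super 1.2 [2/1] [_U]

unused devices: <none>
"""

if __name__ == "__main__":
    import os

    # Use the live file when present, otherwise fall back to the sample.
    if os.path.exists("/proc/mdstat"):
        with open("/proc/mdstat") as fh:
            text = fh.read()
    else:
        text = SAMPLE
    print(degraded_arrays(text))
```

An empty result only means that no array reports a missing member at
that moment. "mdadm --detail" on the array device (for example
/dev/md0, if that is what yours is called) gives the fuller picture.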