On 14 June 2018 at 16:46, Steve Blinkhorn <[email protected]> wrote:
> You wrote:
>>
>> On 7 June 2018 at 14:03, Steve Blinkhorn <[email protected]> wrote:
>> > I have a remote server (about to be replaced, but still in service and
>> > needs to stay that way until a replacement is fully commissioned) that
>> > has just developed a single bad sector. The result has been that
>> > automatic backups using rsync have failed, and manual intervention is
>> > needed.
>> >
>> > There are also numerous sleeping processes that refuse to be killed,
>> > almost all in the 'tstile' state (this is i386 7.0).
>>>snip<<
>> > How should I proceed?
>>
>> First action might be to add a --exclude to the rsync (or move the
>> affected file to a different location on the filesystem excluded from
>> rsync).
>>
>> You could work out the affected block and dd zeros to it via the raw
>> device, but if the system is going away I'd probably not worry about
>> that.
>>
>> Other questions which might affect approach include:
>> - How long before the new system is deployed
>> - Do you know if the system would reboot cleanly
>> - Is the root filesystem clean
>>
>> David
>
>
> The root filesystem is clean, but /var is not. I'm arranging a new
> colo provider for the replacement servers after shockingly bad service
> from Easynet/Interoute (now GTT) - they emailed me today to say they
> have no record of our having colo space with them, but that they are
> "progressing internally" our request to replace our servers with new
> ones, two and a half *months* since we had to remove one after it
> failed.

That's... not terribly fast service :)

> I am calculating the risks associated with a reboot, and contemplating
> editing /etc/fstab so that /var and /opt (where the bad sector is)
> are not fsck'd at reboot. If it drops down to single-user mode I have
> no way of recovering the situation (no remote console), so for the
> time being I'm nursing the system along - and to be fair to it it is
> running normally from a user's point of view.

I remember having a pair of colo x86 servers when serial ports were a
thing and having two cables between com0<->com1 for remote console :)

I would be tempted to mark /opt as noauto as well as fsck pass 0 - if
the system reboots you will need to log in manually to mount/fsck it,
but then you can deal with any fallout.

For /var one option (if you have space) would be to copy everything
you need from /var to a new /var2 on root, then comment out the /var
mount in fstab, and 'mv var var-old; mv var2 var' so you could safely
reboot, but it may be safer to leave well alone.

One useful tool to keep to hand is a USB key with a standard install
that runs dhcpcd and sshd (and optionally openvpn back to a known
server), so as long as the BIOS is set to boot USB first and you can
get someone to plug it in, you always have a remotely accessible
fallback boot option.

David
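
P.S. As a rough illustration of the noauto / fsck pass 0 suggestion
above, the /opt entry in /etc/fstab might end up looking something
like the line below. The device name (wd0f), the ffs type and the dump
value are assumptions made up for the example; keep whatever the
existing entry already has and only change the options and the last
field.

```
# /etc/fstab (sketch only; device name and fs type are assumed)
# device     mount  type  options    dump  pass
/dev/wd0f    /opt   ffs   rw,noauto  1     0
```

With noauto the filesystem is skipped at boot, and a pass number of 0
means fsck will not check it automatically, so it has to be checked
and mounted by hand after logging in.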
