Daiajo Tibdixious <[email protected]> posted [email protected], excerpted below, on Mon, 16 Feb 2009 13:05:14 +1100:
> During the freeze I can still move the cursor around but not select
> anything & the keyboard does nothing.

Here's something that should at least help keep your filesystems in a semi-usable state, and it /may/ help you continue without a reboot, depending on what is wrong.

First, make sure you have the Magic SysRq option turned on in the kernel. In menuconfig, that's Kernel hacking > Magic SysRq key; MAGIC_SYSRQ is the config option. The Kconfig help text for it points to a document (Documentation/sysrq.txt) you can check for more info, but with that option on in your kernel, here are the basics for emergency recovery.

Memorize the following sequence, the last three being most critical:

REISUB

Then, when you have a lockup, hold Alt-SysRq and hit each letter in turn. Unless the kernel is so screwed up that it's afraid to write anything to disk for fear it'll write where it shouldn't, or it isn't responding at all, this will keep your filesystems at least somewhat in order. Here's what each letter does:

R  Raw: Turn off keyboard RAW mode. If you are working at the text console this one likely isn't necessary, but from X it will be, since otherwise the kernel may not see the other keys properly.

E  tErm: Send a SIGTERM (15) to all apps (except init). This tells them to save their files and shut down. It's a strong hint that the app should terminate, but the app can ignore it if it wants (as an app with an unsaved document might, tho it would pop up a save dialog, which of course you likely won't be able to see). If there's disk activity, you may at this point wish to wait until it stops.

I  kIll: Send a SIGKILL (9) to all apps (except init). This is the unblockable kill signal. It should kill all normal apps, unless they are stuck in uninterruptible sleep (as can be the case if they are waiting for disk I/O but aren't getting it for some reason).

S  Sync: This causes the kernel to attempt to sync all mounted filesystems, flushing all buffers, etc.
Alt-SysRq-S can also be used at other times when you want to make sure everything's synced. If there's an operation that locked up the computer before, this is a good key to hit before you try it again, so that at least the buffers are synced if it locks up again. Again, if possible, wait until any disk activity stops.

Up to this point, the system is still generally operational. The E and I operations should have killed all userspace apps but init, and if you are lucky and the system wasn't too hosed, init may well have popped a login back up. If so, you can log in and continue if desired, but note that system services will have been killed as well, so you'd need to restart them. Thus, it may be simpler to just continue with the full sequence and reboot.

U  remoUnt ro: This attempts to remount all mounted filesystems read-only, thus flushing the journal in most cases, etc. If this works, the system should come up without any filesystem errors. IF POSSIBLE, WAIT UNTIL THE SYSTEM FINISHES WRITING! If the screen is in a state where you can see it (it may well not be if the crash happened in X), you should get an OK, Done or similar from the kernel. Once you've done this, even tho you still have the kernel and init running with perhaps a login possible, there's not much you could do anyway, except perhaps run an fsck, and it's better not to do that with a half-crashed system, so you might as well push the final B.

B  reBoot: This causes the kernel to do an immediate reboot, without syncing or unmounting anything. That's why you do the S and U combos first.

As I said, the SUB part is the most important. R is only useful when the keyboard is in RAW mode, as it is in X, and the E and I won't do a lot anyway if the system's screwed up enough. However, the E in particular can occasionally cause an app to save its work, thus saving you the effort of recreating it. But the SUB sequence is vital in an emergency situation to help keep mounted filesystems as consistent as possible over the reboot.
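For reference, the same letters can also be sent from software while the box is still healthy enough to run a root process. What follows is only a minimal sketch under some assumptions: that the kernel exposes /proc/sysrq-trigger (which takes one command letter per write, as root), and that short pauses are enough for disk activity to settle. The names REISUB_STEPS, build_plan and send_sequence, and the pause lengths, are made up for illustration.

```python
import time

# Each step: (sysrq letter, what it does, seconds to pause afterwards).
# The pause lengths are illustrative guesses, not kernel requirements;
# they only give disk activity a chance to settle between steps.
REISUB_STEPS = [
    ("r", "take the keyboard out of RAW mode", 1),
    ("e", "send SIGTERM to all processes except init", 5),
    ("i", "send SIGKILL to all processes except init", 2),
    ("s", "sync all mounted filesystems", 5),
    ("u", "remount all filesystems read-only", 5),
    ("b", "reboot immediately, without sync or unmount", 0),
]


def build_plan(letters="reisub"):
    """Return the (letter, description, pause) steps in the order given."""
    known = {key: (key, what, pause) for key, what, pause in REISUB_STEPS}
    unknown = [c for c in letters if c not in known]
    if unknown:
        raise ValueError("unsupported SysRq letters: " + "".join(unknown))
    return [known[c] for c in letters]


def send_sequence(letters="sub", trigger_path="/proc/sysrq-trigger",
                  dry_run=True, sleep=time.sleep):
    """Write each letter to the trigger file, pausing after each one.

    With dry_run=True (the default) nothing is written and the function
    only returns the letters it would have sent. A real run needs root,
    and a sequence ending in "b" will reboot the machine.
    """
    sent = []
    for key, _what, pause in build_plan(letters):
        if not dry_run:
            with open(trigger_path, "w") as trigger:
                trigger.write(key)
            sleep(pause)
        sent.append(key)
    return sent
```

Run as root, send_sequence("sub", dry_run=False) would be the scripted equivalent of the last three keystrokes. The default is a dry run precisely because the final step reboots the machine.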
Of course, as I mentioned above, if the kernel thinks it's so screwed up that it doesn't trust itself to know where on the disk it's writing, it won't write anything anyway. And sometimes it's screwed up enough that it simply doesn't hear the Alt-SysRq sequences in the first place. But when it works, it can save you loads of time and headaches due to filesystems left in an inconsistent state.

With the basics taken care of, there's one more Alt-SysRq combo that is often useful to non-kernel-hackers (plus several others mostly of interest to kernel hackers).

K  saK, Secure Access Key: While this isn't totally secure (see the note in the documentation mentioned above if you are thinking of using it for security purposes), it kills all programs on the current virtual console, thus allowing init to respawn the login or whatever.

If it's just X that has crashed, you may be able to get back to a text mode login by using the R, S (as I said above, use sync here so that at least that much is done if the system quits responding entirely, which it sometimes does) and K combos, followed by the normal Ctrl-Alt-Fx keys. It's possible the text mode display will still be scrambled at first, however, so while you may have a login after that, you may not actually see it and may need to type blind for a bit. What you can then try typing, whether or not you can see it, is your username and password to log in, followed by the "reset" command (with a carriage return at the end, of course). If the system is responding, the reset should clear any remaining craziness on the display, and you should get a normal prompt and be back in business.

Regardless of what the problem is (as VAH suggested, it's probably X related, as X is about the only userspace program that has enough privileges to crash the entire system), hopefully this will at least allow you to keep the system from scrambling itself further thru multiple crashes while you investigate things.
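Separately from a crash bad enough that the kernel stops listening, the keyboard combos can also be restricted at runtime, so it's worth checking that while the system is still healthy. A rough sketch, assuming the running kernel exposes /proc/sys/kernel/sysrq with the meanings given in the kernel's sysrq documentation (0 = disabled, 1 = everything allowed, anything else a sum of flag values). The names KEY_FLAG, allowed_keys and read_sysrq_setting are hypothetical:

```python
# Flag that each key discussed in this post depends on, using the flag
# values listed in the kernel's sysrq documentation:
#   4 = keyboard control (SAK, unraw), 16 = sync, 32 = remount read-only,
#   64 = signalling of processes (term, kill), 128 = reboot/poweroff.
KEY_FLAG = {"r": 4, "e": 64, "i": 64, "s": 16, "u": 32, "b": 128, "k": 4}


def allowed_keys(value):
    """Return the subset of "reisubk" that the given setting permits."""
    if value == 0:
        return ""          # SysRq disabled entirely
    if value == 1:
        return "reisubk"   # all functions enabled
    return "".join(k for k in "reisubk" if value & KEY_FLAG[k])


def read_sysrq_setting(path="/proc/sys/kernel/sysrq"):
    """Read the current setting; the path is assumed to exist."""
    with open(path) as handle:
        return int(handle.read().strip())
```

For example, allowed_keys(read_sysrq_setting()) on a live system would list which of the letters above the keyboard combos will accept. Per the current kernel documentation, this setting only affects the keyboard; root writing to /proc/sysrq-trigger is not restricted by it.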
I know Magic SysRq has saved me a LOT of headaches here, tho as I mentioned, the kernel won't do it if it thinks it's screwed up enough that it might simply scribble nonsense on the disk, making things worse rather than fixing them, so it doesn't /always/ work.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
