Daiajo Tibdixious <[email protected]> posted
[email protected], excerpted
below, on  Mon, 16 Feb 2009 13:05:14 +1100:

> During the freeze I can still move the cursor around but not select
> anything & the keyboard does nothing.

Here's something that should at least help keep your filesystems in a 
semi-usable state, and it /may/ help you continue without a reboot, 
depending on what is wrong.

First, make sure you have the Magic SysRq option turned on in the 
kernel.  In menuconfig, that's Kernel hacking > Magic SysRq key; the 
config symbol is MAGIC_SYSRQ.
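
To check what the running kernel allows, here is a minimal Python sketch.  
It assumes the usual procfs interface: /proc/sys/kernel/sysrq exists only 
when MAGIC_SYSRQ is built in, and holds 0 (off), 1 (everything allowed) or 
a bitmask.  The bit values are the ones listed in the kernel's sysrq 
documentation as far as I recall, so verify them against your kernel 
version.  The function names are my own.

```python
from pathlib import Path

# Mask bit that permits each REISUB letter from the keyboard.
# Values taken from the kernel sysrq documentation; verify for your kernel.
REISUB_BITS = {
    "R": 4,    # keyboard control (unraw, SAK)
    "E": 64,   # signalling of processes (term, kill)
    "I": 64,
    "S": 16,   # sync
    "U": 32,   # remount read-only
    "B": 128,  # reboot/poweroff
}

def allowed_keys(mask: int) -> str:
    """Return the REISUB letters that the given sysrq mask permits."""
    if mask == 1:  # 1 means "all functions enabled"
        return "REISUB"
    return "".join(k for k in "REISUB" if mask & REISUB_BITS[k])

def read_mask(path: str = "/proc/sys/kernel/sysrq"):
    """Return the current mask, or None if the file does not exist."""
    p = Path(path)
    if not p.exists():
        return None
    return int(p.read_text().split()[0])

if __name__ == "__main__":
    mask = read_mask()
    if mask is None:
        print("no /proc/sys/kernel/sysrq: kernel built without MAGIC_SYSRQ?")
    else:
        keys = allowed_keys(mask) or "none"
        print(f"sysrq mask {mask}: REISUB keys allowed from keyboard: {keys}")
```

If the mask turns out to be 0 or is missing the bits you need, writing 1 
to that same file as root should enable everything until the next boot.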

The Kconfig help text for that option points to a document 
(Documentation/sysrq.txt) you can check for more info, but with the 
option on in your kernel, here are the basics for emergency recovery.  
Memorize the following sequence, the last three letters being the most 
critical:

REISUB

Then, when you have a lockup, hit Alt-SysRq-<letter> for each letter in 
turn.  Unless the kernel isn't responding at all, or is so screwed up 
it's afraid to write anything to disk for fear it'll write where it 
shouldn't, this will keep your filesystems at least somewhat in order.  
Here's what each letter does:

R unRaw:  Turn off keyboard RAW mode.  If you are working at the text 
console this one likely isn't necessary, but from X it will be, since 
otherwise the kernel may not see the other keys properly.

E tErm: Send a SIGTERM (15) to all apps (except init).  This tells them 
to save their files and shut down.  This is a strong hint that the app 
should terminate but it can ignore it if it wants (as an app with an 
unsaved document might, tho it would popup a save dialog, but of course 
you likely won't be able to see it).

If there's disk activity, you may at this point wish to wait until it 
stops.

I kIll: Send a SIGKILL (9) to all apps (except init).  This is the 
unblockable kill signal.  It should kill all normal apps, unless they are 
stuck in uninterruptible sleep (as can be the case if they are waiting 
for disk i/o, but aren't getting it for some reason).

S Sync: This causes the kernel to attempt to sync all mounted 
filesystems, flushing all buffers, etc.

Alt-SysRq-S can be used at other times you want to make sure everything's 
synced, as well.  If there's an operation that locked the computer 
before, this is a good sequence to use before you try it again.  At least 
the buffers should be synced if it locks up again.
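
If the keyboard is still usable before you retry, the same precaution can 
be scripted.  This is only a sketch: it uses Python's os.sync(), which 
calls sync(2) from userspace, so it's the same idea as Alt-SysRq-S rather 
than the same code path.  run_after_sync is a name I made up.

```python
import os
import subprocess

def run_after_sync(cmd):
    """Flush dirty buffers to disk, then run cmd and return its exit status."""
    os.sync()  # sync(2): ask the kernel to write out all dirty buffers
    return subprocess.call(cmd)
```

Call it with the command that locked things up last time, as an argument 
list, for instance run_after_sync(["startx"]) if starting X was the 
trigger (that command is just a placeholder here).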

Again, if possible, wait until any disk activity stops.  Up to this 
point, the system is still generally operational.  The EI operations 
should have killed all userspace apps but init, and if you are lucky and 
the system wasn't too hosed, init may well have popped a login back up.  
If so, you can login and continue if desired, but note that system 
services will have been killed as well, so you'd need to restart them.  
Thus, it may be simpler to just continue with the full sequence and 
reboot.

U remoUnt ro:  This attempts to remount all mounted filesystems read-
only, thus flushing the journal in most cases, etc.  If this works, the 
system should come up without any filesystem errors.

IF POSSIBLE, WAIT UNTIL THE SYSTEM FINISHES WRITING!  If the screen is in 
a state where you can see it (it may well not be if the crash happened in 
X), you should get an OK, Done or similar from the kernel.

Once you've done this, even tho you still have the kernel and init 
running with perhaps a login possible, there's not much you could do 
anyway, except perhaps run an fsck, and it's better not to do that with a 
half-crashed system, so you might as well push the final B.

B reBoot:  This causes the kernel to do an immediate reboot, without 
syncing or unmounting anything.  That's why you do the SU combos first.
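
If the keyboard is dead but you can still get in from another machine 
(ssh, say, which is my assumption and not something the original report 
mentions), the same requests can be made by writing the lowercase letters 
to /proc/sysrq-trigger as root.  A minimal Python sketch, with function 
names and the pause length being my own choices:

```python
import time

# The sequence above as the lowercase characters /proc/sysrq-trigger accepts.
# "r" is left out: unraw only matters for a local keyboard.
SEQUENCE = [
    ("e", "SIGTERM to all processes except init"),
    ("i", "SIGKILL to all processes except init"),
    ("s", "sync mounted filesystems"),
    ("u", "remount filesystems read-only"),
    ("b", "reboot immediately"),
]

def send_sysrq(keys, trigger="/proc/sysrq-trigger", pause=0.0):
    """Write each key to the trigger file, one write per key; return keys sent."""
    sent = []
    for key in keys:
        with open(trigger, "w") as f:
            f.write(key)
        sent.append(key)
        time.sleep(pause)  # crude stand-in for "wait until disk activity stops"
    return sent

if __name__ == "__main__":
    # Dry run only: print the plan, write nothing.
    for key, what in SEQUENCE:
        print(f"{key}: {what}")
```

As root, send_sysrq("sub", pause=5) would do the critical part.  Note 
that "b" reboots on the spot with no confirmation, so don't point this at 
the real trigger file unless you mean it.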

As I said, the SUB part is the most important.  R is only useful when the 
keyboard is in RAW mode as when in X, and the E and I won't do a lot 
anyway if the system's screwed up enough.  However, the E in particular 
can occasionally cause an app to save its work, thus saving you effort 
recreating it.  But the SUB sequence is vital in an emergency situation 
to help keep mounted filesystems as consistent as possible over the 
reboot.

Of course, as I mentioned above, if the kernel thinks it's screwed up 
enough that it doesn't trust itself to know where on the disk it's 
writing, it won't write anything anyway.  And sometimes it's screwed up 
enough that it simply doesn't hear the Alt-SysRq sequences in the first 
place.  But when it works, it can save you loads of time and headaches 
due to filesystems left in an inconsistent state.

With the basics taken care of, there's one more Alt-SysRq combo that is 
often useful to non-kernel-hackers (plus several others mostly of 
interest to kernel hackers).

K saK (Secure Attention Key):  While this isn't totally secure (see the 
note in the documentation mentioned above if you are thinking of using it 
for security purposes), this kills all programs on the current virtual 
console, thus allowing init to respawn the login or whatever.  If it's 
just X that has crashed, you may be able to get back to a text mode login 
by using R, S, K, and then the normal Ctrl-Alt-Fx keys.  (As I said 
above, do the sync here so that at least that much is done if the system 
quits responding entirely, which it sometimes does.)  It's possible the 
text mode display will still be scrambled at first, however, so while you 
may have a login after that, you may not actually see it and may need to 
type blind for a bit.  What you can then try typing, whether or not you 
can see it, is your username and password to log in, followed by the 
"reset" command (and Enter, of course).  If it's responding, the reset 
should then clear any remaining craziness on the display, and you should 
get a normal prompt and be back in business.

Regardless of what the problem is (as VAH suggested, it's probably X 
related, as X is about the only userspace program that has enough 
privileges to crash the entire system), hopefully this will at least 
allow you to keep the system from scrambling itself further thru multiple 
crashes while you investigate things.  I know Magic SysRq has saved me a 
LOT of headaches here.  As I mentioned, tho, the kernel won't act on it 
if it thinks it's screwed up enough that it might simply scribble 
nonsense on the disk, making things worse rather than fixing them, so it 
doesn't /always/ work.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

