Re: Lock-ups with 2.4.19 kernel

Ray Olszewski Tue, 27 Aug 2002 21:24:23 -0700

At 01:28 PM 8/28/02 +0930, Adam Luchjenbroers wrote:
> >          1. Run top on a remote console (one that telnets or ssh's in). When
> > the system hangs, the console will still display the last "top" output, so
> > you can see (a) what CPU usage was, (b) how much memory had been used, and
> > (c) what the top processes were. That may give you a hint about what is
> > going on.
>
>Is there any way I can get that outputted to somewhere else? Doesn't look
>  like it would go well in a log file.


I don't quite understand what you are asking. If you are running "top" via
an ssh login, that *is* "somewhere else" -- a terminal application running
on a different machine. You can be running from, say, an xterm on a second
Linux host or PuTTY on a Windows host. In both cases, you have the
terminal's scrollback capability and the ability to cut-and-paste, even
after the machine you are testing dies.

An alternative, I suppose, is to write a program that runs on the test
machine and regularly copies the underlying stats (the raw data in
/proc/stat and /proc/meminfo) to a file. But with a crash that bad, you'd
probably lose the most recent data, since the final writes would never get
synced to disk, making that an ugly solution.
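
For what it's worth, here is a minimal sketch of such a logger, assuming
Python is available on the test machine. The function names, the 5-second
interval, and the log path in the usage comment are all made up for
illustration. It calls fsync after every record, which narrows the window
for lost data but does not close it: whatever happens between the last
sync and the hang is still gone.

```python
import os
import time


def snapshot(paths):
    """Return one timestamped text record holding the contents of each path."""
    parts = ["=== %s ===" % time.strftime("%Y-%m-%d %H:%M:%S")]
    for path in paths:
        try:
            with open(path) as f:
                parts.append("--- %s ---\n%s" % (path, f.read().rstrip()))
        except OSError as err:
            # Keep logging even if one source cannot be read.
            parts.append("--- %s --- unreadable: %s" % (path, err))
    return "\n".join(parts) + "\n"


def log_stats(logfile, paths=("/proc/stat", "/proc/meminfo"),
              interval=5.0, count=None):
    """Append a snapshot every `interval` seconds and return how many
    records were written. count=None means run until killed."""
    written = 0
    with open(logfile, "a") as out:
        while count is None or written < count:
            out.write(snapshot(paths))
            out.flush()               # push Python's buffer to the kernel
            os.fsync(out.fileno())    # ask the kernel to write it to disk
            written += 1
            if count is None or written < count:
                time.sleep(interval)
    return written


# Example use (hypothetical path; runs until interrupted):
#   log_stats("/var/tmp/procstats.log")
```

After a hang and reboot, the last complete record in the log is the nearest
thing you get to the final "top" screen.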


>It does seem to happen during processor intensive situations (eg. large
>battles in Kohan), what programs can I use to test this (memory testers would
>also be useful).

The best way to simulate high CPU usage is with a program that uses the CPU 
a lot. Some games, really big FTP transfers, and kernel compiles are the 
usual candidates. The procedure I've already described is the best test I 
know of.
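
If none of those is handy, a synthetic load is easy to improvise. The
following is only a sketch (the function name and the 600-second figure
are my own choices, not a standard tool): it spins one CPU in a tight
arithmetic loop for a fixed time. It exercises the CPU only, not memory
or disk, so it is a narrower test than a kernel compile.

```python
import time


def burn(seconds):
    """Busy-loop on integer arithmetic for about `seconds` seconds.

    Returns the number of loop iterations completed."""
    deadline = time.monotonic() + seconds
    n = 0
    x = 1
    while time.monotonic() < deadline:
        # Meaningless arithmetic; its only purpose is to keep the CPU busy.
        x = (x * 1103515245 + 12345) % 2147483648
        n += 1
    return n


# Example use: keep one CPU busy for ten minutes.
#   burn(600)
```

Watch it from the remote "top" session as described earlier. On a
multiprocessor box, start one copy per CPU.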

The best memory tester I know of is memtest86. This isn't really a Linux
program; it's a small binary that runs directly from LILO, so it can check
all but about 64 KB of RAM. There is a memtest Linux app too, but I haven't
used it. We Debian users get them in .deb packages (memtest and sysutils),
and I imagine there are RPMs with similar names around too.

> >          2. If you can, use another remote console to cause the crash, but
> > leave the system with a vt (not X) on its console. A crash message (a
> > kernel oops, for example) may show on the screen when the crash happens,
> > and that will give you at least a little info.
>
>Useful, I'll probably write down anything that comes up.
>
> > Having said all of that ... when I experienced similar crashes (or what
> > might be similar crashes; hard to be sure from a sketchy description), they
> > were caused not by Linux, but by hardware problems. In one case, inadequate
> > CPU cooling (too small a heatsink-fan combo) caused the CPU (a P3) to
> > shut down for its own safety. In the other, the power supply had some sort
> > of problem that only showed up under high loads. In both cases, replacing
> > the offending hardware eliminated the problem.
>
>Damn, software can be fixed, hardware costs money. Is there any form of
>temporary workarounds I could employ (eg. put a cap on CPU usage at about
>80%?) while I save up to replace the hardware.

I don't know of any app that does what you describe. Anyway, if it really
is a hardware problem, the failure is likely to be probabilistic, not
deterministic: much more likely at full CPU usage than at, say, 80%, but
still possible at the lower load, so a cap would not reliably prevent it.
Since even pricey heatsinks cost about US$20 and power supplies not much
more, I personally was pleased to discover that I could fix these problems
cheaply and stop troubleshooting. Of course, your situation may well be
different.


--
-------------------------------------------"Never tell me the odds!"--------
Ray Olszewski                                   -- Han Solo
Palo Alto, California, USA                        [EMAIL PROTECTED]
-------------------------------------------------------------------------------

