Well thanks for the hints. I have a couple of responses below.
Garl
Bob Miller wrote:
> Garl R. Grigsby wrote:
<SNIP>
me rambling
<SNIP>
>
> Short answer: I don't have a clue. (-:
>
> Long answer: Here are some ideas...
>
> 1. Some app is misconfigured and is constantly restarting/dumping
> core, and that's keeping your disk busy. This happened on one
> machine I put 7.1 on -- the problem was an interaction between
> Apache and mod_ssl. I didn't figure it out; I just removed Apache,
> since I didn't want a web server there anyway.
>
> Have you checked syslog and root's mailbox?
Yup, and no joy.
>
>
> 2. A kernel or driver bug is leaking memory. If this is the case,
> your machine will probably eventually hang.
>
> Have you checked dmesg?
Yes. There are two errors that are each repeated a couple of times, but I have not
taken the time to hunt them down. I do not believe they are related, but I am the
first to admit I spend a lot of time being wrong. Here they are for those who are
interested (NOTE: these come at the very bottom of dmesg):
> scsi0: MEDIUM ERROR on channel 0, id 0, lun 0, CDB: Read (10) 00 00 02 4d 1f 00 00 38 00
> Info fld=0x24d2f, Current sd08:01: sense key Medium Error
> Additional sense indicates Unrecovered read error
> scsidisk I/O error: dev 08:01, sector 150768
> scsi0: MEDIUM ERROR on channel 0, id 0, lun 0, CDB: Read (10) 00 00 02 5a 57 00 00 10 00
> Info fld=0x25a63, Current sd08:01: sense key Medium Error
> Additional sense indicates Unrecovered read error
> scsidisk I/O error: dev 08:01, sector 154144
> scsi0: MEDIUM ERROR on channel 0, id 0, lun 0, CDB: Read (10) 00 00 02 5a 5f 00 00 08 00
> Info fld=0x25a63, Current sd08:01: sense key Medium Error
> Additional sense indicates Unrecovered read error
> scsidisk I/O error: dev 08:01, sector 154144
> scsi0: MEDIUM ERROR on channel 0, id 0, lun 0, CDB: Read (10) 00 00 02 5a 5f 00 00 08 00
> Info fld=0x25a63, Current sd08:01: sense key Medium Error
> Additional sense indicates Unrecovered read error
> scsidisk I/O error: dev 08:01, sector 154144
> ISO 9660 Extensions: Microsoft Joliet Level 3
> ISO 9660 Extensions: RRIP_1991A
> end_request: I/O error, dev 02:00 (floppy), sector 0
> scsi0: MEDIUM ERROR on channel 0, id 0, lun 0, CDB: Read (10) 00 00 02 5a 5f 00 00 08 00
> Info fld=0x25a63, Current sd08:01: sense key Medium Error
> Additional sense indicates Unrecovered read error
> scsidisk I/O error: dev 08:01, sector 154144
> end_request: I/O error, dev 02:00 (floppy), sector 0
> end_request: I/O error, dev 02:00 (floppy), sector 0
> end_request: I/O error, dev 02:00 (floppy), sector 0
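
For anyone who wants to see how often each sector is failing without counting by eye, here is a rough Python sketch that tallies the I/O error lines. It assumes the two message formats shown in the dmesg output above; the function name and the sample text are only for illustration, and other kernels may word these messages differently.

```python
import re
from collections import Counter

# Matches both the "scsidisk I/O error: dev 08:01, sector N" and the
# "end_request: I/O error, dev 02:00 (floppy), sector N" forms seen above.
IO_ERROR = re.compile(r"I/O error[:,] dev ([0-9a-f]{2}:[0-9a-f]{2}).*?sector (\d+)")

def tally_io_errors(dmesg_text):
    """Count I/O errors per (device, sector) pair in dmesg-style text."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = IO_ERROR.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts

# A few lines copied from the output above, used here as sample input.
sample = """\
scsidisk I/O error: dev 08:01, sector 150768
scsidisk I/O error: dev 08:01, sector 154144
scsidisk I/O error: dev 08:01, sector 154144
end_request: I/O error, dev 02:00 (floppy), sector 0
"""

for (dev, sector), n in sorted(tally_io_errors(sample).items()):
    print(f"dev {dev} sector {sector}: {n} error(s)")
```

On a live machine the text would come from the output of dmesg rather than a pasted sample.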
>
> 3. A userland app is leaking memory. Top would show this -- start top
> and type 'M' (uppercase) to sort by memory use. Type 'i'
> (lowercase) if you don't see a whole screenful of processes.
>
> The kde tools are known to leak, but not at the rate you're seeing.
> I generally have to restart kfm every couple of weeks.
X is not running. I typically run from init level 3. I only run X if my HPUX box is
down and I need to get something done quickly. Otherwise I run everything from an
rlogin or telnet session.
> 4. Nothing is wrong; you're misreading the tea leaves.
Maybe I should start drinking tea again. Then I could figure this out. Hmmm........
> You say top
> shows all memory in use? You aren't basing that on the 4th line
> that shows "Mem: ######K av, ######K used", I hope. Those two
> numbers are always nearly equal. The Linux kernel caches
> aggressively -- it does its best to keep memory full so it won't
> have to read from disk.
I am basing this on /proc/meminfo. When I look at the MemFree line it will show
something like 6k free. Looking at the swap free line I usually see something like
50% of my swap used (256 MB total).
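
Since the kernel keeps memory full of cache, MemFree by itself can look alarming even on a healthy box. A rough way to check is to add buffers and cache back in. The Python sketch below does that, assuming /proc/meminfo has "Name: value kB" lines with MemFree, Buffers, Cached, SwapTotal and SwapFree fields. The numbers in the sample are made up for illustration and are not from my machine, and the sum is only an approximation, since not all cache can be reclaimed.

```python
def parse_meminfo(text):
    """Return {field: kB} for 'Name:  12345 kB' lines; other lines are skipped."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and len(parts) == 2 and parts[0].isdigit() and parts[1] == "kB":
            fields[name.strip()] = int(parts[0])
    return fields

def reclaimable_kb(fields):
    """Free memory plus buffers and page cache (a rough upper bound)."""
    return fields["MemFree"] + fields.get("Buffers", 0) + fields.get("Cached", 0)

# Made-up numbers for illustration only; on a real box read /proc/meminfo.
sample = """\
MemTotal:   257000 kB
MemFree:      6000 kB
Buffers:     20000 kB
Cached:     150000 kB
SwapTotal:  262000 kB
SwapFree:   131000 kB
"""

info = parse_meminfo(sample)
print("free + buffers + cache:", reclaimable_kb(info), "kB")
print("swap in use:", info["SwapTotal"] - info["SwapFree"], "kB")
```

If the first number is large while MemFree is tiny, the memory is mostly cache and not a leak.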
> To see the paging rate, try vmstat. Here's some sample vmstat
> output under heavy vm load.
I will have to do some reading on vmstat. I have never used this command before.
Thanks for the hint.
>
> jogger-egg> vmstat 5
> procs memory swap io system cpu
> r b w swpd free buff cache si so bi bo in cs us sy id
> 0 0 0 43316 3476 2896 46388 0 0 3 1 1 18 1 0 6
> 0 0 0 43316 3476 2896 46388 0 0 0 11 127 124 0 0 99
> 1 0 0 43316 1684 2132 29360 0 0 0 0 106 133 0 3 96
> 2 0 0 56024 1548 220 15860 0 2542 1 636 5188 140 3 93 4
> 2 0 0 69540 1620 220 16012 0 2703 0 676 5509 124 5 95 0
> 2 0 1 81888 1024 220 16004 0 2470 0 617 5042 111 4 96 0
> 1 0 0 82288 76864 220 15168 0 1005 0 251 2112 119 2 33 65
> 0 0 0 82288 76864 220 15168 0 0 0 0 101 112 0 1 99
> ^C
>
> The interesting columns are si and so (amount swapped in and out, in
> Kb) and bi/bo (amount of block I/O in Kb). Also note how cache
> fell from 46 Mb to 15 Mb as memory got tight.
>
> The run you see is from a single PII/450MHz, 192 Mb RAM, and single
> IDE disk. I ran a simple memory thrashing program for 20 seconds.
> If I'd run it longer, you would have seen si nonzero.
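
To pick out the samples where the machine is actually swapping, a small script can watch the si and so columns described above. This is a minimal Python sketch, assuming the 16-column layout shown in the sample run (newer vmstat versions print different columns); the function name is hypothetical, and the sample rows are copied from the output above.

```python
def swapping_rows(vmstat_text):
    """Yield (swpd, si, so) for each sample row where si or so is nonzero."""
    header = None
    for line in vmstat_text.splitlines():
        cols = line.split()
        if "si" in cols and "so" in cols:
            header = cols  # the line that names the columns
            continue
        # Only accept all-numeric rows with one value per named column.
        if header and len(cols) == len(header) and all(c.isdigit() for c in cols):
            row = dict(zip(header, map(int, cols)))
            if row["si"] or row["so"]:
                yield row["swpd"], row["si"], row["so"]

sample = """\
 procs memory swap io system cpu
 r b w swpd free buff cache si so bi bo in cs us sy id
 0 0 0 43316 3476 2896 46388 0 0 0 11 127 124 0 0 99
 2 0 0 56024 1548 220 15860 0 2542 1 636 5188 140 3 93 4
 1 0 0 82288 76864 220 15168 0 1005 0 251 2112 119 2 33 65
"""

for swpd, si, so in swapping_rows(sample):
    print(f"swpd={swpd} si={si} so={so}")
```

Looking the columns up by header name rather than by position means the sketch skips rows it does not understand instead of misreading them.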
>
> > So now to my question(s).
> >
> > 1) How do I find out what app is tying up all of my memory? Top does not
> > list any one app as chewing up all the RAM.
>
> See above.
>
> > 2) Is there a way to force a memory cleanup? (i.e., force all of the apps
> > to "claim" their allocated memory?)
>
> "kill -9 ..." (-:
This I am familiar with. I spend a lot of time with my friend 'kill -9'. <grin>
> > 3) Is this a problem with Mandrake (v7.1)? (Or is this a PEBKAC
> > problem?)
>
> Unknown.
>
> > 4) Am I going crazy?
>
> Unknown. Ask the voices.
They have been less than helpful these days. I think I ticked them off. Maybe that
was a mista.......
> > 5) Or could this be a problem with the SMP? This is the first time I
> > have ever run Linux on a dual processor machine, so I have no idea if
> > this could be related (I really doubt it, but in my line of work I have
> > learned to question all things computer related)
>
> I doubt it too. (-:
>
> --
> K<bob>
> [EMAIL PROTECTED], http://www.jogger-egg.com/
--
=============================================================================
Garl R. Grigsby
Customer Applications Engineering - Analysis Team
-----------------------------------------------------------------------------
Structural Dynamics Research Corporation Phone: (800)242-7372
TAO Americas Support Center FAX: (541)342-8277
1750 Willow Creek Circle Email: [EMAIL PROTECTED]
Eugene, OR 97402 Internet: http://www.sdrc.com
=============================================================================
-FEA makes a good engineer great, and a poor engineer dangerous-
=============================================================================
PGP ID: 0xF2D845E7
PGP Fingerprint: 9C40 CB5E 1C51 CF58 E3F9 3F2C 8F1F F3EF F2D8 45E7
=============================================================================