-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

>>>>> "Sandip" == Sandip Bhattacharya <[EMAIL PROTECTED]> writes:

    Sandip> I got a weird locking up of the machine with the HDD light
    Sandip> lit on continuously, while writing an audio cd. After a
    Sandip> few frustrating moments where I tried to go to console
    Sandip> mode/kill X/etc., which didn't work BTW, I had to shrug and
    Sandip> press the reset button.

    Sandip> I looked at the logs, and it gave me incomprehensible
    Sandip> messages like:
    Sandip> ====================================
    Sandip> Oct 20 01:43:00 pluto kernel: oom-killer: gfp_mask=0x1d2
    Sandip> Oct 20 01:43:24 pluto kernel: DMA per-cpu:
    Sandip> Oct 20 01:43:56 pluto kernel: cpu 0 hot: low 2, high 6, batch 1
    Sandip> Oct 20 01:44:27 pluto kernel: cpu 0 cold: low 0, high 2, batch 1
    Sandip> Oct 20 01:44:34 pluto kernel: Normal per-cpu:
    Sandip> Oct 20 01:44:36 pluto kernel: cpu 0 hot: low 32, high 96, batch 16
    Sandip> Oct 20 01:44:36 pluto kernel: cpu 0 cold: low 0, high 32, batch 16
    Sandip> Oct 20 01:44:36 pluto kernel: HighMem per-cpu: empty
    Sandip> ====================================

    Sandip> Googled for oom-killer, and found out that it is a
    Sandip> newfangled addition to kernel 2.6 called Out Of Memory
    Sandip> killer[1]. When the computer for some reason runs out of
    Sandip> RAM *and* swap, it tries to kill the largest resource
    Sandip> hog. Apparently, this oom-killer sometimes runs amok
    Sandip> killing everything in sight.

Tell me about it.  A certain client with 200 servers is facing exactly
this, and I've been breaking my back trying to fix it.  From what I've
discovered:

The oom-killer isn't exactly a rogue process.  There are two factors
that lead to your system becoming unusable (note: it's still alive, as
a ping would show):

1. 2.6 kernels have an option for overcommitting memory.  That is, when
a process does a malloc the kernel satisfies it without checking
whether that much physical RAM+swap is actually available.  This usually
works since typical processes tend to go for the ``give me lots of RAM
and I'll figure out what to do with it later'' model of memory
allocation.  Unfortunately, it falls apart when the process
actually starts using that RAM and the kernel discovers that it is
overextended: with nothing left to hand out, it invokes the oom-killer.

2. There /seems/ to be a bug in kernels pre-2.6.9 (BTW, 2.6.9 is out
as of today) that causes kernel memory leaks.  I didn't find anything
definitive on this, but the consensus seems to be that
overcommitting + the bug tends to hit highly active servers.
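
To see how far a box has promised more memory than it can deliver, the
commit counters in /proc/meminfo are a reasonable place to start.  The
following is only a rough Python sketch: the function names are mine,
and it assumes your kernel exposes both the Committed_AS and
CommitLimit fields (older kernels may not have CommitLimit).  Note
that CommitLimit is only enforced under strict accounting, so a
negative number here is a hint, not proof of trouble.

```python
def parse_meminfo(text):
    """Return a dict of field name -> integer value from /proc/meminfo-style text."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_headroom_kb(fields):
    """CommitLimit minus Committed_AS, in kB.

    A negative result means the kernel has promised more address space
    than the commit limit would allow under strict accounting.
    """
    return fields["CommitLimit"] - fields["Committed_AS"]
```

On the machine in question you would feed it the contents of
/proc/meminfo, e.g. commit_headroom_kb(parse_meminfo(open("/proc/meminfo").read())).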

Solutions:

1. Switch to 2.6.9 and see if the problem persists.

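2. Change the overcommit behaviour of your system.  Look at
/usr/src/linux/Documentation/vm/overcommit-accounting and echo 0, 1 or
2 into /proc/sys/vm/overcommit_memory (or use sysctl), depending on
what your needs are: 0 is the default heuristic, 1 always overcommits,
and 2 does strict accounting and refuses allocations past the commit
limit.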

3. (1) and (2).
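
If you want to check what a machine is currently set to before
changing anything, something along these lines should do.  It is a
minimal sketch, not a tool: the function names are made up, and the
one-line descriptions are my paraphrase of the overcommit-accounting
document, so read the original before relying on them.

```python
# Paraphrased meanings of the values accepted by vm.overcommit_memory.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default): refuse only obviously excessive requests",
    1: "always overcommit: never refuse an allocation",
    2: "strict accounting: refuse allocations beyond the commit limit",
}


def describe_overcommit(raw):
    """Turn the raw file contents (e.g. '0\\n') into a readable description."""
    mode = int(raw.strip())
    return OVERCOMMIT_MODES.get(mode, "unknown mode %d" % mode)


def read_overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Read and describe the current setting; reading needs no root."""
    with open(path) as f:
        return describe_overcommit(f.read())
```

Calling read_overcommit_mode() on a Linux box returns the description
for whatever is currently in /proc/sys/vm/overcommit_memory.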

You could also try tuning overcommit_ratio and swappiness (both again
in /proc/sys/vm) to meet your specific needs.  Note that
overcommit_ratio only takes effect when overcommit_memory is 2.  I run
with the default swappiness of 60, but Andrew Morton says that 100 is
also fine.
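
To get a feel for what overcommit_ratio buys you under strict
accounting (overcommit_memory set to 2), the limit is roughly swap
plus that percentage of physical RAM, with 50 being the default
ratio.  A small sketch of the arithmetic follows; the function name
and the machine sizes in the usage note are purely illustrative, and
it ignores refinements such as huge pages.

```python
def commit_limit_kb(ram_kb, swap_kb, overcommit_ratio=50):
    """Approximate commit limit in kB: swap + overcommit_ratio% of RAM."""
    return swap_kb + ram_kb * overcommit_ratio // 100
```

For a hypothetical box with 512 MB of RAM (524288 kB) and 1 GB of swap
(1048576 kB), commit_limit_kb(524288, 1048576) gives 1310720 kB at the
default ratio, and raising the ratio to 100 gives 1572864 kB.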

    Sandip> Andrew Morton on the lkml list said[2] that it was an
    Sandip> untraceable problem occurring when you burn audio CDs.

    Sandip> Debian seems to have released[3] a version of the kernel
    Sandip> which fixes the problem.

    Sandip> The problem still exists in current FC2
    Sandip> kernel (2.6.8-1.521). It has been reported[4] but no fix
    Sandip> yet.

    Sandip> From all the other links, it appears that the problem is
    Sandip> not limited to audio CD writing.

Rebooting the system with 2.6.9 right now.  Wish me luck ;)

Regards,

- -- Raju
- -- 
Raj Mathur                [EMAIL PROTECTED]      http://kandalaya.org/
       GPG: 78D4 FC67 367F 40E2 0DD5  0FEF C968 D0EF CC68 D17F
                      It is the mind that moves
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.8 <http://mailcrypt.sourceforge.net/>

iD8DBQFBdf/gyWjQ78xo0X8RAjlQAJ9U+AKLQb3LT9VOxqjyKrbYM8pjmQCdG5bc
NfU3iMj3ekOqkS3PiNrd8Do=
=cD7/
-----END PGP SIGNATURE-----

_______________________________________________
ilugd mailinglist -- [EMAIL PROTECTED]
http://frodo.hserus.net/mailman/listinfo/ilugd
Archives at: http://news.gmane.org/gmane.user-groups.linux.delhi 
http://www.mail-archive.com/[EMAIL PROTECTED]/