Hemmann, Volker Armin posted
<[EMAIL PROTECTED]>, excerpted
below,  on Wed, 28 Sep 2005 22:35:32 +0200:

> Hi,
> when I try to emerge kdepim-3.4.2 with the kdeenablefinal use-flag I get
> a lot of oom-kills.
> I got them with 512mb, so I upgraded to 1gig and still have them. What
> puzzles me is, that I have a lot of swap free when it happens.. could
> someone please tell me, why the oom-killer becomes active, when there is
> still a lot of free swap?
> I am just an user, so using easy words would be much appreciated ;)
> 
[snip]
> 
> kernel is 2.6.13-r2
> I have 1gb of ram, and approximatly 1gb of swap.
> 
>  I  emerged kdepim without kdeenablefinal, so there is no big pressure,
>  I am
> just curious

There's something to the "lots of swap left" question, covered below.
However, that's theory, so I'll cover the practical stuff first and
leave that aspect for later.

kdeenablefinal requires HUGE amounts of memory, no doubt about it.  I've
not had serious issues with my gig of memory (dual Opterons, as you seem
to have) using kdeenablefinal here, but I've been doing things rather
differently than you probably have, and any one of those differences may
be the reason I haven't hit the memory issue with the severity you have.

1.  I have swap entirely disabled.

Here was my reasoning (apart from the issue at hand).  I was reading an
explanation of some aspects of the kernel VMM (virtual memory manager)
on LWN (Linux Weekly News, lwn.net) when I suddenly realized that I
could probably do without all the complexity they were describing by
turning off swap, since I'd recently upgraded to a gig of RAM.  I
reasoned that I normally ran a quarter to a third of that in application
memory, so even if I doubled normal use at times, I'd still have a third
of a gig of free memory available for cache.  Further, I reasoned that
if something used all that memory and STILL ran out, it was likely a
runaway process gobbling all the memory available, and I might as well
have it trigger the OOM killer at a gig, without further dragging the
system down, rather than at 2 gig (or whatever) after a swap storm had
made the system so unresponsive I couldn't do anything about it anyway.
For the most part, I've been quite happy with my decision, although now
that suspend is starting to look like it'll work for dual-CPU systems
(suspend to RAM sort of worked, for the first time here, early in the
.13 rcs, but they reverted it for the .13 release, as it needed more
work), I may enable swap again, if only to get suspend-to-disk
functionality.

Of course, I'm not saying disabling swap is the right thing for you, but
I've been happy with it here.  Anyway: a gig of RAM, swap disabled, so
the VMM complexity that's part of managing swap is disabled as well.
It's possible that's a factor, though I'm guessing the stuff below is
more likely.
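For anyone wanting to try the same experiment, the mechanics are simple
(the /dev/hda2 line is only an example; substitute your own swap
partition from /etc/fstab):

```shell
# Turn off all active swap immediately (as root):
swapoff -a

# Verify: the Swap: line in the output should show zeros.
free -m

# To keep swap off across reboots, comment out its /etc/fstab entry:
#/dev/hda2    none    swap    sw    0 0
```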

2.  Possibly the biggest factor is the KDE packages used.  I'm using the
split-ebuilds, NOT the monolithic category packages.  It's possible that's
the difference.  Further, I don't have all the split-packages that compose
kdepim-meta merged.  I have kmail and knode merged, with dependencies of
course, but don't have a handheld to worry about syncing to, so skipped
all those split-ebuilds that form part of kdepim-meta (and are part of the
monolithic ebuild), except where kmail/knode etc had them as dependencies.
Thus, no kitchensync, korn, kandy, kdepim-kresources, etc.

There are therefore two possibilities here.  One is that one of the
individual apps I skipped requires more memory.  The other is that the
monolithic ebuild you used does several things at once (possibly due to
your jobs setting, see below) where the split ebuilds do them in series,
therefore limiting the maximum memory required at a given moment.

3.  I'm NOT using unsermake.  For some reason, it hasn't worked for me
since KDE 3.2 or so.  I've tried different versions, but always hit
either an error, or, despite my settings, the ebuild not registering
unsermake and thus using the normal make system.  Unsermake is better at
parallelizing the various jobs, making more efficient use of multiple
CPUs, but given the memory enablefinal requires, it likely also causes
higher memory stress than ordinary GNU make does.  If you are using it
and it's otherwise working for you, that may be the difference.

The rest of the possibilities may or may not apply.  You didn't include
the output of emerge info, so I can't compare the relevant info from
your system to mine.  However, I suspect they /do/ apply, for reasons
which should be clear as I present them below.

4.  It appears (from the snipped stuff) you are running dual CPUs (or a
single dual-core CPU).  How many jobs do you have portage configured
for?  With my dual-CPU system, I originally had four set, but after
seeing what compiling KDE with kdeenablefinal did to my memory
resources, even with a gig, I decided I'd better reduce that to three!
If you have four or more parallel jobs set, THAT could very possibly be
your problem, right there.  You can probably do four or more jobs OR
kdeenablefinal, but not BOTH -- at least not while running X and KDE at
the same time!
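If you're not sure, the setting to check is MAKEOPTS in /etc/make.conf.
The -j3 below is simply my own choice for a dual-CPU, one-gig box, not
a recommendation carved in stone:

```shell
# /etc/make.conf
# Number of parallel make jobs portage passes to make.  Four was fine
# for most packages here, but with kdeenablefinal the combined peak of
# four jobs can blow past a gig, so I backed off to three.
MAKEOPTS="-j3"
```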

I should mention that I sometimes run multiple emerges (each with three
jobs) in parallel.  I *DID* run into OOM issues when trying to do that
with kmail and another large KDE package.  Kmail is of course part of
kdepim, and my experience DOES confirm that it's one of the largest in
memory requirements with kdeenablefinal set.  I could emerge small
things in parallel with it, stuff like kworldwatch, say, but nothing
major like konqueror.  Thus, I can almost certainly say that six jobs
will trigger the OOM killer when some of them are compiling kmail, and
can speculate that five jobs would do it at some point in the kmail
compilation.  Four jobs may or may not work, but three did, for me,
under the conditions explained in the other six points, of course.

(Note that the unsermake thing could compound the issue here, because as I
said, it's better at finding things to run in parallel than the normal
make system is.)

5.  I'm now running gcc-4.0.1, and have been compiling kde with
gcc-4.0.0-preX or later since kde-3.4.0.  gcc-4.x is still package.mask-ed
on Gentoo, because some packages still don't compile with it.  Of course,
that's easily worked around because Gentoo slots gcc, so I have the latest
gcc-3.4.x installed, in addition to gcc-4.x, and can (and do) easily
switch between them using gcc-config.  However, the fact that gcc-4 is
still masked on Gentoo means you probably aren't running it, while I am,
and it's what I compile KDE with.  The 4.x version is different enough
from 3.4.x that memory use can be expected to be rather different as
well.  It's quite possible that the kdeenablefinal stuff requires even
more memory with gcc-3.x than it does with the 4.x I've been
successfully using.
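For the record, switching between the slotted compilers takes only a
moment (the profile list and numbering vary by arch and installed
versions; these lines are examples, not a prescription):

```shell
# List the installed compiler profiles; the active one is marked:
gcc-config -l

# Switch by list number (or full profile name), then refresh the env:
gcc-config 2
source /etc/profile
```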

6.  It's also possible something else in the configuration affects
compile-time memory usage.  There are CFLAGS, of course, and I'm also
running newer (and still masked, AFAIK) versions of binutils and glibc,
with patches specifically for gcc-4.

7.  I don't do my kernels through Gentoo, preferring instead to use the
kernel straight off of kernel.org.  You say kernel 2.6.13-r2, the -r2
indicating a Gentoo revision, but you don't say /which/ Gentoo kernel
you are running.  The VMM is complex enough, and has a wide enough
variety of patches circulating for it, that it's possible you hit a bug
that isn't in the mainline kernel.org kernel I'm running.  Or it may be
some other factor in our differing kernel configs.

...

Now to the theory.  Why would the OOM killer trigger when you had all
that free swap?  There are two possible explanations I'm aware of, and
maybe others that I'm not.

1.  "Memory allocation" is a verb as well as a noun.

We know that enablefinal uses lots of memory.  The USE flag description
mentions that, and we've discovered it to be /very/ true.  If you run
ksysguard on your panel as I do, and monitor memory with it as I do (or
run a VT with a top session going, if compiling at the text console),
you are also aware that memory use during compile sessions, particularly
KDE compile sessions with enablefinal set, varies VERY drastically!
From my observations, each "job" will at times eat more and more memory,
until, with kmail in particular, multiple jobs are taking well over
200MB of memory apiece!  (See why I mentioned parallel jobs above?  At
200, possibly 300+ MB apiece, multiple parallel jobs eat up memory VERY
fast!)  After grabbing more and more memory for a while, a job will
suddenly complete and release it ALL at once.  The memory usage graph
will suddenly drop by multiple hundreds of megabytes -- for ONE job!

Well, during the memory-usage increase phase, each job allocates more
and more memory, a chunk at a time.  It's possible (though not likely,
from my observations of this particular usage pattern) that an app could
want X MB of memory all at once in order to complete its task.  Until it
gets that memory it can't go any further, and since the task it's trying
to do is half complete, it can't release any memory either, without
losing what it has already done.  If the allocation request is big
enough (or several of them in parallel are together big enough), it can
trigger the OOM killer even with what looks like quite a bit of free
memory left, because all the cache and other memory that can be freed
has already been freed, and no app can continue to the point of being
able to release memory without grabbing some memory first.  If one of
them wants a LOT of memory, and the OOM killer isn't killing it off
first (there are various OOM killer algorithms out there, some using
different factors than others for picking the app to die), stuff will
start dying to allow the app wanting all that memory to get it.
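That sawtooth pattern can be put into a toy model (all numbers here are
invented for illustration; this is a sketch, not a measurement).  Each
job's footprint ramps up chunk by chunk and is released all at once, so
what matters is the peak across simultaneous jobs, not the average:

```python
# Toy model of compile-job memory use: each job grows a chunk at a time,
# then frees everything at once when its compilation unit completes.
# The OOM killer cares about peak *simultaneous* usage, not the average.

def job_profile(steps, chunk_mb):
    """One job's footprint over time: ramps up, then drops to zero."""
    return [chunk_mb * (t + 1) for t in range(steps)] + [0]

def peak_usage(jobs):
    """Highest total memory across all ticks, jobs running in lockstep."""
    ticks = max(len(j) for j in jobs)
    return max(
        sum(j[t] for j in jobs if t < len(j))
        for t in range(ticks)
    )

# Three jobs ramping to 300 MB apiece peak at 900 MB -- survivable on a
# 1 GB box.  A fourth job pushes the peak to 1200 MB, past physical RAM.
print(peak_usage([job_profile(10, 30) for _ in range(3)]))  # 900
print(peak_usage([job_profile(10, 30) for _ in range(4)]))  # 1200
```

The model also shows why dropping one job from the jobs setting buys so
much headroom: the savings is a whole job's *peak*, not its average.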

Of course, it could also be very plainly a screwed up VMM or OOM killer,
as well.  These things aren't exactly simple to get right... and if gcc
took an unexpected optimization that has side effects...

2.  There is memory and there is "memory", and then there is 'memory' and
"'memory'" and '"memory"' as well.  <g>

There is of course the obvious difference between real/physical and
swap/virtual memory, with real memory being far faster (while at the same
time being slower than L2 cache, which is slower than L1 cache, which is
slower than the registers, which can be accessed at full CPU speed, but
that's beside the point for this discussion).

That's only the tip of the iceberg, however.  From the software's
perspective, that division mainly affects locked memory vs swappable
memory.  The kernel is always locked memory -- it cannot be swapped,
even drivers that are never used, which is why it makes sense to keep
your kernel as small as possible, leaving more room in real memory for
programs to use.  Depending on your kernel and its configuration,
various forms of RAMDISK (ramfs vs tmpfs vs ...) may be locked (or
not).  Likewise, some
kernel patches and configs make it easier or harder for applications to
lock memory as well.  Maybe a complicating factor here is that you had a
lot of locked memory and the compile process required more locked memory
than was left?  I'm not sure how much locked memory a normal process on a
normal kernel can have, if any, but given both that and the fact that the
kernel you were running is unknown, it's a possibility.
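Incidentally, the per-process cap on locked memory is visible from the
shell; depending on kernel and distribution defaults it's typically a
small number of kilobytes, or "unlimited":

```shell
# Maximum locked (unswappable) memory per process, in kilobytes:
ulimit -l
```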

Then there are the "memory zones".  Fortunately, amd64 is less
complicated in this respect than x86.  However, various memory zones do
still exist, and not only do some things require memory in a specific
zone, but it can be difficult to transfer in-use memory from one zone to
another, even where it COULD be placed in a different zone.  Up until
earlier this year, it was often impossible to transfer memory between
zones without using the backing store (swap) -- that was the /only/ way
possible!  However, as I said, amd64 is less complicated in this respect
than x86, so memory zones weren't likely the issue here -- unless
something was going wrong, of course.

Finally, there's the "contiguous memory" issue.  Right after boot, your
system has lots of free memory, in large blobs of contiguous pages.  It's
easy to get contiguous memory allocated in blocks of 256, 512, and 1024
pages at once.  As uptime increases, however, memory gets fragmented
through normal use.  A system that has been up awhile will have far
fewer 1024-page blocks immediately available for use, and fewer 512-
and 256-page blocks as well.  Total memory available may be the same,
but if it's all in 1- and 2-page blocks, it'll take some serious time
to move stuff around to allocate a 1024-page contiguous block -- if
it's even possible at all.  Given the memory access patterns I've
observed during kde merges with enablefinal on, and while I'm not
technically skilled enough to verify my suspicions, of the
possibilities I know of, I believe this to be the most likely culprit
-- the reason the OOM killer was activating even while swap (and
possibly even main memory) was still free.
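For the curious, those per-order figures are what /proc/buddyinfo
reports: one column per block order, counting free blocks of 2^order
contiguous pages.  A small parser over a made-up sample line (the
counts are invented to show a heavily fragmented zone) makes the story
concrete:

```python
# Parse a /proc/buddyinfo-style line: after "Node N, zone NAME", each
# column counts free blocks of 2^order contiguous pages (orders 0..10).
# The sample line is INVENTED to illustrate heavy fragmentation.

sample = ("Node 0, zone   Normal   3840   2112    512     64"
          "      8      2      1      0      0      0      0")

def free_blocks(line):
    """Map block order -> count of free blocks of that order."""
    return dict(enumerate(int(x) for x in line.split()[4:]))

def free_pages(line):
    """Total free pages implied by the per-order block counts."""
    return sum(n * 2 ** order for order, n in free_blocks(line).items())

# Plenty free overall (10880 pages = ~42 MB with 4kB pages)...
print(free_pages(sample) * 4 // 1024, "MB free")  # 42 MB free
# ...yet zero 512- or 1024-page blocks, so one large contiguous
# allocation can still fail -- or wake the OOM killer.
print(free_blocks(sample)[9], free_blocks(sample)[10])  # 0 0
```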

I'm sure there are other variations on the theme, however -- other
memory-type restrictions -- and it may have been one of /those/ that
just happened to come up short at the time you needed it.  In any case,
as should
be quite plain by now, a raw "available memory" number doesn't give
/anything/ /even/ /close/ to the entire picture, at the detail needed to
fully grok why the OOM killer was activating, when overall memory wasn't
apparently in short supply at all.

I should also mention those numbers I snipped.  I know enough to just
begin to make a bit of sense of them, but not enough to /understand/
them, at least to the point of understanding what they say is wrong.
You can see the contiguous memory block figures for each of the DMA and
normal memory zones.  With 4kB pages, the 1024-page blocks are 4MB.  I
just don't understand enough about the internals to grok either them or
this log snip, however.  I know the general theories, and hopefully
explained them well enough, but don't know how they apply concretely.
Perhaps someone else does.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html

