Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

Chris Murphy Sat, 04 Jan 2020 15:18:06 -0800

On Sat, Jan 4, 2020 at 2:30 PM drago01 <drag...@gmail.com> wrote:
>
> On Sat, Jan 4, 2020 at 7:32 PM Chris Murphy <li...@colorremedies.com> wrote:
> >
> > It might be. And it might need to be tweaked. Perhaps 6% for SIGTERM
> > and 3% for SIGKILL. Or even 5% and 2.5%. For sure using a percentage
> > of RAM and swap is too simplistic. But it's easy for users to
> > understand. Something more sophisticated, based on kernel pressure
> > stall information would likely be better, and folks are working on
> > that.
>
> Yes that would be a way better metric than a percent value which is
> either too close to full RAM or too early if you have lots of RAM.
> 6% of 4GB is about 246MB while for 32GB it's almost 2GB - killing
> processes while you have 2GB left is just wasteful.


If there's a swap device, that won't happen. SIGTERM is only sent at
10% RAM free when there's no swap device. The no-swap configuration is
not a default, and right now the installer explicitly recommends
against it (if you try to do such an installation, it warns you). But
it is a configuration we allow, and I happen to know it's somewhat
common among developers with systems with lots of RAM, expressly
because swap thrashing, even to SSD, results in such poor UX.
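
To put numbers on the percentage discussion, here is a minimal sketch of
how much absolute headroom a percentage threshold leaves at different
RAM sizes. The function name and the chosen sizes are illustrative
only, and it assumes 1 GiB = 1024 MiB.

```python
def threshold_mib(total_gib, percent):
    """Return the amount of memory, in MiB, that `percent` of `total_gib` represents."""
    return total_gib * 1024 * percent / 100


# Compare the thresholds mentioned in this thread across a few RAM sizes.
for total_gib in (4, 8, 32):
    for percent in (10, 6, 3):
        mib = threshold_mib(total_gib, percent)
        print(f"{percent:>2}% of {total_gib:>2} GiB = {mib:8.1f} MiB")
```

The same percentage that leaves only a couple hundred MiB on a small
machine leaves a couple of GiB on a large one.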

Consider the following 'vmstat 10' while doing a compile:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st

 6 11 4168060 1821580     40 736604 30234 10841 46533 13805 19230 29799 74 12  1 13  0

At this time, the GUI was completely unresponsive for about 1 minute;
not even the mouse arrow moved. Seemingly plenty of RAM and swap, and
idle CPU, but rather heavy swap in and out.


10  9 4459648 200912     40 569260 11218 18856 28846 19997 15164 35256 28  9  9 53  0
 6  8 4207328 807092     40 636156 26205 16744 35472 18287 20179 34087 62 12  3 23  0

At these two lines, the mouse arrow was stuttering and the GUI was very
sluggish, even unresponsive much of the time.

Jan 04 15:37:18 fmac.local earlyoom[4896]: mem avail:  1212 of  7865 MiB (15 %), swap free: 4807 of 8195 MiB (58 %)

That's from near the same time. The system is nowhere near either RAM
or swap exhaustion. But swap si/so are high. This is an SSD BTW.
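
For anyone who wants to pull the swap traffic out of output like this,
here is a rough sketch that maps each vmstat data line onto its column
names and sums si + so. The field list follows the header shown above;
the helper name is made up for illustration, and it assumes vmstat's
default units (KiB per second for si/so).

```python
# Column names in the order vmstat prints them (see the header above).
VMSTAT_FIELDS = ["r", "b", "swpd", "free", "buff", "cache", "si", "so",
                 "bi", "bo", "in", "cs", "us", "sy", "id", "wa", "st"]

# The three samples quoted in this message, each joined onto one line.
SAMPLES = [
    "6 11 4168060 1821580 40 736604 30234 10841 46533 13805 19230 29799 74 12 1 13 0",
    "10 9 4459648 200912 40 569260 11218 18856 28846 19997 15164 35256 28 9 9 53 0",
    "6 8 4207328 807092 40 636156 26205 16744 35472 18287 20179 34087 62 12 3 23 0",
]


def parse_vmstat(line):
    """Turn one vmstat data line into a dict keyed by column name."""
    values = [int(v) for v in line.split()]
    if len(values) != len(VMSTAT_FIELDS):
        raise ValueError(f"expected {len(VMSTAT_FIELDS)} columns, got {len(values)}")
    return dict(zip(VMSTAT_FIELDS, values))


for line in SAMPLES:
    sample = parse_vmstat(line)
    swap_kib = sample["si"] + sample["so"]  # combined swap in + swap out
    print(f"swap traffic {swap_kib / 1024:6.1f} MiB/s, "
          f"free {sample['free'] / 1024:7.1f} MiB, iowait {sample['wa']}%")
```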

Can I get to the compile and force quit? Eventually, but it would take
a couple of minutes. Good progress is being made with the compile
during this whole time, though.

With default settings, earlyoom doesn't SIGTERM this compile until
after 20 minutes of this behavior. So it really isn't solving the
sluggish, stuttering problem. What it does do is SIGTERM the compile
before the system gets to a state where essentially all of the work is
swap in and swap out, and no other work is being done.

Here is the output (2 week expiration)
https://pastebin.com/0iZHNjg7

Retest with no swap at all, and yes, compile gets a SIGTERM when free
memory gets to 10% (because swap is already considered to be 0% free,
since it doesn't exist). But also? The system isn't under any swap io
duress. The system is completely responsive throughout.
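
The behavior in both tests is consistent with a simple rule. Here is a
minimal sketch of it, not earlyoom's actual code: the function names are
made up, and the 10% SIGTERM / 5% SIGKILL defaults and the "both values
must be below the threshold" condition are my assumptions about the
default configuration.

```python
def percent_free(free_mib, total_mib):
    """Percentage free; a missing device (total 0) counts as 0% free."""
    return 0.0 if total_mib == 0 else 100.0 * free_mib / total_mib


def earlyoom_action(mem_avail_pct, swap_free_pct, term_pct=10.0, kill_pct=5.0):
    """Return "SIGKILL", "SIGTERM", or None for one polling sample.

    Assumption: a signal is sent only when BOTH available memory and free
    swap are at or below the threshold. With no swap device, swap counts as
    0% free, so the memory threshold alone decides.
    """
    if mem_avail_pct <= kill_pct and swap_free_pct <= kill_pct:
        return "SIGKILL"
    if mem_avail_pct <= term_pct and swap_free_pct <= term_pct:
        return "SIGTERM"
    return None


# The logged sample above: 1212 of 7865 MiB available, 4807 of 8195 MiB swap free.
with_swap = earlyoom_action(percent_free(1212, 7865), percent_free(4807, 8195))

# The no-swap retest: memory just under 10% available, no swap device at all.
no_swap = earlyoom_action(9.0, percent_free(0, 0))

print(with_swap, no_swap)
```

With swap present, low memory alone never triggers anything as long as
swap still has room, which is why the compile ran for so long in the
first test.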

This is why we see developers giving up on swap partitions entirely.
swap-on-ZRAM might be a compromise. That's related issue #120.


> > That's not a fix either, it's a work around that papers over the
> > problem. Same as earlyoom, except RAM costs money, and may not be an
> > option due to hardware limitations. A modern operating system needs to
> > know better than to allow unprivileged processes to take down the
> > whole system.
>
> I think you misunderstood me. Yes the OS should behave better than
> this but if you are running a server you don't want your DB, web
> server to not be reachable because the system ran out of memory - the
> only way to "fix" that
> is to provide enough resources. No amount of OOM killing would help
> you here. The system may be up but not the server process the machine
> is running for ...

Perhaps, but two points:

a. this feature is for Workstation. If the Server working group wants
to give it a go, that's up to them. But they may prefer experimenting
with more server oriented user space oom daemons like recent versions
of oomd. And for that use case, Facebook (and others) have
investigated this and found that avoiding OOM, even by killing
processes, is far less bad than the system hanging itself. As in,
better for recovery and better for limited sysadmin resources. There's
a video about it from the recent All Systems Go conference.

b. earlyoom does SIGTERM first. I have yet to see a single process
that doesn't respond to SIGTERM and needs SIGKILL (hundreds of tests,
but that's really nothing, and also not a scientific sample).


> > > And btw we should really update the minimum memory requirements in our 
> > > documentation, the current ones have nothing to do with reality (if you 
> > > want a pleasant user experience).
> >
> > Can you be more specific?
> >
> > On getfedora.org it reads:
> > Fedora requires a minimum of 20GB disk, 2GB RAM, to install and run
> > successfully. Double those amounts is recommended.
>
>
> I simply do not think 2GB is sufficient, the "recommended double" i.e.
> 4GB should be the "required" and drop the double part altogether.
> A modern desktop with apps on top will not run well enough on 2GB,
> let's stop pretending it does. But anyways that's off topic as it is
> not part of the proposal.

The Workstation working group recently bumped this from 1G minimum, 2G
recommended. We're considering VMs with these numbers. And as a
comparative point of reference, Windows 10 64-bit is also 2G minimum.


-- 
Chris Murphy
_______________________________________________
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
