Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-04-03 Thread Michael Chapman
On Mon, 3 Apr 2023, Lennart Poettering wrote:
> On Sa, 01.04.23 06:16, Michael Chapman (m...@very.puzzling.org) wrote:
> 
> > > Well, in larger environments the goal is typically to saturate all
> > > hosts, but not overload them. i.e. maximizing your ROI. No need to
> > > fall from one extreme into the other. Today's Linux can actually
> > > achieve something like this, if you use it properly. Swap is part of
> > > using it "properly".
> > >
> > > Oversized hw is typically a bad investment. In particular in today's
> > > cloud world where costs multiply with every node you have.
> >
> > If customers have paid for RAM, you don't turn around and give them swap
> > instead. That's just plain dishonest.
> 
> This is nonsense. Your VM images are typically backed by disk, no? You
> just amplify IO on that.

No, you don't... because _exactly the same_ IO is done. Once the swap is 
full, the existence of swap doesn't change what gets paged in or out on 
the host side, and it doesn't change which parts of guest RAM get paged 
in or out. (And I really _don't want_ guest RAM to be paged in or out... 
we had sold it as RAM!)

> Anyway, you apparently think you know MM better than the fb folks who
> wrote the stuff. Good for you then! Since it doesn't look likely that
> anyone can convince you otherwise, let's end this discussion here.

I find it very upsetting that you assume I just made all of this up. I did 
measurements. The results showed that swap made no difference to guest 
performance. If it had made the guests perform better, I would have kept it!

But yes, if you don't believe me I think it is best that we leave it at 
that. I honestly can't think of any other way to convince you.


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-04-03 Thread Lennart Poettering
On Sa, 01.04.23 06:16, Michael Chapman (m...@very.puzzling.org) wrote:

> > Well, in larger environments the goal is typically to saturate all
> > hosts, but not overload them. i.e. maximizing your ROI. No need to
> > fall from one extreme into the other. Today's Linux can actually
> > achieve something like this, if you use it properly. Swap is part of
> > using it "properly".
> >
> > Oversized hw is typically a bad investment. In particular in today's
> > cloud world where costs multiply with every node you have.
>
> If customers have paid for RAM, you don't turn around and give them swap
> instead. That's just plain dishonest.

This is nonsense. Your VM images are typically backed by disk, no? You
just amplify IO on that.

Anyway, you apparently think you know MM better than the fb folks who
wrote the stuff. Good for you then! Since it doesn't look likely that
anyone can convince you otherwise, let's end this discussion here.

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-04-03 Thread Lennart Poettering
On Fr, 31.03.23 13:34, Christoph Anton Mitterer (cales...@scientia.org) wrote:

> Hey.
>
> Just for better understanding:
>
> AFAIU, the main idea of having swap despite enough memory was the
> following:
>
> Unless processes explicitly release memory (or get stopped), the
> kernel can mostly reclaim only cached memory... but if swap is
> available it can also reclaim anonymous memory.
>
> So the idea is that processes might have pages that are literally
> never used (except for initial loading), yet are still kept in
> memory... so these permanently eat up physical memory when they cannot
> be swapped out.
>
> And the actual benefit that then comes in (even when memory is
> sufficient) is that (more) physical memory can be used for caching.
>
> Right?
>

Yes, more or less. Except that the term "caching" might not be the
best word to use in this context. For example, running a memory-mapped
ELF executable, where the pages the code paths touch are paged in as
needed, isn't usually called "cache management".

> All this, of course, at the potential cost that if one has a
> misbehaving application, the system may still go into thrashing.
> Or is the kernel smart enough to prevent this?

Things like systemd-oomd are supposed to detect misbehaving services
and apps and shut them down cleanly before they can misbehave too
much.
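
(If you want to opt a particular workload into that explicitly, the knobs
live in systemd.resource-control(5). A minimal sketch, with a made-up slice
name and threshold, not something anyone in this thread is shipping:

    # /etc/systemd/system/myapp.slice.d/oomd.conf   (hypothetical drop-in)
    [Slice]
    # let systemd-oomd act on this cgroup based on its memory pressure
    ManagedOOMMemoryPressure=kill
    # act once pressure stays above this limit for the configured duration
    ManagedOOMMemoryPressureLimit=50%

systemd-oomd then watches that cgroup's PSI and kills the processes in it
once the limit is exceeded for long enough.)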

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-31 Thread Michael Chapman
On Sat, 1 Apr 2023, Uoti Urpala wrote:
> On Sat, 2023-04-01 at 06:16 +1100, Michael Chapman wrote:
> > On Fri, 31 Mar 2023, Lennart Poettering wrote:
> > [...]
> > > Presumably your system mmaps ELF binaries, VM images, and similar
> > > stuff into memory. If you don't allow anonymous memory to be paged
> > > out onto swap, then you are basically telling the kernel "please
> > > page out my program code instead". Which is typically a lot worse.
> > 
> > Yes, but my point is that it _doesn't matter_ if SSH or journald or 
> > whatever is in memory or needs to be paged back in again. It's such a tiny 
> > fraction of the system's overall workload.
> 
> That contradicts what you said earlier about the system actually
> writing a significant amount of data to swap. If, when swap was
> enabled, the system wrote a large amount of data to the swap, that
> implies there must be a large amount of some other data that it was
> able to keep in memory instead.

Buffer cache. Often stuff that the guests never ended up needing again, or 
at least could survive the penalty of having it read back off disk again.

Of course the host kernel didn't know that, since it cannot predict the 
future. All it knows is that IO is happening, and there are idle pages in 
the guest. Of course it's going to steadily push those idle pages out to 
swap. And the graphs I had at the time showed a very nice
linearly-increasing swap usage -- until the swap was full.

> Linux should not write all information
> from memory to swap just to leave the memory empty and without any
> useful content - everything written to swap should correspond to
> something else kept in memory.
> 
> So if you say that the swap use was overall harmful for behavior,
> claiming that the *size* of other data kept in memory was too small to
> matter doesn't really make any sense. If the swap use was significant,
> then it should have kept a significant amount of some other data in
> memory, either for the main OS or for the guests.

The "harmful behaviour" was the fact that _when_ those guests needed to be 
swapped in, that was unpleasantly slow.

The existence of swap had little to no effect on the running behaviour of 
the guests themselves -- as I keep saying, when you have enough buffer 
cache on the host, having "a bit more" because you've got swap as well 
does very little. You're already in the long tail of your performance 
graphs.

Can I make this any simpler? How about this:

* Whether swap was there or not had _no_ measurable effect on the guests' 
  performance.
* Having swap meant there was a large swap-in penalty in certain 
  circumstances. (Migration was one of them. "Rebooting a Windows VM" was
  another, since Windows apparently likes to zero all of its RAM.)

Does it make sense now?


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-31 Thread Uoti Urpala
On Sat, 2023-04-01 at 06:16 +1100, Michael Chapman wrote:
> On Fri, 31 Mar 2023, Lennart Poettering wrote:
> [...]
> > Presumably your system mmaps ELF binaries, VM images, and similar
> > stuff into memory. If you don't allow anonymous memory to be paged
> > out onto swap, then you are basically telling the kernel "please
> > page out my program code instead". Which is typically a lot worse.
> 
> Yes, but my point is that it _doesn't matter_ if SSH or journald or 
> whatever is in memory or needs to be paged back in again. It's such a tiny 
> fraction of the system's overall workload.

That contradicts what you said earlier about the system actually
writing a significant amount of data to swap. If, when swap was
enabled, the system wrote a large amount of data to the swap, that
implies there must be a large amount of some other data that it was
able to keep in memory instead. Linux should not write all information
from memory to swap just to leave the memory empty and without any
useful content - everything written to swap should correspond to
something else kept in memory.

So if you say that the swap use was overall harmful for behavior,
claiming that the *size* of other data kept in memory was too small to
matter doesn't really make any sense. If the swap use was significant,
then it should have kept a significant amount of some other data in
memory, either for the main OS or for the guests.



Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-31 Thread Michael Chapman
On Fri, 31 Mar 2023, Lennart Poettering wrote:
[...]
> Presumably your system mmaps ELF binaries, VM images, and similar
> stuff into memory. If you don't allow anonymous memory to be paged
> out onto swap, then you are basically telling the kernel "please
> page out my program code instead". Which is typically a lot worse.

Yes, but my point is that it _doesn't matter_ if SSH or journald or 
whatever is in memory or needs to be paged back in again. It's such a tiny 
fraction of the system's overall workload.

This is why Luca's suggestion of using memory.swap.max=0 on all the QEMU 
processes isn't measurably better than just not using swap at all. Either 
99% of the system isn't using swap, or 100% of it isn't using swap.

> That's why I am saying that yeah, if you want zero IO then that's OK,
> but in that case you want *neither* anonymous memory being backed by
> disk swap *nor* file-backed memory backed by disk file systems. But
> you made the strange choice of saying "IO by file-backed memory is
> good", but "IO by anonymous memory" is bad, and then allow the former
> and forbid the latter.
> 
> Hence my question: do you run your OS from an in-memory file system of
> some kind? Because if not, you just shift around what gets paged out,
> and because you make the pool of reclaimable memory smaller, you
> increase IO.

In practice, everything that needed to run on the host was either already 
in memory or could be paged in quickly. Given the sum total of that was 
only a GB or so, that's not surprising.

[...]
> Well, in larger environments the goal is typically to saturate all
> hosts, but not overload them. i.e. maximizing your ROI. No need to
> fall from one extreme into the other. Today's Linux can actually
> achieve something like this, if you use it properly. Swap is part of
> using it "properly".
> 
> Oversized hw is typically a bad investment. In particular in today's
> cloud world where costs multiply with every node you have.

If customers have paid for RAM, you don't turn around and give them swap 
instead. That's just plain dishonest.

So yes, the system _does_ need to have more physical memory than the sum 
of the guests' virtual memory. Then you add on a bit more so you've got 
some room for buffers and page cache, since (at least in my case) IO was 
local. That's the size of the server you need.


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-31 Thread Richard Purdie
On Fri, 2023-03-31 at 12:50 +0200, Lennart Poettering wrote:
> On Do, 30.03.23 13:16, Phillip Susi (ph...@thesusis.net) wrote:
> 
> > 
> > Lennart Poettering  writes:
> > 
> > > oomd/PSI looks at memory allocation latencies to determine memory
> > > pressure. Since you disallow anonymous memory to be paged out and thus
> > > increase IO on file backed memory you increase the latencies
> > > unnecessarily, thus making oomd trigger earlier.
> > 
> > Did this get changed in the last few years?  Because I'm sure it used to
> > be based on the total commit limit, and so OOM wouldn't start killing
> > until your swap was full, which didn't happen until the system was
> > thrashing itself to uselessness for 20 minutes already.
> 
> oomd becomes active on two distinct triggers:
> 
> This one:
> 
> https://github.com/systemd/systemd/blob/main/src/oom/oomd-manager.c#L383
> 
> and this one:
> 
> https://github.com/systemd/systemd/blob/main/src/oom/oomd-manager.c#L486
> 
> The latter is PSI.

We ended up having to disable systemd-oomd on our autobuilder/CI
systems since it got upset when a single process tree was using the
majority of the system resources (from memory, greater than 90%?).

On a CI system we'd expect the majority of the system resources to be
used by that single user/process tree, so this was a bit annoying and we
ended up disabling it. Everything has been fine since, so it was a false
positive.

I'm having trouble mapping that behaviour to the above two triggers...

Cheers,

Richard




Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-31 Thread Lennart Poettering
On Fr, 31.03.23 21:54, Michael Chapman (m...@very.puzzling.org) wrote:

> > because otherwise you just remove the latencies from anonymous memory
> > but you amplify the latencies on file-backed memory. Which is overall
> > worse, not better.
>
> The host isn't doing much IO. Just a bit of logging really.

IO is not just writing stuff. If you run an OS at all, a lot more IO is
generated by ELF binaries being mapped into memory and then paged in as
they run than by generating a bit of log entries.

By saying "hey, never page out anonymous memory!" to the kernel (by
not having swap), you basically say "but please page out file-backed
memory even more, please please, go ahead, now".

> How would the
> existence of swap affect that? Is it really so much better to be able to
> log messages just that little bit faster, but you've now got to wait for
> `sshd` to swap back in whenever you SSH to the system?

Presumably your system mmaps ELF binaries, VM images, and similar
stuff into memory. If you don't allow anonymous memory to be paged
out onto swap, then you are basically telling the kernel "please
page out my program code instead". Which is typically a lot worse.

That's why I am saying that yeah, if you want zero IO then that's OK,
but in that case you want *neither* anonymous memory being backed by
disk swap *nor* file-backed memory backed by disk file systems. But
you made the strange choice of saying "IO by file-backed memory is
good", but "IO by anonymous memory" is bad, and then allow the former
and forbid the latter.

Hence my question: do you run your OS from an in-memory file system of
some kind? Because if not, you just shift around what gets paged out,
and because you make the pool of reclaimable memory smaller, you
increase IO.

> > > I know this works because I have literally done it on many, many
> > > hypervisors for over a decade.
> >
> > I mean, you have a point: if you run on idle machines where hardware
> > is so massively oversized for the job you are doing, you can operate
> > really nicely without swap. No doubt. But that's kinda
> > wasteful. Resource-management through oversized hw is certainly a way to
> > solve problems, no doubt.
>
> The alternative would be to _overprovision_ the server -- i.e. put more
> VMs on it than it can support. That would just be stupid.

Well, in larger environments the goal is typically to saturate all
hosts, but not overload them. i.e. maximizing your ROI. No need to
fall from one extreme into the other. Today's Linux can actually
achieve something like this, if you use it properly. Swap is part of
using it "properly".

Oversized hw is typically a bad investment. In particular in today's
cloud world where costs multiply with every node you have.

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-31 Thread Christoph Anton Mitterer
Hey.

Just for better understanding:

AFAIU, the main idea of having swap despite enough memory was the
following:

Unless processes explicitly release memory (or get stopped), the
kernel can mostly reclaim only cached memory... but if swap is
available it can also reclaim anonymous memory.

So the idea is that processes might have pages that are literally
never used (except for initial loading), yet are still kept in
memory... so these permanently eat up physical memory when they cannot
be swapped out.

And the actual benefit that then comes in (even when memory is
sufficient) is that (more) physical memory can be used for caching.

Right?
Or are there any other general ways in which this improves performance?


All this, of course, at the potential cost that if one has a
misbehaving application, the system may still go into thrashing.
Or is the kernel smart enough to prevent this?


thanks,
Chris.


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-31 Thread Michael Chapman
On Fri, 31 Mar 2023, Lennart Poettering wrote:
> On Fr, 31.03.23 07:57, Michael Chapman (m...@very.puzzling.org) wrote:
> 
> > On Fri, 31 Mar 2023, Luca Boccassi wrote:
> > [...]
> > > No, it does not make "little difference", there are entire subsystems
> > > which are much worse off, if not completely useless, without swap.
> > > Post-cgroupsv2 memory controller things are considerably different on
> > > this front, and old "common wisdom" no longer applies.
> >
> > What are some examples here?
> >
> > What specifically is the difference between:
> >
> > * swap does not exist at all;
> > * swap is full of data that will not be swapped in for weeks or
> >   months;
> 
> The big difference is that the RAM that became available because the
> unused stuff was swapped out has been applied to better uses,
> i.e. keeping more frequently used stuff around, improving performance
> of the often-used stuff at the price of degrading performance of the
> apparently never-used stuff. Overall win!

Honestly, I feel that I've covered this already... but I'll try again.

It is only a win if that actually results in better performance! If all 
you've done is swapped out a whole lot of data, but the rest of the system 
still has the same performance, you're _worse_ off.


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-31 Thread Michael Chapman
On Fri, 31 Mar 2023, Lennart Poettering wrote:
> On Fr, 31.03.23 18:24, Michael Chapman (m...@very.puzzling.org) wrote:
> 
> > On Fri, 31 Mar 2023, Barry wrote:
> > [...]
> > > If you want to run in ram only then you must turn off the kernel 
> > > overcommit.
> > > Have you done that? If not then you risk processes getting SEGV signals.
> >
> > Seriously. It's almost as if nobody here is actually reading anything of
> > what I've written!
> >
> > EVERYTHING fits in RAM. The non-guest processes total perhaps a GB in
> > total. The guest processes total maybe 200 GB in total. The server has
> > more RAM than all of that.
> 
> I presume you are also running the OS itself from RAM then? i.e. your
> rootfs is not backed by disk, but by some in-memory fs, or a loopback
> on a memfd or so?
> 
> because otherwise you just remove the latencies from anonymous memory
> but you amplify the latencies on file-backed memory. Which is overall
> worse, not better.

The host isn't doing much IO. Just a bit of logging really. How would the 
existence of swap affect that? Is it really so much better to be able to 
log messages just that little bit faster, but you've now got to wait for 
`sshd` to swap back in whenever you SSH to the system?

> > I know this works because I have literally done it on many, many
> > hypervisors for over a decade.
> 
> I mean, you have a point: if you run on idle machines where hardware
> is so massively oversized for the job you are doing, you can operate
> really nicely without swap. No doubt. But that's kinda
> wasteful. Resource-management through oversized hw is certainly a way to
> solve problems, no doubt.

The alternative would be to _overprovision_ the server -- i.e. put more 
VMs on it than it can support. That would just be stupid.

Sticking with the example above, if the guests' RAM totals 200 GB, are
you really suggesting that only having, say, a 204 GB server and making
up for the lack of memory by adding swap would actually be better? Of
course it wouldn't!

The VMs certainly weren't idle. Some of them had a fair bit of idle RAM, 
yes, but that's not entirely unusual. Plus, I don't get to choose what the 
VMs run, my customers do.


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-31 Thread Lennart Poettering
On Do, 30.03.23 13:16, Phillip Susi (ph...@thesusis.net) wrote:

>
> Lennart Poettering  writes:
>
> > oomd/PSI looks at memory allocation latencies to determine memory
> > pressure. Since you disallow anonymous memory to be paged out and thus
> > increase IO on file backed memory you increase the latencies
> > unnecessarily, thus making oomd trigger earlier.
>
> Did this get changed in the last few years?  Because I'm sure it used to
> be based on the total commit limit, and so OOM wouldn't start killing
> until your swap was full, which didn't happen until the system was
> thrashing itself to uselessness for 20 minutes already.

oomd becomes active on two distinct triggers:

This one:

https://github.com/systemd/systemd/blob/main/src/oom/oomd-manager.c#L383

and this one:

https://github.com/systemd/systemd/blob/main/src/oom/oomd-manager.c#L486

The latter is PSI.
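
The thresholds behind both triggers are configurable in oomd.conf(5);
roughly something like this (values here are only illustrative, not a
recommendation):

    # /etc/systemd/oomd.conf
    [OOM]
    # first trigger: swap is nearly exhausted
    SwapUsedLimit=90%
    # second trigger: PSI memory pressure stays above this for the duration
    DefaultMemoryPressureLimit=60%
    DefaultMemoryPressureDurationSec=30s

Units additionally need ManagedOOMSwap= or ManagedOOMMemoryPressure= set
on them before systemd-oomd will act on them.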

> What happens if you use zswap?  Will hibernation try to save things to
> there instead of a real disk swap?  It might be nice to have zswap for
> normal use and the on disk swap for hibernate.

our sleep code does not consider zram devices for hibernation.

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-31 Thread Lennart Poettering
On Fr, 31.03.23 07:57, Michael Chapman (m...@very.puzzling.org) wrote:

> On Fri, 31 Mar 2023, Luca Boccassi wrote:
> [...]
> > No, it does not make "little difference", there are entire subsystems
> > which are much worse off, if not completely useless, without swap.
> > Post-cgroupsv2 memory controller things are considerably different on
> > this front, and old "common wisdom" no longer applies.
>
> What are some examples here?
>
> What specifically is the difference between:
>
> * swap does not exist at all;
> * swap is full of data that will not be swapped in for weeks or
>   months;

The big difference is that the RAM that became available because the
unused stuff was swapped out has been applied to better uses,
i.e. keeping more frequently used stuff around, improving performance
of the often-used stuff at the price of degrading performance of the
apparently never-used stuff. Overall win!

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-31 Thread Lennart Poettering
On Fr, 31.03.23 18:24, Michael Chapman (m...@very.puzzling.org) wrote:

> On Fri, 31 Mar 2023, Barry wrote:
> [...]
> > If you want to run in ram only then you must turn off the kernel overcommit.
> > Have you done that? If not then you risk processes getting SEGV signals.
>
> Seriously. It's almost as if nobody here is actually reading anything of
> what I've written!
>
> EVERYTHING fits in RAM. The non-guest processes total perhaps a GB in
> total. The guest processes total maybe 200 GB in total. The server has
> more RAM than all of that.

I presume you are also running the OS itself from RAM then? i.e. your
rootfs is not backed by disk, but by some in-memory fs, or a loopback
on a memfd or so?

because otherwise you just remove the latencies from anonymous memory
but you amplify the latencies on file-backed memory. Which is overall
worse, not better.

> I know this works because I have literally done it on many, many
> hypervisors for over a decade.

I mean, you have a point: if you run on idle machines where hardware
is so massively oversized for the job you are doing, you can operate
really nicely without swap. No doubt. But that's kinda
wasteful. Resource-management through oversized hw is certainly a way to
solve problems, no doubt.

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-31 Thread Michael Chapman
On Fri, 31 Mar 2023, Lennart Poettering wrote:
> On Do, 30.03.23 18:56, Michael Chapman (m...@very.puzzling.org) wrote:
> 
> > On Thu, 30 Mar 2023, Lennart Poettering wrote:
> > > On Mi, 29.03.23 13:53, Christoph Anton Mitterer (cales...@scientia.org) 
> > > wrote:
> > >
> > > > > > That's a bad idea btw. I'd advise you not to do that: on modern
> > > > > > systems you want swap, since it makes anonymous memory reclaimable.
> > > > > > I
> > > > > > am not sure where you are getting this idea from that swap was
> > > > > > bad.
> > > >
> > > > Well I haven't said it's bad, but I guess it depends on the use case
> > > > any available RAM.
> > >
> > > In almost all scenarios you want swap, regardless if little RAM or a
> > > lot. For specialist cases where you run everything from memory, and
> > > not even programs are backed by disk there might be exceptions. But
> > > that's almost never the case.
> >
> > One specific case where I deliberately chose _not_ to use swap: large
> > hypervisors with local storage.
> >
> > With swap on the host enabled, all that ended up happening was that local
> > IO activity caused idle guest memory to be gradually swapped out.
> > Eventually all of the swap space filled up, and the system was exactly
> > where it would have been had it not had any swap space configured in the
> > first place -- except that it was now _a lot_ slower to migrate those
> > swapped-out guests to other hypervisors.
> 
> Linux will swap out stuff only if it has better uses for the RAM. So
> yeah, apparently your VMs were mostly idle, and the RAM was better
> used for other stuff, and ultimately helped speed up things for that
> other more frequently used stuff. Which is an overall win, not a loss.
> 
> If the key requirement you have is to make VMs migrate quickly, then
> yeah, allowing them to be written to disk is of course a
> problem. But frankly, if the ability to migrate VMs quickly is your
> top priority and general performance irrelevant, then you might have
> weird priorities? Also, are you sure your network is faster than your
> local disk?

Certainly faster than the swap-in path! 10 GigE networking really helps.

Migrating VMs quickly was not a "key requirement" at all. But it was 
important, and not having swap meant that it could be achieved _without_ 
causing any other problems.

Think about it: instead of 50 GB of RAM usable as buffer and page cache, 
let's say I had added swap and allowed that to increase to 100 GB. Would 
that really make much of a difference to IO performance in guests? 
Probably not. Sure, I could _engineer_ a test where it made a difference, 
but in _real-world_ usage it doesn't change things too much.
 
> Generally though: I am not doubting that sometimes latency matters for
> certain jobs, and paging stuff back in is slow and thus makes
> latencies worse. But the way to address that is not to turn off swap
> for everything, but just for the jobs where the latency matters, via
> the appropriate cgroup settings.

I get that. But why would I spend time doing that rather than just hitting 
the big `swapoff` button, when that effectively yields the same result? 
The only difference would be about a GB: the size of all the processes 
that weren't in guests.

> The thing is, anonymous memory is just
> one kind of memory, and if you turn off swap then you force that to
> remain in RAM – but at the same time you still allow file-based stuff
> to be reclaimed so that it must be reread later from disk. If you use
> the right resource management settings you have much better control on
> that, too, and can comprehensively solve the issue, and get the
> latencies you want.
> 
> Or to turn this around: if you are concerned about the latencies swap
> is supposed to "introduce", but you do not run your whole OS from an
> in-memory image too, then you are doing things wrong and not actually
> solving what you want to solve.
> 
> Lennart
> 
> --
> Lennart Poettering, Berlin
> 
> 

Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-31 Thread Lennart Poettering
On Do, 30.03.23 18:56, Michael Chapman (m...@very.puzzling.org) wrote:

> On Thu, 30 Mar 2023, Lennart Poettering wrote:
> > On Mi, 29.03.23 13:53, Christoph Anton Mitterer (cales...@scientia.org) 
> > wrote:
> >
> > > > > That's a bad idea btw. I'd advise you not to do that: on modern
> > > > > systems you want swap, since it makes anonymous memory reclaimable.
> > > > > I
> > > > > am not sure where you are getting this idea from that swap was
> > > > > bad.
> > >
> > > Well I haven't said it's bad, but I guess it depends on the use case
> > > any available RAM.
> >
> > In almost all scenarios you want swap, regardless if little RAM or a
> > lot. For specialist cases where you run everything from memory, and
> > not even programs are backed by disk there might be exceptions. But
> > that's almost never the case.
>
> One specific case where I deliberately chose _not_ to use swap: large
> hypervisors with local storage.
>
> With swap on the host enabled, all that ended up happening was that local
> IO activity caused idle guest memory to be gradually swapped out.
> Eventually all of the swap space filled up, and the system was exactly
> where it would have been had it not had any swap space configured in the
> first place -- except that it was now _a lot_ slower to migrate those
> swapped-out guests to other hypervisors.

Linux will swap out stuff only if it has better uses for the RAM. So
yeah, apparently your VMs were mostly idle, and the RAM was better
used for other stuff, and ultimately helped speed up things for that
other more frequently used stuff. Which is an overall win, not a loss.

If the key requirement you have is to make VMs migrate quickly, then
yeah, allowing them to be written to disk is of course a
problem. But frankly, if the ability to migrate VMs quickly is your
top priority and general performance irrelevant, then you might have
weird priorities? Also, are you sure your network is faster than your
local disk?

Generally though: I am not doubting that sometimes latency matters for
certain jobs, and paging stuff back in is slow and thus makes
latencies worse. But the way to address that is not to turn off swap for
everything, but just for the jobs where the latency matters, via the
appropriate cgroup settings. The thing is, anonymous memory is just
one kind of memory, and if you turn off swap then you force that to
remain in RAM – but at the same time you still allow file-based stuff
to be reclaimed so that it must be reread later from disk. If you use
the right resource management settings you have much better control on
that, too, and can comprehensively solve the issue, and get the
latencies you want.

Or to turn this around: if you are concerned about the latencies swap
is supposed to "introduce", but you do not run your whole OS from an
in-memory image too, then you are doing things wrong and not actually
solving what you want to solve.

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-31 Thread Lennart Poettering
On Do, 30.03.23 01:39, Christoph Anton Mitterer (cales...@scientia.org) wrote:

> Well that's clear, it's just that on my systems (both servers and
> workstations) I've never really run into the need to reclaim lots of
> anonymous memory.

It's the Linux kernel that reclaims memory for you. You don't need to
reclaim that yourself, personally, you know.

The thing is simply: RAM is best used for stuff that is actually
accessed. If you prohibit that you just make the overall behaviour of
the system worse. The kernel is pretty good at figuring out what is
needed in RAM and what is not. Except of course if you don't let it,
and force it to keep useless stuff in memory.

> Since you're anyway rather against the whole idea,... I assume you
> wouldn't want a PR that adds something like that as an example to the
> manpages?

No, these exotic env vars are deliberately not documented in the man
page, but separately in that markdown doc, since they come with a lower
stability guarantee and are a bit icky. People shouldn't really use them
in the general case; they exist for local hacks only, and local hacks
have no place in the main documentation, which is what the man pages are.

> Might make it easier for people to use it properly :-)

It's a bad idea to do what you are doing. I don't think we need to
make

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-31 Thread Michael Chapman
On Fri, 31 Mar 2023, Tomasz Torcz wrote:
> On Fri, Mar 31, 2023 at 06:24:09PM +1100, Michael Chapman wrote:
> > On Fri, 31 Mar 2023, Barry wrote:
> > [...]
> > > If you want to run in ram only then you must turn off the kernel 
> > > overcommit.
> > > Have you done that? If not then you risk processes getting SEGV signals.
> > 
> > Seriously. It's almost as if nobody here is actually reading anything of 
> > what I've written!
> > 
> > EVERYTHING fits in RAM. The non-guest processes total perhaps a GB in 
> > total. The guest processes total maybe 200 GB in total. The server has 
> > more RAM than all of that.
> 
>   Your situation seems to be special. People in this thread seem to be
> focused on generic Linux computer use-case.

And that was entirely my point. Lennart had mentioned one special case. I 
was just suggesting another one.

I wasn't expecting everyone to pile on and say "no, your special case is 
WRONG, and you're a bad person for even thinking about it!" :-p


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-31 Thread Tomasz Torcz
On Fri, Mar 31, 2023 at 06:24:09PM +1100, Michael Chapman wrote:
> On Fri, 31 Mar 2023, Barry wrote:
> [...]
> > If you want to run in ram only then you must turn off the kernel overcommit.
> > Have you done that? If not then you risk processes getting SEGV signals.
> 
> Seriously. It's almost as if nobody here is actually reading anything of 
> what I've written!
> 
> EVERYTHING fits in RAM. The non-guest processes total perhaps a GB in 
> total. The guest processes total maybe 200 GB in total. The server has 
> more RAM than all of that.

  Your situation seems to be special. People in this thread seem to be
focused on generic Linux computer use-case.


> I know this works because I have literally done it on many, many 
> hypervisors for over a decade.

  On the other hand, kernel 4.0, which greatly changed how swap works*,
was released half a decade ago. Maybe it's time to revisit assumptions?

* according to Chris Down's blog note linked by Lennart at the beginning
  of the thread.

-- 
Tomasz Torcz   72->|   80->|
to...@pipebreaker.pl   72->|   80->|



Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-31 Thread Michael Chapman
On Fri, 31 Mar 2023, Barry wrote:
[...]
> If you want to run in ram only then you must turn off the kernel overcommit.
> Have you done that? If not then you risk processes getting SEGV signals.

Seriously. It's almost as if nobody here is actually reading anything of 
what I've written!

EVERYTHING fits in RAM. The non-guest processes total perhaps a GB in 
total. The guest processes total maybe 200 GB in total. The server has 
more RAM than all of that.

I know this works because I have literally done it on many, many 
hypervisors for over a decade.


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-31 Thread Barry



> On 31 Mar 2023, at 00:51, Michael Chapman  wrote:
> 
> On Fri, 31 Mar 2023, Phillip Susi wrote:
>> 
>> Michael Chapman  writes:
>> 
>>> What specifically is the difference between:
>>> 
>>> * swap does not exist at all;
>>> * swap is full of data that will not be swapped in for weeks or months;
>> 
>> That's the wrong question.
> 
> Nevertheless it was the question I was faced with. I had servers with a 
> huge amount of memory, a fair bit of swap, and ALL of that swap filled 
> with stuff that would need to be entirely swapped back in at some point,
> at a moment's notice.
> 
> The solution was simple: turn off swap. Now there was no "swap everything 
> back in" penalty, and since there was plenty of RAM anyway the change had 
> little impact on the behaviour of the rest of the system.

If you want to run in ram only then you must turn off the kernel overcommit.
Have you done that? If not then you risk processes getting SEGV signals.

There are a lot of moving parts that affect the robustness of a big server.
Swap is one of them, and it is important for allowing efficient use of all
the hardware resources.

I work on servers with 400 GB of RAM, but it is all used. Swap is a critical
part of tuning the performance of the network-heavy workload that disk I/O
impacts.

Barry

> 



Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-30 Thread Michael Chapman
On Fri, 31 Mar 2023, Phillip Susi wrote:
> 
> Michael Chapman  writes:
> 
> > What specifically is the difference between:
> >
> > * swap does not exist at all;
> > * swap is full of data that will not be swapped in for weeks or months;
> 
> That's the wrong question.

Nevertheless it was the question I was faced with. I had servers with a 
huge amount of memory, a fair bit of swap, and ALL of that swap filled 
with stuff that would need to be entirely swapped back in at some point,
at a moment's notice.

The solution was simple: turn off swap. Now there was no "swap everything 
back in" penalty, and since there was plenty of RAM anyway the change had 
little impact on the behaviour of the rest of the system.


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-30 Thread Phillip Susi


Michael Chapman  writes:

> What specifically is the difference between:
>
> * swap does not exist at all;
> * swap is full of data that will not be swapped in for weeks or months;

That's the wrong question.  The question is, what is the difference
between having NO swap, and having some swap that you don't use much of?
The answer to that is that there will be a non-zero amount of anonymous
memory allocated to processes that hardly ever touch it, and that can be
tossed out to swap to provide more memory to use for, if nothing else,
caching files that ARE being accessed.  Now that amount may not be much
if you usually have plenty of free RAM, but it won't be zero.

I too have long gone without a swap partition because the small benefit
of having a little more RAM to cache files did not justify the risk of
going into thrashing mode when some process went haywire. But if that
problem has been solved, and you want a swap partition for hibernation
anyhow, then you may as well keep it enabled all the time, since
disabling it when you aren't about to hibernate costs *something* and
gains *nothing*.



Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-30 Thread Michael Chapman
On Fri, 31 Mar 2023, Luca Boccassi wrote:
[...]
> No, it does not make "little difference", there are entire subsystems
> which are much worse off, if not completely useless, without swap.
> Post-cgroupsv2 memory controller things are considerably different on
> this front, and old "common wisdom" no longer applies.

What are some examples here?

What specifically is the difference between:

* swap does not exist at all;
* swap is full of data that will not be swapped in for weeks or months;

?

Either way, nothing more can be swapped out, and nothing will get swapped 
in.

If everything fits in RAM, as far as I can see the only thing that
allowing "non-guest processes" to be swapped out would gain me is
slightly more available RAM for buffers and cache. But as I noted in
the other thread, I've already got enough of that!

So what advantage would there be to me to enable swap? I started off this 
thread with a disadvantage, so there would have to be a *big* advantage to 
counter that.

I am well aware that "you should always have swap" is good general advice 
for most users. But it's important to remember there are exceptions to it! 
Lennart suggested one such situation is where you're running everything 
off RAM anyway. I suggest another such situation is where you have 
sufficient RAM that your entire workload comfortably fits within it.


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-30 Thread Phillip Susi


Lennart Poettering  writes:

> oomd/PSI looks at memory allocation latencies to determine memory
> pressure. Since you disallow anonymous memory to be paged out and thus
> increase IO on file backed memory you increase the latencies
> unnecessarily, thus making oomd trigger earlier.

Did this get changed in the last few years?  Because I'm sure it used to
be based on the total commit limit, and so OOM wouldn't start killing
until your swap was full, which didn't happen until the system was
thrashing itself to uselessness for 20 minutes already.

If this has been fixed then I guess it's time for me to start using swap
again.

What happens if you use zswap?  Will hibernation try to save things to
there instead of a real disk swap?  It might be nice to have zswap for
normal use and the on disk swap for hibernate.



Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-30 Thread Luca Boccassi
On Thu, 30 Mar 2023 at 11:09, Michael Chapman  wrote:
>
> On Thu, 30 Mar 2023, Luca Boccassi wrote:
> > On Thu, 30 Mar 2023 at 10:15, Michael Chapman  
> > wrote:
> > >
> > > On Thu, 30 Mar 2023, Lennart Poettering wrote:
> > > > On Mi, 29.03.23 13:53, Christoph Anton Mitterer (cales...@scientia.org) 
> > > > wrote:
> > > >
> > > > > > > That's a bad idea btw. I'd advise you not to do that: on modern
> > > > > > > systems you want swap, since it makes anonymous memory 
> > > > > > > reclaimable.
> > > > > > > I
> > > > > > > am not sure where you are getting this idea from that swap was
> > > > > > > bad.
> > > > >
> > > > > Well I haven't said it's bad, but I guess it depends on the use case
> > > > > any available RAM.
> > > >
> > > > In almost all scenarios you want swap, regardless if little RAM or a
> > > > lot. For specialist cases where you run everything from memory, and
> > > > not even programs are backed by disk there might be exceptions. But
> > > > that's almost never the case.
> > >
> > > One specific case where I deliberately chose _not_ to use swap: large
> > > hypervisors with local storage.
> > >
> > > With swap on the host enabled, all that ended up happening was that local
> > > IO activity caused idle guest memory to be gradually swapped out.
> > > Eventually all of the swap space filled up, and the system was exactly
> > > where it would have been had it not had any swap space configured in the
> > > first place -- except that it was now _a lot_ slower to migrate those
> > > swapped-out guests to other hypervisors.
> > >
> > > - Michael
> >
> > The solution there is to ensure the cgroup configuration for the
> > slices where the guests run have memory.swap.max=0, rather than
> > disabling it for the whole system.
>
> Perhaps, but given the rest of processes on the system need just a few
> hundred MB max, and the server has hundreds of GB of RAM, it really makes
> little difference. Turning off swap altogether is plain _simpler_.

No, it does not make "little difference", there are entire subsystems
which are much worse off, if not completely useless, without swap.
Post-cgroupsv2 memory controller things are considerably different on
this front, and old "common wisdom" no longer applies.


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-30 Thread Michael Chapman
On Thu, 30 Mar 2023, Greg KH wrote:
> On Thu, Mar 30, 2023 at 09:09:19PM +1100, Michael Chapman wrote:
> > On Thu, 30 Mar 2023, Luca Boccassi wrote:
> > > On Thu, 30 Mar 2023 at 10:15, Michael Chapman  
> > > wrote:
> > > >
> > > > On Thu, 30 Mar 2023, Lennart Poettering wrote:
> > > > > On Mi, 29.03.23 13:53, Christoph Anton Mitterer 
> > > > > (cales...@scientia.org) wrote:
> > > > >
> > > > > > > > That's a bad idea btw. I'd advise you not to do that: on modern
> > > > > > > > systems you want swap, since it makes anonymous memory 
> > > > > > > > reclaimable.
> > > > > > > > I
> > > > > > > > am not sure where you are getting this idea from that swap was
> > > > > > > > bad.
> > > > > >
> > > > > > Well I haven't said it's bad, but I guess it depends on the use case
> > > > > > any available RAM.
> > > > >
> > > > > In almost all scenarios you want swap, regardless if little RAM or a
> > > > > lot. For specialist cases where you run everything from memory, and
> > > > > not even programs are backed by disk there might be exceptions. But
> > > > > that's almost never the case.
> > > >
> > > > One specific case where I deliberately chose _not_ to use swap: large
> > > > hypervisors with local storage.
> > > >
> > > > With swap on the host enabled, all that ended up happening was that 
> > > > local
> > > > IO activity caused idle guest memory to be gradually swapped out.
> > > > Eventually all of the swap space filled up, and the system was exactly
> > > > where it would have been had it not had any swap space configured in the
> > > > first place -- except that it was now _a lot_ slower to migrate those
> > > > swapped-out guests to other hypervisors.
> > > >
> > > > - Michael
> > > 
> > > The solution there is to ensure the cgroup configuration for the
> > > slices where the guests run have memory.swap.max=0, rather than
> > > disabling it for the whole system.
> > 
> > Perhaps, but given the rest of processes on the system need just a few 
> > hundred MB max, and the server has hundreds of GB of RAM, it really makes 
> > little difference. Turning off swap altogether is plain _simpler_.
> 
> So you penalize the runtime performance of guests for the infrequent
> migration delay?  Sounds like a bad trade-off for any real workload
> those guests are doing.  Shouldn't the goal of the system be to solve
> the problem the guests are trying to solve instead of being optimized
> for the infrequent administration tasks?

Err... how is ensuring the guests actually stay in memory "penalising 
their runtime performance"? If anything, it's exactly the opposite!

Note that the hypervisor wouldn't have been overprovisioned. If it were, 
say, a 256 GB server, I might only put 200 GB of guests on it. That still 
leaves around 50 GB for page cache, which is more than enough!

Adding a few GB of swap to the system would hardly make a difference.


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-30 Thread Greg KH
On Thu, Mar 30, 2023 at 09:09:19PM +1100, Michael Chapman wrote:
> On Thu, 30 Mar 2023, Luca Boccassi wrote:
> > On Thu, 30 Mar 2023 at 10:15, Michael Chapman  
> > wrote:
> > >
> > > On Thu, 30 Mar 2023, Lennart Poettering wrote:
> > > > On Mi, 29.03.23 13:53, Christoph Anton Mitterer (cales...@scientia.org) 
> > > > wrote:
> > > >
> > > > > > > That's a bad idea btw. I'd advise you not to do that: on modern
> > > > > > > systems you want swap, since it makes anonymous memory 
> > > > > > > reclaimable.
> > > > > > > I
> > > > > > > am not sure where you are getting this idea from that swap was
> > > > > > > bad.
> > > > >
> > > > > Well I haven't said it's bad, but I guess it depends on the use case
> > > > > any available RAM.
> > > >
> > > > In almost all scenarios you want swap, regardless if little RAM or a
> > > > lot. For specialist cases where you run everything from memory, and
> > > > not even programs are backed by disk there might be exceptions. But
> > > > that's almost never the case.
> > >
> > > One specific case where I deliberately chose _not_ to use swap: large
> > > hypervisors with local storage.
> > >
> > > With swap on the host enabled, all that ended up happening was that local
> > > IO activity caused idle guest memory to be gradually swapped out.
> > > Eventually all of the swap space filled up, and the system was exactly
> > > where it would have been had it not had any swap space configured in the
> > > first place -- except that it was now _a lot_ slower to migrate those
> > > swapped-out guests to other hypervisors.
> > >
> > > - Michael
> > 
> > The solution there is to ensure the cgroup configuration for the
> > slices where the guests run have memory.swap.max=0, rather than
> > disabling it for the whole system.
> 
> Perhaps, but given the rest of processes on the system need just a few 
> hundred MB max, and the server has hundreds of GB of RAM, it really makes 
> little difference. Turning off swap altogether is plain _simpler_.

So you penalize the runtime performance of guests for the infrequent
migration delay?  Sounds like a bad trade-off for any real workload
those guests are doing.  Shouldn't the goal of the system be to solve
the problem the guests are trying to solve instead of being optimized
for the infrequent administration tasks?

good luck!

greg k-h


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-30 Thread Michael Chapman
On Thu, 30 Mar 2023, Luca Boccassi wrote:
> On Thu, 30 Mar 2023 at 10:15, Michael Chapman  wrote:
> >
> > On Thu, 30 Mar 2023, Lennart Poettering wrote:
> > > On Mi, 29.03.23 13:53, Christoph Anton Mitterer (cales...@scientia.org) 
> > > wrote:
> > >
> > > > > > That's a bad idea btw. I'd advise you not to do that: on modern
> > > > > > systems you want swap, since it makes anonymous memory reclaimable.
> > > > > > I
> > > > > > am not sure where you are getting this idea from that swap was
> > > > > > bad.
> > > >
> > > > Well I haven't said it's bad, but I guess it depends on the use case
> > > > any available RAM.
> > >
> > > In almost all scenarios you want swap, regardless if little RAM or a
> > > lot. For specialist cases where you run everything from memory, and
> > > not even programs are backed by disk there might be exceptions. But
> > > that's almost never the case.
> >
> > One specific case where I deliberately chose _not_ to use swap: large
> > hypervisors with local storage.
> >
> > With swap on the host enabled, all that ended up happening was that local
> > IO activity caused idle guest memory to be gradually swapped out.
> > Eventually all of the swap space filled up, and the system was exactly
> > where it would have been had it not had any swap space configured in the
> > first place -- except that it was now _a lot_ slower to migrate those
> > swapped-out guests to other hypervisors.
> >
> > - Michael
> 
> The solution there is to ensure the cgroup configuration for the
> slices where the guests run have memory.swap.max=0, rather than
> disabling it for the whole system.

Perhaps, but given the rest of processes on the system need just a few 
hundred MB max, and the server has hundreds of GB of RAM, it really makes 
little difference. Turning off swap altogether is plain _simpler_.
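
(For concreteness, "turning off swap altogether" amounts to something
like the following; a sketch, adapt to your own setup:

    swapoff -a    # deactivate all currently active swap areas
    # ...and remove or comment out the swap entries in /etc/fstab (or mask
    # the corresponding .swap units) so they are not re-enabled at boot
)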


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-30 Thread Luca Boccassi
On Thu, 30 Mar 2023 at 10:15, Michael Chapman  wrote:
>
> On Thu, 30 Mar 2023, Lennart Poettering wrote:
> > On Mi, 29.03.23 13:53, Christoph Anton Mitterer (cales...@scientia.org) 
> > wrote:
> >
> > > > > That's a bad idea btw. I'd advise you not to do that: on modern
> > > > > systems you want swap, since it makes anonymous memory reclaimable.
> > > > > I
> > > > > am not sure where you are getting this idea from that swap was
> > > > > bad.
> > >
> > > Well I haven't said it's bad, but I guess it depends on the use case
> > > any available RAM.
> >
> > In almost all scenarios you want swap, regardless if little RAM or a
> > lot. For specialist cases where you run everything from memory, and
> > not even programs are backed by disk there might be exceptions. But
> > that's almost never the case.
>
> One specific case where I deliberately chose _not_ to use swap: large
> hypervisors with local storage.
>
> With swap on the host enabled, all that ended up happening was that local
> IO activity caused idle guest memory to be gradually swapped out.
> Eventually all of the swap space filled up, and the system was exactly
> where it would have been had it not had any swap space configured in the
> first place -- except that it was now _a lot_ slower to migrate those
> swapped-out guests to other hypervisors.
>
> - Michael

The solution there is to ensure the cgroup configuration for the
slices where the guests run have memory.swap.max=0, rather than
disabling it for the whole system.
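
Something along these lines, assuming the guests end up in machine.slice
(as libvirt/systemd-machined guests normally do; adjust the slice name to
wherever your VMs actually run):

    # /etc/systemd/system/machine.slice.d/no-swap.conf
    [Slice]
    # maps to cgroup v2 memory.swap.max=0: anonymous memory in this slice
    # is never written to swap, while the rest of the system keeps its swap
    MemorySwapMax=0

followed by a daemon-reload.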


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-30 Thread Michael Chapman
On Thu, 30 Mar 2023, Lennart Poettering wrote:
> On Mi, 29.03.23 13:53, Christoph Anton Mitterer (cales...@scientia.org) wrote:
> 
> > > > That's a bad idea btw. I'd advise you not to do that: on modern
> > > > systems you want swap, since it makes anonymous memory reclaimable.
> > > > I
> > > > am not sure where you are getting this idea from that swap was
> > > > bad.
> >
> > Well I haven't said it's bad, but I guess it depends on the use case
> > any available RAM.
> 
> In almost all scenarios you want swap, regardless if little RAM or a
> lot. For specialist cases where you run everything from memory, and
> not even programs are backed by disk there might be exceptions. But
> that's almost never the case.

One specific case where I deliberately chose _not_ to use swap: large 
hypervisors with local storage.

With swap on the host enabled, all that ended up happening was that local 
IO activity caused idle guest memory to be gradually swapped out. 
Eventually all of the swap space filled up, and the system was exactly 
where it would have been had it not had any swap space configured in the 
first place -- except that it was now _a lot_ slower to migrate those 
swapped-out guests to other hypervisors.

- Michael


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-29 Thread Christoph Anton Mitterer
Hey Lennart.


On Wed, 2023-03-29 at 16:35 +0200, Lennart Poettering wrote:
> In almost all scenarios you want swap, regardless if little RAM or a
> lot. For specialist cases where you run everything from memory, and
> not even programs are backed by disk there might be exceptions.

Similar to the latter example of yours, one could think of scenarios
with little disk space, where it might be interesting to use a swap
file for hibernation, but have the storage available when the system is
up.


> But
> that's almost never the case.

IMO that's always hard to say... in WLCG we actually have compute nodes
with little to no disk space but plenty of RAM (or well at least a
certain amount of GB per core). Though we don't need hibernation there.


> It allows the kernel to reclaim anonymous memory, because it can
> write
> it to disk and then use it for other purposes.

Well that's clear, it's just that on my systems (both servers and
workstations) I've never really run into the need to reclaim lots of
anonymous memory.
It was apparently always enough to have cached data reclaimed.


> swap is not an "extra
> on top", that's a complete misunderstanding how modern memory
> management works. By avoiding swap you create artificial
> (i.e. unnecessary) scarcity, and disallow the kernel to use RAM for
> useful purposes because you block it with anonymous memory that might
> never be used. You artificially amplify IO on the file-backed pages
> hence, because those become the only ones that are reclaimable.

Well, I have been operating like that for many years now, and back then
I actually ran into thrashing more often, so maybe my experiences are
just outdated :D

If things changed, I'll happily try again to run with swap and see how
it turns out for my use cases :-)


> https://chrisdown.name/2018/01/02/in-defence-of-swap.html

Thanks for the pointer.





On Wed, 2023-03-29 at 16:36 +0200, Lennart Poettering wrote:
> Yeah, all requests that go through logind check that.
> 
> You can override the check via an environment variable,
> SYSTEMD_BYPASS_HIBERNATION_MEMORY_CHECK=1 btw, see
> https://systemd.io/ENVIRONMENT/

Nice, that did the trick.
It's perfect that it's an env var that takes effect in logind, so one can
easily add it to the environment of just that service via its unit file,
without the need to fiddle around with bashrc and friends.
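For the record, a drop-in along these lines seems to be all that's needed
(a sketch; the drop-in file name is just an example):

   # /etc/systemd/system/systemd-logind.service.d/bypass-hibernation-check.conf
   [Service]
   Environment=SYSTEMD_BYPASS_HIBERNATION_MEMORY_CHECK=1

   # pick up the drop-in and restart logind:
   systemctl daemon-reload
   systemctl restart systemd-logind.service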


Since you're anyway rather against the whole idea,... I assume you
wouldn't want a PR that adds something like that as an example to the
manpages?
Or perhaps somehow a hint that logind is consulted first and thus the
swap area needs to be already there or the check disabled?

Another thing that might be worth adding somewhere is where people are
meant to hook in their own service dependencies for things that should be
done before/after suspend/resume/etc. ... i.e. against the respective
.target and not the .service.
That might make it easier for people to use it properly :-)
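For example, a unit that should run right before going to sleep could be
hooked in roughly like this (an untested sketch; the unit and script names
are made up):

   # /etc/systemd/system/pre-sleep-example.service
   [Unit]
   Description=Example hook run before suspend/hibernate
   DefaultDependencies=no
   Before=sleep.target

   [Service]
   Type=oneshot
   ExecStart=/usr/local/bin/pre-sleep-example.sh

   [Install]
   WantedBy=sleep.target

i.e. the ordering and the WantedBy= go against sleep.target (or
suspend.target/hibernate.target for something state-specific), not against
systemd-suspend.service and friends.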


In any case, thanks for your help :-)

Cheers,
Chris.


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-29 Thread Lennart Poettering
On Mi, 29.03.23 14:07, Christoph Anton Mitterer (cales...@scientia.org) wrote:

> When I use 
>systemd.log_level=debug systemd.log_target=console
> in the kernel parameters, and then do systemctl hibernate while the
> system is running, I get:
> Mar 29 12:04:48 hbt systemd-logind[780]: Got message type=method_call 
> sender=:1.9 destination=org.freedesktop.login1 path=/org/freedesktop/login1 
> interface=org.freedesktop.login1.Manager member=SetWallMessage cookie=2 
> reply_cookie=0 signature=sb error-name=n/a error-message=n/a
> Mar 29 12:04:48 hbt systemd-logind[780]: Sent message type=method_return 
> sender=n/a destination=:1.9 path=n/a interface=n/a member=n/a cookie=61 
> reply_cookie=2 signature=n/a error-name=n/a error-message=n/a
> Mar 29 12:04:48 hbt systemd-logind[780]: Got message type=method_call 
> sender=:1.9 destination=org.freedesktop.login1 path=/org/freedesktop/login1 
> interface=org.freedesktop.login1.Manager member=HibernateWithFlags cookie=3 
> reply_cookie=0 signature=t error-name=n/a error-message=n/a
> Mar 29 12:04:48 hbt systemd-logind[780]: Sleep mode "disk" is supported by 
> the kernel.
> Mar 29 12:04:48 hbt systemd-logind[780]: Disk sleep mode "shutdown" is 
> supported by the kernel.
> Mar 29 12:04:48 hbt systemd-logind[780]: No possible swap partitions or files 
> suitable for hibernation were found in /proc/swaps.
> Mar 29 12:04:48 hbt systemd-logind[780]: Sent message type=error sender=n/a 
> destination=:1.9 path=n/a interface=n/a member=n/a cookie=62 reply_cookie=3 
> signature=s error-name=org.freedesktop.login1.SleepVerbNotSupported 
> error-message=Not enough swap space for hibernation
> Mar 29 12:04:48 hbt systemd-logind[780]: Failed to process message 
> type=method_call sender=:1.9 destination=org.freedesktop.login1 
> path=/org/freedesktop/login1 interface=org.freedesktop.login1.Manager 
> member=HibernateWithFlags cookie=3 reply_cookie=0 signature=t error-name=n/a 
> error-message=n/a: Not enough swap space for hibernation
>
> Does that mean it's the same problem as with the desktop environment?
> I.e. systemctl first asking logind whether hibernate was available,
> before even starting hibernate.target?

Yeah, all requests that go through logind check that.

You can override the check via an environment variable,
SYSTEMD_BYPASS_HIBERNATION_MEMORY_CHECK=1 btw, see
https://systemd.io/ENVIRONMENT/

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-29 Thread Lennart Poettering
On Mi, 29.03.23 13:53, Christoph Anton Mitterer (cales...@scientia.org) wrote:

> > > That's a bad idea btw. I'd advise you not to do that: on modern
> > > systems you want swap, since it makes anonymous memory reclaimable.
> > > I
> > > am not sure where you are getting this idea from that swap was
> > > bad.
>
> Well I haven't said it's bad, but I guess it depends on the use case
> and any available RAM.

In almost all scenarios you want swap, regardless if little RAM or a
lot. For specialist cases where you run everything from memory, and
not even programs are backed by disk there might be exceptions. But
that's almost never the case.

> If one has plenty of the latter (e.g. our servers at the university
> all have at least 64GB - and even my laptop has) what does swap give
> you other than a bit more extra on top?

It allows the kernel to reclaim anonymous memory, because it can write
it to disk and then use it for other purposes. swap is not an "extra
on top", that's a complete misunderstanding how modern memory
management works. By avoiding swap you create artificial
(i.e. unnecessary) scarcity, and disallow the kernel to use RAM for
useful purposes because you block it with anonymous memory that might
never be used. You artificially amplify IO on the file-backed pages
hence, because those become the only ones that are reclaimable.

> If your memory is limited, like perhaps on IoT, and you rarely (say
> once a day when some extra cron jobs run) need more than the physically
> available memory, then sure... in that case it's good to have some
> relief of memory pressure.

Nope. Not how this works.

> But if one generally uses more than one has, I feel it would be better
> to run into the OOM killer sooner.

oomd/PSI looks at memory allocation latencies to determine memory
pressure. Since you disallow anonymous memory from being paged out, and
thus increase IO on file-backed memory, you increase the latencies
unnecessarily, making oomd trigger earlier.

> I've looked around a bit for recommendations, but I couldn't really
> find many credible sources.

Read this for example:

https://chrisdown.name/2018/01/02/in-defence-of-swap.html

It's from 2018, i.e. 5 years ago, but most of it is still
accurate. It's from the Facebook people, i.e. the folks who
maintain the cgroups/psi/mm stuff, i.e. who really know these things.

If that's not credible enough for you, then I can't help you.

> Uhm... sorry, I don't quite get that:
>   systemd-analyze log-level debug
> doesn't seem to be a valid command?

It is. All systemd-based distros from the last 5y or so should have
that command.
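I.e. roughly this sequence (one way to do it; the journalctl filter is
just an example for getting at PID 1's messages):

   systemd-analyze log-level debug   # switch PID 1 to debug logging
   systemctl hibernate               # trigger the (failing) attempt
   journalctl -b _PID=1 -n 200       # inspect PID 1's debug output
   systemd-analyze log-level info    # switch back afterwards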

> But that would also mean that when the swap was enabled manually (and
> not via the RequiredBy=systemd-hibernate.service) it wouldn't get
> stopped on umount.target?

You can add that manually if you like.
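E.g. something along these lines in the swap unit should bring back the
shutdown ordering (a sketch only, assuming a /swapfile-backed unit set up
the way you described):

   # /etc/systemd/system/swapfile.swap  (unit name must match the escaped What= path)
   [Unit]
   DefaultDependencies=no
   # what the default dependencies would otherwise add for shutdown:
   Conflicts=umount.target
   Before=umount.target
   # make sure the swap is up before the hibernation service runs:
   Before=systemd-hibernate.service

   [Swap]
   What=/swapfile

   [Install]
   RequiredBy=systemd-hibernate.service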

> So I guess that since you're anyway rather against running without swap
> you probably wouldn't accept a feature request that asks for some
> method to override that auto-detection (something like
> AdvertiseHibernate=(auto|always|never) )?

I fail to see the point of the concept these days. Systems where
hibernation should be used should generally also benefit from swap.

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-29 Thread Christoph Anton Mitterer
When I use 
   systemd.log_level=debug systemd.log_target=console
in the kernel parameters, and then do systemctl hibernate while the
system is running, I get:
Mar 29 12:04:48 hbt systemd-logind[780]: Got message type=method_call 
sender=:1.9 destination=org.freedesktop.login1 path=/org/freedesktop/login1 
interface=org.freedesktop.login1.Manager member=SetWallMessage cookie=2 
reply_cookie=0 signature=sb error-name=n/a error-message=n/a
Mar 29 12:04:48 hbt systemd-logind[780]: Sent message type=method_return 
sender=n/a destination=:1.9 path=n/a interface=n/a member=n/a cookie=61 
reply_cookie=2 signature=n/a error-name=n/a error-message=n/a
Mar 29 12:04:48 hbt systemd-logind[780]: Got message type=method_call 
sender=:1.9 destination=org.freedesktop.login1 path=/org/freedesktop/login1 
interface=org.freedesktop.login1.Manager member=HibernateWithFlags cookie=3 
reply_cookie=0 signature=t error-name=n/a error-message=n/a
Mar 29 12:04:48 hbt systemd-logind[780]: Sleep mode "disk" is supported by the 
kernel.
Mar 29 12:04:48 hbt systemd-logind[780]: Disk sleep mode "shutdown" is 
supported by the kernel.
Mar 29 12:04:48 hbt systemd-logind[780]: No possible swap partitions or files 
suitable for hibernation were found in /proc/swaps.
Mar 29 12:04:48 hbt systemd-logind[780]: Sent message type=error sender=n/a 
destination=:1.9 path=n/a interface=n/a member=n/a cookie=62 reply_cookie=3 
signature=s error-name=org.freedesktop.login1.SleepVerbNotSupported 
error-message=Not enough swap space for hibernation
Mar 29 12:04:48 hbt systemd-logind[780]: Failed to process message 
type=method_call sender=:1.9 destination=org.freedesktop.login1 
path=/org/freedesktop/login1 interface=org.freedesktop.login1.Manager 
member=HibernateWithFlags cookie=3 reply_cookie=0 signature=t error-name=n/a 
error-message=n/a: Not enough swap space for hibernation

Does that mean it's the same problem as with the desktop environment?
I.e. systemctl first asking logind whether hibernate was available,
before even starting hibernate.target?


Thanks,
Chris.


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-29 Thread Christoph Anton Mitterer
Hey.


On Wed, 2023-03-29 at 10:20 +0200, Lennart Poettering wrote:
> > That's a bad idea btw. I'd advise you not to do that: on modern
> > systems you want swap, since it makes anonymous memory reclaimable.
> > I
> > am not sure where you are getting this idea from that swap was
> > bad.

Well I haven't said it's bad, but I guess it depends on the use case
and any available RAM.
If one has plenty of the latter (e.g. our servers at the university
all have at least 64GB - and even my laptop has) what does swap give
you other than a bit more extra on top?
At the potential cost of the system going into thrashing instead, making
it more or less completely unresponsive.

If your memory is limited, like perhaps on IoT, and you rarely (say
once a day when some extra cron jobs run) need more than the physically
available memory, then sure... in that case it's good to have some
relief of memory pressure.
But if one generally uses more than one has, I feel it would be better
to run into the OOM killer sooner.

And even if one's scared about some precious process being killed in a
short period of memory pressure... the swap may possibly just shift
that to some later time.


I've looked around a bit for recommendations, but I couldn't really
find many credible sources.
It rather seemed, though, that the older a post/blog/etc. is, the more
likely it is to recommend swap, while the more recent ones rather
recommend little to none.
And even if it was still generally recommended/useful, looking at the
comments people give (or there was e.g. that survey [0]) there still
simply seems to be quite some fraction of people who want to run
without.

The aforementioned patchset also seems like a good indicator that there
is some desire to use swap for hibernate only.


And even if you argue that people should have swap, there might still
be valid use cases for enabling certain swap partitions only for
hibernate.
E.g. people might want to use https://ddramdisk.store/ (which is
however volatile) for swapping, but another swap on persistent storage
for hibernate (only).
Or one could want to use such a volatile device as "normal" swap
without encryption, and an extra one for hibernation with encryption.



> > > > That does work, when:
> > > >    # systemctl start systemd-hibernate.service
> > > > but it doesn't when:
> > > >    # systemctl hibernate
> > > > which I don't understand, since I thought that would start the
> > > > target,
> > > > which would pull in and thus start the service, which before
> > > > pulls
> > > > in
> > > > starts my swapfile.
> > 
> > Provide debug logs of PID1, i.e. "systemd-analyze log-level debug"
> > right before the hibernation attempt, and then the journal output
> > generated that way.

Uhm... sorry, I don't quite get that:
  systemd-analyze log-level debug
doesn't seem to be a valid command?

Or do you mean something like:
export SYSTEMD_LOG_LEVEL=debug
systemd-analyze dump
but that doesn't contain anything about "*hibernate*".

Neither does it generate any output in journalctl -f or with
_SYSTEMD_UNIT=systemd-hibernate.service or
_SYSTEMD_UNIT=hibernate.target


> > > > Also, I'm not really sure whether the above is the most
> > > > systemdic
> > > > way... like should I use something like BindsTo= instead of
> > > > RequiredBy=
> > > > + StopWhenUnneeded=true?
> > 
> > Nah, sounds Ok to me.

Thanks.


> > > > Also would it be better to explicitly set
> > > > DefaultDependencies=no?
> > 
> > Probably, yeah, given that systemd-hibernate.service has that set
> > too.

But that would also mean that when the swap was enabled manually (and
not via the RequiredBy=systemd-hibernate.service) it wouldn't get
stopped on umount.target?


> 
> > Yes, logind reports that hibernation is not supported if you have
> > no
> > swap. Desktops ask logind for that.
> > 
> > Frankly, the idea that we mount a swap partition only for
> > hibernation
> > appears to be a bad idea to me. We should drop it from the TODO
> > list. If a swap partition is good for hibernation it is also good
> > for
> > proper swap operation, and not using it for that makes things worse
> > in
> > almost all ways.

So I guess that since you're anyway rather against running without swap
you probably wouldn't accept a feature request that asks for some
method to override that auto-detection (something like
AdvertiseHibernate=(auto|always|never) )?


Cheers,
Chris.


[0] https://opensource.com/article/19/2/swap-space-poll


Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?

2023-03-29 Thread Lennart Poettering
On Mi, 29.03.23 04:43, Christoph Anton Mitterer (cales...@scientia.org) wrote:

Hi!

> I guess many people nowadays will run without any swap for the normal
> paging use, but only have it for hibernation (at least on laptops).

That's a bad idea btw. I'd advise you not to do that: on modern
systems you want swap, since it makes anonymous memory reclaimable. I
am not sure where you are getting this idea from that swap was
bad. It's *good*. Things like systemd-oomd will even warn if you don't
have swap, since the lack of it amplifies memory pressure: swap-less
systems can only reclaim file-backed memory, not anything anonymous.

> That does work, when:
># systemctl start systemd-hibernate.service
> but it doesn't when:
># systemctl hibernate
> which I don't understand, since I thought that would start the target,
> which would pull in and thus start the service, which beforehand pulls
> in and starts my swapfile.

Provide debug logs of PID1, i.e. "systemd-analyze log-level debug"
right before the hibernation attempt, and then the journal output
generated that way.

> Also, I'm not really sure whether the above is the most systemdic
> way... like should I use something like BindsTo= instead of RequiredBy=
> + StopWhenUnneeded=true?

Nah, sounds Ok to me.

> Also would it be better to explicitly set DefaultDependencies=no?

Probably, yeah, given that systemd-hibernate.service has that set too.

> 2) The whole thing seems to not work with auto-detection from desktop
> environments.
> I use Cinnamon; when clicking the shutdown icon, it shows me Suspend,
> Restart, Cancel, Shutdown.
>
> At least the Suspend seems to be somehow detected via systemd, cause
> when I set AllowSuspend=false in sleep.conf, it disappears.
>
> So I guess systemd thinks hibernation isn't possible, because there's
> no swap active, and tells that to the GUI.
> Guess I'd need some override for that.

Yes, logind reports that hibernation is not supported if you have no
swap. Desktops ask logind for that.

Frankly, the idea that we mount a swap partition only for hibernation
appears to be a bad idea to me. We should drop it from the TODO
list. If a swap partition is good for hibernation it is also good for
proper swap operation, and not using it for that makes things worse in
almost all ways.

Lennart

--
Lennart Poettering, Berlin