Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Mon, 3 Apr 2023, Lennart Poettering wrote:

> On Sa, 01.04.23 06:16, Michael Chapman (m...@very.puzzling.org) wrote:
>
> > > Well, in larger environments the goal is typically to saturate all
> > > hosts, but not overload them. i.e. maximizing your ROI. No need to
> > > fall from one extreme into the other. Today's Linux can actually
> > > achieve something like this, if you use it properly. Swap is part of
> > > using it "properly".
> > >
> > > Oversized hw is typically a bad investment. In particular in today's
> > > cloud world where costs multiply with every node you have.
> >
> > If customers have paid for RAM, you don't turn around and give them swap
> > instead. That's just plain dishonest.
>
> This is nonsense. Your VM images are typically backed by disk, no? You
> just amplify IO on that.

No, you don't... because _exactly the same_ IO is done. Once the swap is full, the existence of swap doesn't change what gets paged in or out on the host side, and it doesn't change which parts of guest RAM get paged in or out.

(And I really _don't want_ guest RAM to be paged in or out... we had sold it as RAM!)

> Anyway, you apparently think you know MM better than the fb folks who
> wrote the stuff. Good for you then! Since it doesn't look likely that
> anyone can convince you otherwise, let's end this discussion here.

I find it very upsetting that you assume I just made all of this up. I did measurements. The results showed that swap made no difference to guest performance. If it made the guests perform better I would have kept it!

But yes, if you don't believe me I think it is best that we leave it at that. I honestly can't think of any other way to convince you.
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Sa, 01.04.23 06:16, Michael Chapman (m...@very.puzzling.org) wrote:

> > Well, in larger environments the goal is typically to saturate all
> > hosts, but not overload them. i.e. maximizing your ROI. No need to
> > fall from one extreme into the other. Today's Linux can actually
> > achieve something like this, if you use it properly. Swap is part of
> > using it "properly".
> >
> > Oversized hw is typically a bad investment. In particular in today's
> > cloud world where costs multiply with every node you have.
>
> If customers have paid for RAM, you don't turn around and give them swap
> instead. That's just plain dishonest.

This is nonsense. Your VM images are typically backed by disk, no? You just amplify IO on that.

Anyway, you apparently think you know MM better than the fb folks who wrote the stuff. Good for you then! Since it doesn't look likely that anyone can convince you otherwise, let's end this discussion here.

Lennart

--
Lennart Poettering, Berlin
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Fr, 31.03.23 13:34, Christoph Anton Mitterer (cales...@scientia.org) wrote:

> Hey.
>
> Just for better understanding:
>
> AFAIU, the main idea of having swap despite enough memory was the
> following:
>
> Unless processes explicitly release memory (or get stopped), the
> kernel can mostly reclaim only cached memory, but if swap is
> available it can also reclaim anonymous memory.
>
> So the idea is that processes might have pages that are literally
> never used (except for initial loading), yet are still kept in memory,
> so these permanently eat up physical memory when they cannot be
> swapped out.
>
> And the actual benefit that then comes in (even when there is enough
> memory) is that (more) physical memory can be used for caching.
>
> Right?

Yes, more or less. Except that "caching" might not be the best word to use in this context. For example, running a memory-mapped ELF executable, where the pages that the code paths touch are paged in as needed, probably isn't usually called "cache management".

> All this of course comes at the potential cost that, if one has some
> misbehaving application, the system may still go into thrashing.
> Or is the kernel smart enough to prevent this?

Things like systemd-oomd are supposed to detect misbehaving services and apps and shut them down cleanly before they can misbehave too much.

Lennart

--
Lennart Poettering, Berlin
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Sat, 1 Apr 2023, Uoti Urpala wrote: > On Sat, 2023-04-01 at 06:16 +1100, Michael Chapman wrote: > > On Fri, 31 Mar 2023, Lennart Poettering wrote: > > [...] > > > Presumably your system mmaps ELF binaries, VM images, and similar > > > stuff into memory. if you don't allow anonymous memory to backed out > > > onto swap, then you basically telling the kernel "please page out > > > my program code out instead". Which is typically a lot worse. > > > > Yes, but my point is that it _doesn't matter_ if SSH or journald or > > whatever is in memory or needs to be paged back in again. It's such a tiny > > fraction of the system's overall workload. > > That contradicts what you said earlier about the system actually > writing a significant amount of data to swap. If, when swap was > enabled, the system wrote a large amount of data to the swap, that > implies there must be a large amount of some other data that it was > able to keep in memory instead. Buffer cache. Often stuff that the guests never ended up needing again, or at least could survive the penalty of having it read back off disk again. Of course the host kernel didn't know that, since it cannot predict the future. All it knows is that IO is happening, and there are idle pages in the guest. Of course it's going to steadily push those idle pages out to swap. And the graphs I had at the time showed a very nice linearly- increasing swap usage -- until the swap was full. > Linux should not write all information > from memory to swap just to leave the memory empty and without any > useful content - everything written to swap should correspond to > something else kept in memory. > > So if you say that the swap use was overall harmful for behavior, > claiming that the *size* of other data kept in memory was too small to > matter doesn't really make any sense. If the swap use was significant, > then it should have kept a significant amount of some other data in > memory, either for the main OS or for the guests. 
The "harmful behaviour" was the fact that _when_ those guests needed to be swapped in, that was unpleasantly slow. The existence of swap had little to no effect on the running behaviour of the guests themselves -- as I keep saying, when you have enough buffer cache on the host, having "a bit more" because you've got swap as well does very little. You're already in the long tail of your performance graphs.

Can I make this any simpler? How about this:

* Whether swap was there or not had _no_ measurable effect on the guests' performance.
* Having swap meant there was a large swap-in penalty in certain circumstances. (Migration was one of them. "Rebooting a Windows VM" was another, since Windows apparently likes to zero all of its RAM.)

Does it make sense now?
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Sat, 2023-04-01 at 06:16 +1100, Michael Chapman wrote: > On Fri, 31 Mar 2023, Lennart Poettering wrote: > [...] > > Presumably your system mmaps ELF binaries, VM images, and similar > > stuff into memory. if you don't allow anonymous memory to backed out > > onto swap, then you basically telling the kernel "please page out > > my program code out instead". Which is typically a lot worse. > > Yes, but my point is that it _doesn't matter_ if SSH or journald or > whatever is in memory or needs to be paged back in again. It's such a tiny > fraction of the system's overall workload. That contradicts what you said earlier about the system actually writing a significant amount of data to swap. If, when swap was enabled, the system wrote a large amount of data to the swap, that implies there must be a large amount of some other data that it was able to keep in memory instead. Linux should not write all information from memory to swap just to leave the memory empty and without any useful content - everything written to swap should correspond to something else kept in memory. So if you say that the swap use was overall harmful for behavior, claiming that the *size* of other data kept in memory was too small to matter doesn't really make any sense. If the swap use was significant, then it should have kept a significant amount of some other data in memory, either for the main OS or for the guests.
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Fri, 31 Mar 2023, Lennart Poettering wrote: [...] > Presumably your system mmaps ELF binaries, VM images, and similar > stuff into memory. if you don't allow anonymous memory to backed out > onto swap, then you basically telling the kernel "please page out > my program code out instead". Which is typically a lot worse. Yes, but my point is that it _doesn't matter_ if SSH or journald or whatever is in memory or needs to be paged back in again. It's such a tiny fraction of the system's overall workload. This is why Luca's suggestion of using memory.swap.max=0 on all the QEMU processes isn't measurably better than just not using swap at all. Either 99% of the system isn't using swap, or 100% of it isn't using swap. > That's why I am saying that yeah, if you want zero IO then that's OK, > but in that case you want *neither* anonymous memory being backed by > disk swap *nor* file-backed memory backed by disk file systems. But > you made the strange choice of saying "IO by file-backed memory is > good", but "IO by anonymous memory" is bad, and then allow the former > and forbid the latter. > > hence my question: do you run your OS from an in-memory file system of > some kind? because if not you just shift around what gets paged out, > and because you make the pool of reclaimable memory smaller you > increase IO. In practice, everything that needed to run on the host was either already in memory or could be paged in quickly. Given the sum total of that was only a GB or so, that's not surprising. [...] > Well, in larger environments the goal is typically to saturate all > hosts, but not overload them. i.e. maximizing your ROI. No need to > fall from one extreme into the other. Today's Linux can actually > achieve something like this, if you use it properly. Swap is part of > using it "properly". > > Oversized hw is typically a bad investment. In particular in today's > cloud world where costs multiply with every node you have. 
If customers have paid for RAM, you don't turn around and give them swap instead. That's just plain dishonest.

So yes, the system _does_ need to have more physical memory than the sum of the guests' virtual memory. Then you add on a bit more so you've got some room for buffers and page cache, since (at least in my case) IO was local. That's the size of the server you need.
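The sizing rule described above can be written out as plain arithmetic. The following is only a sketch: the function name is made up, the per-guest split is hypothetical, and the figures (roughly 200 GB of guests, about 1 GB of host processes, 50 GB of buffer and page cache) are the approximate numbers mentioned elsewhere in this thread, not measurements.

```python
def required_host_ram_gb(guest_ram_gb, host_overhead_gb, cache_headroom_gb):
    """Physical RAM needed so that no guest memory has to live in swap.

    guest_ram_gb: iterable of RAM sold to each guest, in GB
    host_overhead_gb: memory used by non-guest processes on the host
    cache_headroom_gb: room left for buffers and page cache (local IO)
    """
    return sum(guest_ram_gb) + host_overhead_gb + cache_headroom_gb


# Hypothetical split of the ~200 GB of guest RAM mentioned in the thread.
guests = [64, 64, 48, 24]
total = required_host_ram_gb(guests, host_overhead_gb=1, cache_headroom_gb=50)
print(f"host needs at least {total} GB of RAM")
```

How much cache headroom is "enough" is exactly the point under dispute in this thread, so that parameter is a judgment call rather than something the formula can settle.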
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Fri, 2023-03-31 at 12:50 +0200, Lennart Poettering wrote: > On Do, 30.03.23 13:16, Phillip Susi (ph...@thesusis.net) wrote: > > > > > Lennart Poettering writes: > > > > > oomd/PSI looks at memory allocation latencies to determine memory > > > pressure. Since you disallow anonymous memory to be paged out and thus > > > increase IO on file backed memory you increase the latencies > > > unnecessarily, thus making oomd trigger earlier. > > > > Did this get changed in the last few years? Because I'm sure it used to > > be based on the total commit limit, and so OOM wouldn't start killing > > until your swap was full, which didn't happen until the system was > > thrashing itself to uselessness for 20 minutes already. > > oomd becomes active on two distinct triggers: > > This one: > > https://github.com/systemd/systemd/blob/main/src/oom/oomd-manager.c#L383 > > and this one: > > https://github.com/systemd/systemd/blob/main/src/oom/oomd-manager.c#L486 > > The latter is PSI. We ended up having to disable systemd-oomd on our autobuilder/CI systems since it got upset when a single process tree was using the majority of the system resources (from memory greater than 90%?). On a CI system, we'd expect the majority of the system resources to be used by that single user/process tree so this was a bit annoying and we ended up disabling it. Everything was fine since so it was a false positive. I'm having trouble mapping that behaviour to the above two triggers... Cheers, Richard
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Fr, 31.03.23 21:54, Michael Chapman (m...@very.puzzling.org) wrote:

> > because otherwise you just remove the latencies from anonymous memory
> > but you amplify the latencies on file-backed memory. Which is overall
> > worse, not better.
>
> The host isn't doing much IO. Just a bit of logging really.

IO is not just writing stuff. If you run some OS, a lot more IO is generated by the fact that ELF binaries are mapped into memory and then paged in as they run than by generating a few log entries. By saying "hey, never page out anonymous memory!" to the kernel (by not having swap), you basically say "but please page out file-backed memory even more, please please, go ahead, now".

> How would the
> existence of swap affect that? Is it really so much better to be able to
> log messages just that little bit faster, but you've now got to wait for
> `sshd` to swap back in whenever you SSH to the system?

Presumably your system mmaps ELF binaries, VM images, and similar stuff into memory. If you don't allow anonymous memory to be paged out onto swap, then you are basically telling the kernel "please page out my program code instead". Which is typically a lot worse.

That's why I am saying that yeah, if you want zero IO then that's OK, but in that case you want *neither* anonymous memory being backed by disk swap *nor* file-backed memory backed by disk file systems. But you made the strange choice of saying "IO by file-backed memory is good", but "IO by anonymous memory" is bad, and then allow the former and forbid the latter.

Hence my question: do you run your OS from an in-memory file system of some kind? Because if not, you just shift around what gets paged out, and because you make the pool of reclaimable memory smaller you increase IO.

> > > I know this works because I have literally done it on many, many
> > > hypervisors for over a decade.
> >
> > I mean, you have a point: if you run on idle machines where hardware
> > is so massively oversized for the job you are doing, you can operate
> > really nicely without swap. No doubt. But that's kinda
> > wasteful. Resource-management through oversized hw is certainly a way to
> > solve problems, no doubt.
>
> The alternative would be to _overprovision_ the server -- i.e. put more
> VMs on it than it can support. That would just be stupid.

Well, in larger environments the goal is typically to saturate all hosts, but not overload them. i.e. maximizing your ROI. No need to fall from one extreme into the other. Today's Linux can actually achieve something like this, if you use it properly. Swap is part of using it "properly".

Oversized hw is typically a bad investment. In particular in today's cloud world where costs multiply with every node you have.

Lennart

--
Lennart Poettering, Berlin
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
Hey.

Just for better understanding:

AFAIU, the main idea of having swap despite enough memory was the following:

Unless processes explicitly release memory (or get stopped), the kernel can mostly reclaim only cached memory, but if swap is available it can also reclaim anonymous memory.

So the idea is that processes might have pages that are literally never used (except for initial loading), yet are still kept in memory, so these permanently eat up physical memory when they cannot be swapped out.

And the actual benefit that then comes in (even when there is enough memory) is that (more) physical memory can be used for caching.

Right? Or are there any other general ways in which this improves performance?

All this of course comes at the potential cost that, if one has some misbehaving application, the system may still go into thrashing. Or is the kernel smart enough to prevent this?

Thanks,
Chris.
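To make the anonymous versus file-backed distinction above a bit more concrete, here is a minimal Python sketch that splits `/proc/meminfo`-style text into the two pools. The field names (`Active(anon)`, `Inactive(file)`, `SwapFree` and so on) are real `/proc/meminfo` fields; the sample numbers are invented, and treating "free swap exists" as "anonymous pages can be reclaimed" is a deliberate simplification.

```python
import re


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are in kiB; a few (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^([^:]+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def reclaim_summary(fields):
    """Summarise the two LRU pools the kernel can reclaim from."""
    anon = fields.get("Active(anon)", 0) + fields.get("Inactive(anon)", 0)
    file_backed = fields.get("Active(file)", 0) + fields.get("Inactive(file)", 0)
    swap_free = fields.get("SwapFree", 0)
    return {
        "anon_kib": anon,
        "file_kib": file_backed,
        "swap_used_kib": fields.get("SwapTotal", 0) - swap_free,
        # Simplification: without free swap, anonymous pages have nowhere to go.
        "anon_reclaimable": swap_free > 0,
    }


# Invented sample values, for illustration only.
SAMPLE = """\
Active(anon):       1000 kB
Inactive(anon):      500 kB
Active(file):       3000 kB
Inactive(file):     2000 kB
SwapTotal:          4096 kB
SwapFree:           4000 kB
"""

summary = reclaim_summary(parse_meminfo(SAMPLE))
print(summary)
```

On a real system the same functions could be fed the contents of `/proc/meminfo` instead of `SAMPLE`.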
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Fri, 31 Mar 2023, Lennart Poettering wrote:

> On Fr, 31.03.23 07:57, Michael Chapman (m...@very.puzzling.org) wrote:
>
> > On Fri, 31 Mar 2023, Luca Boccassi wrote:
> > [...]
> > > No, it does not make "little difference", there are entire subsystems
> > > which are much worse off, if not completely useless, without swap.
> > > Post-cgroupsv2 memory controller things are considerably different on
> > > this front, and old "common wisdom" no longer applies.
> >
> > What are some examples here?
> >
> > What specifically is the difference between:
> >
> > * swap does not exist at all;
> > * swap is full of data that will not be swapped in for weeks or
> >   months;
>
> The big difference is that the RAM that became available because the
> unused stuff was swapped out has been applied to better uses,
> i.e. keep more frequently used stuff around, improving performance of
> the often used stuff, at the price of degrading performance of the
> apparently never used stuff. Overall win!

Honestly, I feel that I've covered this already... but I'll try again.

It is only a win if that actually results in better performance! If all you've done is swap out a whole lot of data, but the rest of the system still has the same performance, you're _worse_ off.
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Fri, 31 Mar 2023, Lennart Poettering wrote:

> On Fr, 31.03.23 18:24, Michael Chapman (m...@very.puzzling.org) wrote:
>
> > On Fri, 31 Mar 2023, Barry wrote:
> > [...]
> > > If you want to run in ram only then you must turn off the kernel
> > > overcommit.
> > > Have you done that? If not then you risk processes getting SEGV signals.
> >
> > Seriously. It's almost as if nobody here is actually reading anything of
> > what I've written!
> >
> > EVERYTHING fits in RAM. The non-guest processes total perhaps a GB in
> > total. The guest processes total maybe 200 GB in total. The server has
> > more RAM than all of that.
>
> I presume you are also running the OS itself from RAM then? i.e. your
> rootfs is not backed by disk, but by some in-memory fs, or a loopback
> on a memfd or so?
>
> because otherwise you just remove the latencies from anonymous memory
> but you amplify the latencies on file-backed memory. Which is overall
> worse, not better.

The host isn't doing much IO. Just a bit of logging really. How would the existence of swap affect that? Is it really so much better to be able to log messages just that little bit faster, but you've now got to wait for `sshd` to swap back in whenever you SSH to the system?

> > I know this works because I have literally done it on many, many
> > hypervisors for over a decade.
>
> I mean, you have a point: if you run on idle machines where hardware
> is so massively oversized for the job you are doing, you can operate
> really nicely without swap. No doubt. But that's kinda
> wasteful. Resource-management through oversized hw is certainly a way to
> solve problems, no doubt.

The alternative would be to _overprovision_ the server -- i.e. put more VMs on it than it can support. That would just be stupid.

Sticking with the example above, if the guests' RAM totals 200 GB, are you really suggesting that only having, say, a 204 GB server and making up for the lack of memory by adding swap would actually be better? Of course it wouldn't!

The VMs certainly weren't idle. Some of them had a fair bit of idle RAM, yes, but that's not entirely unusual. Plus, I don't get to choose what the VMs run, my customers do.
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Do, 30.03.23 13:16, Phillip Susi (ph...@thesusis.net) wrote:

>
> Lennart Poettering writes:
>
> > oomd/PSI looks at memory allocation latencies to determine memory
> > pressure. Since you disallow anonymous memory to be paged out and thus
> > increase IO on file backed memory you increase the latencies
> > unnecessarily, thus making oomd trigger earlier.
>
> Did this get changed in the last few years? Because I'm sure it used to
> be based on the total commit limit, and so OOM wouldn't start killing
> until your swap was full, which didn't happen until the system was
> thrashing itself to uselessness for 20 minutes already.

oomd becomes active on two distinct triggers:

This one:

https://github.com/systemd/systemd/blob/main/src/oom/oomd-manager.c#L383

and this one:

https://github.com/systemd/systemd/blob/main/src/oom/oomd-manager.c#L486

The latter is PSI.

> What happens if you use zswap? Will hibernation try to save things to
> there instead of a real disk swap? It might be nice to have zswap for
> normal use and the on disk swap for hibernate.

Our sleep code does not consider zram devices for hibernation.

Lennart

--
Lennart Poettering, Berlin
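For readers unfamiliar with PSI: the kernel exposes memory pressure as text in `/proc/pressure/memory` (and per cgroup in `memory.pressure`), with `some` and `full` lines carrying `avg10`, `avg60`, `avg300` and `total` values. The Python sketch below only parses that format and compares one value against a limit. It is not oomd's actual algorithm, and both the sample numbers and the threshold are made up for illustration.

```python
def parse_pressure(text):
    """Parse PSI text into {"some": {...}, "full": {...}} with float values."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        kind, pairs = parts[0], parts[1:]
        result[kind] = {
            key: float(value)
            for key, value in (pair.split("=", 1) for pair in pairs)
        }
    return result


def exceeds(pressure, limit_percent, kind="full", window="avg10"):
    """True if the chosen pressure average is above the given percentage."""
    return pressure[kind][window] > limit_percent


# Invented sample in the format of /proc/pressure/memory.
SAMPLE = (
    "some avg10=1.50 avg60=0.75 avg300=0.10 total=123456\n"
    "full avg10=0.50 avg60=0.25 avg300=0.05 total=65432\n"
)

pressure = parse_pressure(SAMPLE)
print(pressure["full"]["avg10"], exceeds(pressure, limit_percent=0.4))
```

A real monitor would also need to look at how long the pressure persists, which this sketch leaves out.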
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Fr, 31.03.23 07:57, Michael Chapman (m...@very.puzzling.org) wrote:

> On Fri, 31 Mar 2023, Luca Boccassi wrote:
> [...]
> > No, it does not make "little difference", there are entire subsystems
> > which are much worse off, if not completely useless, without swap.
> > Post-cgroupsv2 memory controller things are considerably different on
> > this front, and old "common wisdom" no longer applies.
>
> What are some examples here?
>
> What specifically is the difference between:
>
> * swap does not exist at all;
> * swap is full of data that will not be swapped in for weeks or
>   months;

The big difference is that the RAM that became available because the unused stuff was swapped out has been applied to better uses, i.e. keep more frequently used stuff around, improving performance of the often used stuff, at the price of degrading performance of the apparently never used stuff. Overall win!

Lennart

--
Lennart Poettering, Berlin
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Fr, 31.03.23 18:24, Michael Chapman (m...@very.puzzling.org) wrote: > On Fri, 31 Mar 2023, Barry wrote: > [...] > > If you want to run in ram only then you must turn off the kernel overcommit. > > Have you done that? If not then you risk processes getting SEGV signals. > > Seriously. It's almost as if nobody here is actually reading anything of > what I've written! > > EVERYTHING fits in RAM. The non-guest processes total perhaps a GB in > total. The guest processes total maybe 200 GB in total. The server has > more RAM than all of that. I presume you are also running the OS itself from RAM then? i.e. your rootfs is not backed by disk, but by some in-memory fs, or a loopback on a memfd or so? because otherwise you just remove the latencies from anonymous memory but you amplify the latencies on file-backed memory. Which is overall worse, not better. > I know this works because I have literally done it on many, many > hypervisors for over a decade. I mean, you have a point: if you run on idle machines where hardware is so massively oversized for the job you are doing, you can operate really nicely without swap. No doubt. But that's kinda wasteful. Resource-management through oversized hw is certainly a way to solve problems, no doubt. Lennart -- Lennart Poettering, Berlin
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Fri, 31 Mar 2023, Lennart Poettering wrote: > On Do, 30.03.23 18:56, Michael Chapman (m...@very.puzzling.org) wrote: > > > On Thu, 30 Mar 2023, Lennart Poettering wrote: > > > On Mi, 29.03.23 13:53, Christoph Anton Mitterer (cales...@scientia.org) > > > wrote: > > > > > > > > > That's a bad idea btw. I'd advise you not to do that: on modern > > > > > > systems you want swap, since it makes anonymous memory reclaimable. > > > > > > I > > > > > > am not sure where you are getting this idea from that swap was > > > > > > bad. > > > > > > > > Well I haven't said it's bad, but I guess it depends on the use case > > > > any available RAM. > > > > > > In almost all scenarios you want swap, regardless if little RAM or a > > > lot. For specialist cases where you run everything from memory, and > > > not even programs are backed by disk there might be exceptions. But > > > that#s almost never the case. > > > > One specific case where I deliberately chose _not_ to use swap: large > > hypervisors with local storage. > > > > With swap on the host enabled, all that ended up happening was that local > > IO activity caused idle guest memory to be gradually swapped out. > > Eventually all of the swap space filled up, and the system was exactly > > where it would have been had it not had any swap space configured in the > > first place -- except that it was now _a lot_ slower to migrate those > > swapped-out guests to other hypervisors. > > Linux will swap out stuff only if it has better uses for the RAM. So > yeah, apparently your VMs where mostly idle, and the RAM was better > used for other stuff, and ultimately helped speed up things for that > other more frequently used stuff. Which is an overall win, not a loss. > > If the key requirement you have to make VMs migrate quickly, then > yeah, then allowing them to be written to disk is of course a > problem. 
But frankly, if the ability to migrate VMs quickly is your > top priority and general performance irrelevant, then you might have > weird priorities? Also, are you sure your network is faster than your > local disk? Certainly faster than the swap-in path! 10 GigE networking really helps. Migrating VMs quickly was not a "key requirement" at all. But it was important, and not having swap meant that it could be achieved _without_ causing any other problems. Think about it: instead of 50 GB of RAM usable as buffer and page cache, let's say I had added swap and allowed that to increase to 100 GB. Would that really make much of a difference to IO performance in guests? Probably not. Sure, I could _engineer_ a test where it made a difference, but in _real-world_ usage it doesn't change things too much. > generally though: i am not doubting that sometimes latency matters for > certain jobs, and paging stuff back in is slow and thus makes > latencies worse. But the way to address that is not turn of swap for > everything, but just for the jobs where the latency means, via the > appropriate cgroup settings. I get that. But why would I spend time doing that rather than just hitting the big `swapoff` button, when that effectively yields the same result? The only difference would be about a GB: the size of all the processes that weren't in guests. > The thing is, anonymous memory is just > one kind of memory, and if you turn off swap then your force that to > remain in RAM – but at the same time you still allow file-based stuff > to be reclaimed so that it must be reread later from disk. If you use > the right resource management settings you have much better control on > that, too, and can comprehensively solve the issue, and get the > latencies you want. 
> > Or to turn this around: if you are concerned about the latencies swap > is supposed to "introduce", but you do not run your whole OS from an > in-memory image too, then you are doing things wrong and not actually > solving what you want to solve. > > Lennart > > -- > Lennart Poettering, Berlin > >
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Do, 30.03.23 18:56, Michael Chapman (m...@very.puzzling.org) wrote:

> On Thu, 30 Mar 2023, Lennart Poettering wrote:
> > On Mi, 29.03.23 13:53, Christoph Anton Mitterer (cales...@scientia.org) wrote:
> >
> > > > > That's a bad idea btw. I'd advise you not to do that: on modern
> > > > > systems you want swap, since it makes anonymous memory reclaimable.
> > > > > I am not sure where you are getting this idea from that swap was
> > > > > bad.
> > >
> > > Well I haven't said it's bad, but I guess it depends on the use case
> > > any available RAM.
> >
> > In almost all scenarios you want swap, regardless if little RAM or a
> > lot. For specialist cases where you run everything from memory, and
> > not even programs are backed by disk there might be exceptions. But
> > that's almost never the case.
>
> One specific case where I deliberately chose _not_ to use swap: large
> hypervisors with local storage.
>
> With swap on the host enabled, all that ended up happening was that local
> IO activity caused idle guest memory to be gradually swapped out.
> Eventually all of the swap space filled up, and the system was exactly
> where it would have been had it not had any swap space configured in the
> first place -- except that it was now _a lot_ slower to migrate those
> swapped-out guests to other hypervisors.

Linux will swap out stuff only if it has better uses for the RAM. So yeah, apparently your VMs were mostly idle, and the RAM was better used for other stuff, and ultimately helped speed up things for that other more frequently used stuff. Which is an overall win, not a loss.

If the key requirement you have is to make VMs migrate quickly, then yeah, allowing them to be written to disk is of course a problem. But frankly, if the ability to migrate VMs quickly is your top priority and general performance irrelevant, then you might have weird priorities? Also, are you sure your network is faster than your local disk?

Generally though: I am not doubting that sometimes latency matters for certain jobs, and paging stuff back in is slow and thus makes latencies worse. But the way to address that is not to turn off swap for everything, but just for the jobs where latency matters, via the appropriate cgroup settings. The thing is, anonymous memory is just one kind of memory, and if you turn off swap then you force that to remain in RAM – but at the same time you still allow file-based stuff to be reclaimed so that it must be reread later from disk. If you use the right resource management settings you have much better control over that, too, and can comprehensively solve the issue, and get the latencies you want.

Or to turn this around: if you are concerned about the latencies swap is supposed to "introduce", but you do not run your whole OS from an in-memory image too, then you are doing things wrong and not actually solving what you want to solve.

Lennart

--
Lennart Poettering, Berlin
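One concrete form of the per-job cgroup setting discussed here is the cgroup v2 `memory.swap.max` file, which another message in this thread mentions being suggested for the QEMU processes (systemd exposes the same knob as `MemorySwapMax=`). The Python sketch below writes `0` into that file for a single cgroup directory. The helper names are made up, the directory is whatever cgroup the job lives in, and on a real system this requires a cgroup v2 hierarchy and sufficient privileges; the demonstration uses a temporary directory as a stand-in.

```python
import tempfile
from pathlib import Path


def disable_swap_for_cgroup(cgroup_dir):
    """Write "0" to memory.swap.max so this cgroup's pages are never swapped."""
    control = Path(cgroup_dir) / "memory.swap.max"
    control.write_text("0\n")
    return control


def read_swap_limit(cgroup_dir):
    """Return the current limit: the string "max" (unlimited) or a byte count."""
    raw = (Path(cgroup_dir) / "memory.swap.max").read_text().strip()
    return raw if raw == "max" else int(raw)


# Demonstration against a temporary directory standing in for a cgroup.
with tempfile.TemporaryDirectory() as fake_cgroup:
    (Path(fake_cgroup) / "memory.swap.max").write_text("max\n")
    before = read_swap_limit(fake_cgroup)
    disable_swap_for_cgroup(fake_cgroup)
    after = read_swap_limit(fake_cgroup)

print(before, "->", after)
```

Applied only to the latency-sensitive jobs, this leaves swap available to everything else on the host, which is the trade-off being argued for above.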
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Do, 30.03.23 01:39, Christoph Anton Mitterer (cales...@scientia.org) wrote:

> Well that's clear, it's just that on my systems (both servers and
> workstations) I've never really run into the need to reclaim lots of
> anonymous memory.

It's the Linux kernel that reclaims memory for you. You don't need to reclaim that yourself, personally, you know.

The thing is simply: RAM is best used for stuff that is actually accessed. If you prohibit that you just make the overall behaviour of the system worse. The kernel is pretty good at figuring out what is needed in RAM and what is not. Except of course if you don't let it, and force it to keep useless stuff in memory.

> Since you're anyway rather against the whole idea,... I assume you
> wouldn't want a PR that adds something like that as an example to the
> manpages?

No, these exotic env vars are on purpose not documented in the man page, but separately in that markdown doc, since they come with a lower stability guarantee and are a bit icky. People shouldn't really use them in the general case, they exist for local hacks only, and local hacks have no place in the main documentation that the man pages are.

> Might make it easier for people to use it properly :-)

It's a bad idea to do what you are doing. I don't think we need to make

Lennart

--
Lennart Poettering, Berlin
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Fri, 31 Mar 2023, Tomasz Torcz wrote: > On Fri, Mar 31, 2023 at 06:24:09PM +1100, Michael Chapman wrote: > > On Fri, 31 Mar 2023, Barry wrote: > > [...] > > > If you want to run in ram only then you must turn off the kernel > > > overcommit. > > > Have you done that? If not then you risk processes getting SEGV signals. > > > > Seriously. It's almost as if nobody here is actually reading anything of > > what I've written! > > > > EVERYTHING fits in RAM. The non-guest processes total perhaps a GB in > > total. The guest processes total maybe 200 GB in total. The server has > > more RAM than all of that. > > Your situation seems to be special. People in this thread seem to be > focused on generic Linux computer use-case. And that was entirely my point. Lennart had mentioned one special case. I was just suggesting another one. I wasn't expecting everyone to pile on and say "no, your special case is WRONG, and you're a bad person for even thinking about it!" :-p
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Fri, Mar 31, 2023 at 06:24:09PM +1100, Michael Chapman wrote: > On Fri, 31 Mar 2023, Barry wrote: > [...] > > If you want to run in ram only then you must turn off the kernel overcommit. > > Have you done that? If not then you risk processes getting SEGV signals. > > Seriously. It's almost as if nobody here is actually reading anything of > what I've written! > > EVERYTHING fits in RAM. The non-guest processes total perhaps a GB in > total. The guest processes total maybe 200 GB in total. The server has > more RAM than all of that. Your situation seems to be special. People in this thread seem to be focused on generic Linux computer use-case. > I know this works because I have literally done it on many, many > hypervisors for over a decade. On the other hand, kernel 4.0, which greatly changed how swap works*, was released half a decade ago. Maybe it's time to revisit assumptions? * according to Chris Down's blog note linked by Lennart at the beginning of the thread. -- Tomasz Torcz 72->| 80->| to...@pipebreaker.pl 72->| 80->|
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Fri, 31 Mar 2023, Barry wrote: [...] > If you want to run in ram only then you must turn off the kernel overcommit. > Have you done that? If not then you risk processes getting SEGV signals. Seriously. It's almost as if nobody here is actually reading anything of what I've written! EVERYTHING fits in RAM. The non-guest processes total perhaps a GB in total. The guest processes total maybe 200 GB in total. The server has more RAM than all of that. I know this works because I have literally done it on many, many hypervisors for over a decade.
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
> On 31 Mar 2023, at 00:51, Michael Chapman wrote: > > On Fri, 31 Mar 2023, Phillip Susi wrote: >> >> Michael Chapman writes: >> >>> What specifically is the difference between: >>> >>> * swap does not exist at all; >>> * swap is full of data that will not be swapped in for weeks or months; >> >> That's the wrong question. > > Nevertheless it was the question I was faced with. I had servers with a > huge amount of memory, a fair bit of swap, and ALL of that swap filled > with stuff that would need to be entirely swapped back in at some point at > a moments notice. > > The solution was simple: turn off swap. Now there was no "swap everything > back in" penalty, and since there was plenty of RAM anyway the change had > little impact on the behaviour of the rest of the system. If you want to run in ram only then you must turn off the kernel overcommit. Have you done that? If not then you risk processes getting SEGV signals. There is a lot of moving parts that affect the robustness of a big server. Swap is one of them that is important to allow efficient use of all the hardware resources. I work on servers with 400G of ram, but it is all used. Swap is a critical part of tuning the performance with the network heavy work load that disk I/O impacts. Barry >
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Fri, 31 Mar 2023, Phillip Susi wrote: > > Michael Chapman writes: > > > What specifically is the difference between: > > > > * swap does not exist at all; > > * swap is full of data that will not be swapped in for weeks or months; > > That's the wrong question. Nevertheless it was the question I was faced with. I had servers with a huge amount of memory, a fair bit of swap, and ALL of that swap filled with stuff that would need to be entirely swapped back in at some point at a moments notice. The solution was simple: turn off swap. Now there was no "swap everything back in" penalty, and since there was plenty of RAM anyway the change had little impact on the behaviour of the rest of the system.
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
Michael Chapman writes: > What specifically is the difference between: > > * swap does not exist at all; > * swap is full of data that will not be swapped in for weeks or months; That's the wrong question. The question is, what is the difference between having NO swap, and having some swap that you don't use much of? The answer to that is that there will be a non zero amount of anonymous memory allocated to processes that hardly ever touch it, and that can be tossed out to swap to provide more memory to use for, if nothing else, caching files that ARE being accessed. Now that amount may not be much if you usually have plenty of free ram, but it won't be zero. I too have long gone without a swap partition because the small benefit of having a little more ram to cache files did not justify the risk of going into thrashing mode when some process went haywire, but if that problem has been solved, and you want a swap partition for hibernation anyhow, then you may as well keep it mounted all the time since unmounting it when you aren't about to hibernate costs *something* and gains *nothing*.
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Fri, 31 Mar 2023, Luca Boccassi wrote: [...] > No, it does not make "little difference", there are entire subsystems > which are much worse off, if not completely useless, without swap. > Post-cgroupsv2 memory controller things are considerably different on > this front, and old "common wisdom" no longer applies. What are some examples here? What specifically is the difference between: * swap does not exist at all; * swap is full of data that will not be swapped in for weeks or months; ? Either way, nothing more can be swapped out, and nothing will get swapped in. If everything fits in RAM, as far as I can see the only thing allowing "non-guest processes" to be swapped out is that I'd get slightly more available RAM for buffers and cache. But as I noted in the other thread, I've already got enough of that! So what advantage would there be to me to enable swap? I started off this thread with a disadvantage, so there would have to be a *big* advantage to counter that. I am well aware that "you should always have swap" is good general advice for most users. But it's important to remember there are exceptions to it! Lennart suggested one such situation is where you're running everything off RAM anyway. I suggest another such situation is where you have sufficient RAM that your entire workload comfortably fits within it.
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
Lennart Poettering writes: > oomd/PSI looks at memory allocation latencies to determine memory > pressure. Since you disallow anonymous memory to be paged out and thus > increase IO on file backed memory you increase the latencies > unnecessarily, thus making oomd trigger earlier. Did this get changed in the last few years? Because I'm sure it used to be based on the total commit limit, and so OOM wouldn't start killing until your swap was full, which didn't happen until the system was thrashing itself to uselessness for 20 minutes already. If this has been fixed then I guess it's time for me to start using swap again. What happens if you use zswap? Will hibernation try to save things to there instead of a real disk swap? It might be nice to have zswap for normal use and the on disk swap for hibernate.
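For anyone wanting to look at the numbers behind this: the kernel's pressure stall information for memory is readable from /proc/pressure/memory (on kernels built with PSI support), as two lines, "some" and "full", each with avg10, avg60, avg300 and total fields. Below is a small Python sketch of parsing it. The function names are my own, and it only reads the values; it does not reproduce how systemd-oomd decides anything.

```python
def parse_pressure(text):
    """Parse the contents of a /proc/pressure/* file.

    Returns e.g. {"some": {"avg10": 0.0, ..., "total": 0}, "full": {...}}.
    avg* values are percentages of time stalled; total is in microseconds.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        entry = {}
        for pair in pairs:
            key, _, value = pair.partition("=")
            entry[key] = int(value) if key == "total" else float(value)
        result[kind] = entry
    return result


def read_memory_pressure(path="/proc/pressure/memory"):
    """Read and parse the memory pressure file of the running kernel."""
    with open(path) as handle:
        return parse_pressure(handle.read())
```

Comparing the "some" and "full" averages with and without swap on the same workload would be one way to test the claim on a given machine.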
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Thu, 30 Mar 2023 at 11:09, Michael Chapman wrote: > > On Thu, 30 Mar 2023, Luca Boccassi wrote: > > On Thu, 30 Mar 2023 at 10:15, Michael Chapman > > wrote: > > > > > > On Thu, 30 Mar 2023, Lennart Poettering wrote: > > > > On Mi, 29.03.23 13:53, Christoph Anton Mitterer (cales...@scientia.org) > > > > wrote: > > > > > > > > > > > That's a bad idea btw. I'd advise you not to do that: on modern > > > > > > > systems you want swap, since it makes anonymous memory > > > > > > > reclaimable. > > > > > > > I > > > > > > > am not sure where you are getting this idea from that swap was > > > > > > > bad. > > > > > > > > > > Well I haven't said it's bad, but I guess it depends on the use case > > > > > any available RAM. > > > > > > > > In almost all scenarios you want swap, regardless if little RAM or a > > > > lot. For specialist cases where you run everything from memory, and > > > > not even programs are backed by disk there might be exceptions. But > > > > that#s almost never the case. > > > > > > One specific case where I deliberately chose _not_ to use swap: large > > > hypervisors with local storage. > > > > > > With swap on the host enabled, all that ended up happening was that local > > > IO activity caused idle guest memory to be gradually swapped out. > > > Eventually all of the swap space filled up, and the system was exactly > > > where it would have been had it not had any swap space configured in the > > > first place -- except that it was now _a lot_ slower to migrate those > > > swapped-out guests to other hypervisors. > > > > > > - Michael > > > > The solution there is to ensure the cgroup configuration for the > > slices where the guests run have memory.swap.max=0, rather than > > disabling it for the whole system. > > Perhaps, but given the rest of processes on the system need just a few > hundred MB max, and the server has hundreds of GB of RAM, it really makes > little difference. Turning off swap altogether is plain _simpler_. 
No, it does not make "little difference", there are entire subsystems which are much worse off, if not completely useless, without swap. Post-cgroupsv2 memory controller things are considerably different on this front, and old "common wisdom" no longer applies.
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Thu, 30 Mar 2023, Greg KH wrote: > On Thu, Mar 30, 2023 at 09:09:19PM +1100, Michael Chapman wrote: > > On Thu, 30 Mar 2023, Luca Boccassi wrote: > > > On Thu, 30 Mar 2023 at 10:15, Michael Chapman > > > wrote: > > > > > > > > On Thu, 30 Mar 2023, Lennart Poettering wrote: > > > > > On Mi, 29.03.23 13:53, Christoph Anton Mitterer > > > > > (cales...@scientia.org) wrote: > > > > > > > > > > > > > That's a bad idea btw. I'd advise you not to do that: on modern > > > > > > > > systems you want swap, since it makes anonymous memory > > > > > > > > reclaimable. > > > > > > > > I > > > > > > > > am not sure where you are getting this idea from that swap was > > > > > > > > bad. > > > > > > > > > > > > Well I haven't said it's bad, but I guess it depends on the use case > > > > > > any available RAM. > > > > > > > > > > In almost all scenarios you want swap, regardless if little RAM or a > > > > > lot. For specialist cases where you run everything from memory, and > > > > > not even programs are backed by disk there might be exceptions. But > > > > > that#s almost never the case. > > > > > > > > One specific case where I deliberately chose _not_ to use swap: large > > > > hypervisors with local storage. > > > > > > > > With swap on the host enabled, all that ended up happening was that > > > > local > > > > IO activity caused idle guest memory to be gradually swapped out. > > > > Eventually all of the swap space filled up, and the system was exactly > > > > where it would have been had it not had any swap space configured in the > > > > first place -- except that it was now _a lot_ slower to migrate those > > > > swapped-out guests to other hypervisors. > > > > > > > > - Michael > > > > > > The solution there is to ensure the cgroup configuration for the > > > slices where the guests run have memory.swap.max=0, rather than > > > disabling it for the whole system. 
> > > > Perhaps, but given the rest of processes on the system need just a few > > hundred MB max, and the server has hundreds of GB of RAM, it really makes > > little difference. Turning off swap altogether is plain _simpler_. > > So you penalize the runtime performance of guests for the infrequent > migration delay? Sounds like a bad trade-off for any real workload > those guests are doing. Shouldn't the goal of the system be to solve > the problem the guests are trying to solve instead of being optimized > for the infrequent administration tasks? Err... how is ensuring the guests actually stay in memory "penalising their runtime performance"? If anything, it's exactly the opposite! Note that the hypervisor wouldn't have been overprovisioned. If it were, say, a 256 GB server, I might only put 200 GB of guests on it. That still leaves around 50 GB for page cache, which is more than enough! Adding a few GB of swap to the system would hardly make a difference.
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Thu, Mar 30, 2023 at 09:09:19PM +1100, Michael Chapman wrote: > On Thu, 30 Mar 2023, Luca Boccassi wrote: > > On Thu, 30 Mar 2023 at 10:15, Michael Chapman > > wrote: > > > > > > On Thu, 30 Mar 2023, Lennart Poettering wrote: > > > > On Mi, 29.03.23 13:53, Christoph Anton Mitterer (cales...@scientia.org) > > > > wrote: > > > > > > > > > > > That's a bad idea btw. I'd advise you not to do that: on modern > > > > > > > systems you want swap, since it makes anonymous memory > > > > > > > reclaimable. > > > > > > > I > > > > > > > am not sure where you are getting this idea from that swap was > > > > > > > bad. > > > > > > > > > > Well I haven't said it's bad, but I guess it depends on the use case > > > > > any available RAM. > > > > > > > > In almost all scenarios you want swap, regardless if little RAM or a > > > > lot. For specialist cases where you run everything from memory, and > > > > not even programs are backed by disk there might be exceptions. But > > > > that#s almost never the case. > > > > > > One specific case where I deliberately chose _not_ to use swap: large > > > hypervisors with local storage. > > > > > > With swap on the host enabled, all that ended up happening was that local > > > IO activity caused idle guest memory to be gradually swapped out. > > > Eventually all of the swap space filled up, and the system was exactly > > > where it would have been had it not had any swap space configured in the > > > first place -- except that it was now _a lot_ slower to migrate those > > > swapped-out guests to other hypervisors. > > > > > > - Michael > > > > The solution there is to ensure the cgroup configuration for the > > slices where the guests run have memory.swap.max=0, rather than > > disabling it for the whole system. > > Perhaps, but given the rest of processes on the system need just a few > hundred MB max, and the server has hundreds of GB of RAM, it really makes > little difference. Turning off swap altogether is plain _simpler_. 
So you penalize the runtime performance of guests for the infrequent migration delay? Sounds like a bad trade-off for any real workload those guests are doing. Shouldn't the goal of the system be to solve the problem the guests are trying to solve instead of being optimized for the infrequent administration tasks? good luck! greg k-h
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Thu, 30 Mar 2023, Luca Boccassi wrote: > On Thu, 30 Mar 2023 at 10:15, Michael Chapman wrote: > > > > On Thu, 30 Mar 2023, Lennart Poettering wrote: > > > On Mi, 29.03.23 13:53, Christoph Anton Mitterer (cales...@scientia.org) > > > wrote: > > > > > > > > > That's a bad idea btw. I'd advise you not to do that: on modern > > > > > > systems you want swap, since it makes anonymous memory reclaimable. > > > > > > I > > > > > > am not sure where you are getting this idea from that swap was > > > > > > bad. > > > > > > > > Well I haven't said it's bad, but I guess it depends on the use case > > > > any available RAM. > > > > > > In almost all scenarios you want swap, regardless if little RAM or a > > > lot. For specialist cases where you run everything from memory, and > > > not even programs are backed by disk there might be exceptions. But > > > that#s almost never the case. > > > > One specific case where I deliberately chose _not_ to use swap: large > > hypervisors with local storage. > > > > With swap on the host enabled, all that ended up happening was that local > > IO activity caused idle guest memory to be gradually swapped out. > > Eventually all of the swap space filled up, and the system was exactly > > where it would have been had it not had any swap space configured in the > > first place -- except that it was now _a lot_ slower to migrate those > > swapped-out guests to other hypervisors. > > > > - Michael > > The solution there is to ensure the cgroup configuration for the > slices where the guests run have memory.swap.max=0, rather than > disabling it for the whole system. Perhaps, but given the rest of processes on the system need just a few hundred MB max, and the server has hundreds of GB of RAM, it really makes little difference. Turning off swap altogether is plain _simpler_.
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Thu, 30 Mar 2023 at 10:15, Michael Chapman wrote: > > On Thu, 30 Mar 2023, Lennart Poettering wrote: > > On Mi, 29.03.23 13:53, Christoph Anton Mitterer (cales...@scientia.org) > > wrote: > > > > > > > That's a bad idea btw. I'd advise you not to do that: on modern > > > > > systems you want swap, since it makes anonymous memory reclaimable. > > > > > I > > > > > am not sure where you are getting this idea from that swap was > > > > > bad. > > > > > > Well I haven't said it's bad, but I guess it depends on the use case > > > any available RAM. > > > > In almost all scenarios you want swap, regardless if little RAM or a > > lot. For specialist cases where you run everything from memory, and > > not even programs are backed by disk there might be exceptions. But > > that#s almost never the case. > > One specific case where I deliberately chose _not_ to use swap: large > hypervisors with local storage. > > With swap on the host enabled, all that ended up happening was that local > IO activity caused idle guest memory to be gradually swapped out. > Eventually all of the swap space filled up, and the system was exactly > where it would have been had it not had any swap space configured in the > first place -- except that it was now _a lot_ slower to migrate those > swapped-out guests to other hypervisors. > > - Michael The solution there is to ensure the cgroup configuration for the slices where the guests run have memory.swap.max=0, rather than disabling it for the whole system.
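As a rough illustration of the per-cgroup approach suggested above: under cgroup v2, each cgroup directory exposes a memory.swap.max file, and writing 0 to it keeps that cgroup's memory out of swap while swap stays enabled for the rest of the system. The Python sketch below writes that value. The helper names and the slice path in the comment are assumptions made for illustration, and on a systemd-managed host one would normally set the MemorySwapMax= resource-control property on the slice rather than write to the file directly.

```python
from pathlib import Path


def set_swap_max(cgroup_dir, limit="0"):
    """Write a swap limit to a cgroup v2 directory's memory.swap.max file.

    cgroup_dir is the cgroup's directory below the cgroup2 mount
    (usually /sys/fs/cgroup). limit is a byte count as a string, or "max".
    """
    control_file = Path(cgroup_dir) / "memory.swap.max"
    control_file.write_text(f"{limit}\n")
    return control_file


def read_swap_max(cgroup_dir):
    """Return the current swap limit of the cgroup as a stripped string."""
    return (Path(cgroup_dir) / "memory.swap.max").read_text().strip()


# Example call (assumed path; needs root and a real cgroup2 hierarchy):
#   set_swap_max("/sys/fs/cgroup/machine.slice", "0")
```

Whether the guests actually live under the slice named in the comment depends on how the hypervisor stack is set up.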
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Thu, 30 Mar 2023, Lennart Poettering wrote: > On Mi, 29.03.23 13:53, Christoph Anton Mitterer (cales...@scientia.org) wrote: > > > > > That's a bad idea btw. I'd advise you not to do that: on modern > > > > systems you want swap, since it makes anonymous memory reclaimable. > > > > I > > > > am not sure where you are getting this idea from that swap was > > > > bad. > > > > Well I haven't said it's bad, but I guess it depends on the use case > > any available RAM. > > In almost all scenarios you want swap, regardless if little RAM or a > lot. For specialist cases where you run everything from memory, and > not even programs are backed by disk there might be exceptions. But > that#s almost never the case. One specific case where I deliberately chose _not_ to use swap: large hypervisors with local storage. With swap on the host enabled, all that ended up happening was that local IO activity caused idle guest memory to be gradually swapped out. Eventually all of the swap space filled up, and the system was exactly where it would have been had it not had any swap space configured in the first place -- except that it was now _a lot_ slower to migrate those swapped-out guests to other hypervisors. - Michael
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
Hey Lennart. On Wed, 2023-03-29 at 16:35 +0200, Lennart Poettering wrote: > In almost all scenarios you want swap, regardless if little RAM or a > lot. For specialist cases where you run everything from memory, and > not even programs are backed by disk there might be exceptions. Similar to the latter example of yours, one could think of scenarios with little disk space, where it might be interesting to use a swap file for hibernation, but have the storage available when the system is up. > But > that's almost never the case. IMO that's always hard to say... in WLCG we actually have compute nodes with little to no disk space but plenty of RAM (or well at least a certain amount of GB per core). Though we don't need hibernation there. > It allows the kernel to reclaim anonymous memory, because it can > write > it to disk and then use it for other purposes. Well that's clear, it's just that on my systems (both servers and workstations) I've never really run into the need to reclaim lots of anonymous memory. It was apparently always enough to have cached data reclaimed. > swap is not an "extra > on top", that's a complete misunderstanding how modern memory > management works. By avoiding swap you create artificial > (i.e. unnecessary) scarcity, and disallow the kernel to use RAM for > useful purposes because you block it with anonymous memory that might > never be used. You artificially amplify IO on the file-backed pages > hence, because those become the only ones that are reclaimable. Well, I have been operating like that for many years now, and back then I actually ran into thrashing more often, so maybe my experiences are just outdated :D If things changed, I'll happily try again to run with swap and see how it turns out for my use cases :-) > https://chrisdown.name/2018/01/02/in-defence-of-swap.html Thanks for the pointer. On Wed, 2023-03-29 at 16:36 +0200, Lennart Poettering wrote: > Yeah, all requests that go through logind check that. 
> > You can override the check via an environment variable, > SYSTEMD_BYPASS_HIBERNATION_MEMORY_CHECK=1 btw, see > https://systemd.io/ENVIRONMENT/ Nice, that did the trick. Perfect that it's an env var that takes effect at logind, so one can easily add it to the env of just that via the unit file, without the need to fiddle around with bashrc and friends. Since you're anyway rather against the whole idea,... I assume you wouldn't want a PR that adds something like that as an example to the manpages? Or perhaps somehow a hint that logind is consulted first and thus the swap area needs to be already there or the check disabled? Another thing that could be worth to add somewhere is where people are intended to add other service dependencies for things that should be done before/after suspend/resume/etc. ... i.e. the respective .target and not .service. Might make it easier for people to use it properly :-) In any case, thanks for your help :-) Cheers, Chris.
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Mi, 29.03.23 14:07, Christoph Anton Mitterer (cales...@scientia.org) wrote: > When I use >systemd.log_level=debug systemd.log_target=console > in the kernel parameters, and then do systemctl hibernate during the > system I get: > Mar 29 12:04:48 hbt systemd-logind[780]: Got message type=method_call > sender=:1.9 destination=org.freedesktop.login1 path=/org/freedesktop/login1 > interface=org.freedesktop.login1.Manager member=SetWallMessage cookie=2 > reply_cookie=0 signature=sb error-name=n/a error-message=n/a > Mar 29 12:04:48 hbt systemd-logind[780]: Sent message type=method_return > sender=n/a destination=:1.9 path=n/a interface=n/a member=n/a cookie=61 > reply_cookie=2 signature=n/a error-name=n/a error-message=n/a > Mar 29 12:04:48 hbt systemd-logind[780]: Got message type=method_call > sender=:1.9 destination=org.freedesktop.login1 path=/org/freedesktop/login1 > interface=org.freedesktop.login1.Manager member=HibernateWithFlags cookie=3 > reply_cookie=0 signature=t error-name=n/a error-message=n/a > Mar 29 12:04:48 hbt systemd-logind[780]: Sleep mode "disk" is supported by > the kernel. > Mar 29 12:04:48 hbt systemd-logind[780]: Disk sleep mode "shutdown" is > supported by the kernel. > Mar 29 12:04:48 hbt systemd-logind[780]: No possible swap partitions or files > suitable for hibernation were found in /proc/swaps. 
> Mar 29 12:04:48 hbt systemd-logind[780]: Sent message type=error sender=n/a > destination=:1.9 path=n/a interface=n/a member=n/a cookie=62 reply_cookie=3 > signature=s error-name=org.freedesktop.login1.SleepVerbNotSupported > error-message=Not enough swap space for hibernation > Mar 29 12:04:48 hbt systemd-logind[780]: Failed to process message > type=method_call sender=:1.9 destination=org.freedesktop.login1 > path=/org/freedesktop/login1 interface=org.freedesktop.login1.Manager > member=HibernateWithFlags cookie=3 reply_cookie=0 signature=t error-name=n/a > error-message=n/a: Not enough swap space for hibernation > > Does that mean it's the same problem as with the desktop environment? > I.e. systemdctl first asking logind whether hibernate was available, > before even starting hibernate.target? Yeah, all requests that go through logind check that. You can override the check via an environment variable, SYSTEMD_BYPASS_HIBERNATION_MEMORY_CHECK=1 btw, see https://systemd.io/ENVIRONMENT/ Lennart -- Lennart Poettering, Berlin
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Mi, 29.03.23 13:53, Christoph Anton Mitterer (cales...@scientia.org) wrote: > > > That's a bad idea btw. I'd advise you not to do that: on modern > > > systems you want swap, since it makes anonymous memory reclaimable. > > > I > > > am not sure where you are getting this idea from that swap was > > > bad. > > Well I haven't said it's bad, but I guess it depends on the use case > and available RAM. In almost all scenarios you want swap, regardless if little RAM or a lot. For specialist cases where you run everything from memory, and not even programs are backed by disk there might be exceptions. But that's almost never the case. > If one has plenty of the latter (e.g. our servers at the university > have all at least 64GB or more - and even my laptop has) what else than > giving you a bit more extra on top does swap give you? It allows the kernel to reclaim anonymous memory, because it can write it to disk and then use it for other purposes. Swap is not an "extra on top", that's a complete misunderstanding of how modern memory management works. By avoiding swap you create artificial (i.e. unnecessary) scarcity, and disallow the kernel to use RAM for useful purposes because you block it with anonymous memory that might never be used. You artificially amplify IO on the file-backed pages hence, because those become the only ones that are reclaimable. > If your memory is limited, like perhaps on IoT, and you rarely (say > once a day when some extra cron jobs run) need more than physical > memory is available, then sure... in that case it's good to have some > relief of memory pressure. Nope. Not how this works. > But if one generally uses more than one has, I feel it would be better > to run into the oom killer soon. oomd/PSI looks at memory allocation latencies to determine memory pressure. Since you disallow anonymous memory to be paged out and thus increase IO on file backed memory you increase the latencies unnecessarily, thus making oomd trigger earlier. 
> I've looked a bit around at any recommendations, but I couldn't really > find much credible sources. Read this for example: https://chrisdown.name/2018/01/02/in-defence-of-swap.html It's from 2018, i.e. 5 years ago, but most of it is still accurate. It's from the facebook people, i.e. the folks who maintain cgroups/psi/mm stuff, i.e. who really know these things. If that's not credible enough for you, then I cant't help you. > Uhm... sorry, I don't quite get hat: > systemd-analyze log-level debug > doesn't seem to be a valid command? It is. All systemd-based distros from the last 5y or so should have that command. > But that would also mean that when the swap was enabled manually (and > not via the RequiredBy=systemd-hibernate.service) it wouldn't get > stopped on umount.target? You can add that manually if you like. > So I guess that since you're anyway rather against running without swap > you probably wouldn't accept a feature request that asks for some > method to override that auto-detection (something like > AdvertiseHibernate=(auto|always|never) )? I fail to see the point of the concept these days. Systems where hibernation should be used, should generally also benefit from swap. Lennart -- Lennart Poettering, Berlin
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
When I use systemd.log_level=debug systemd.log_target=console in the kernel parameters, and then do systemctl hibernate during the system I get: Mar 29 12:04:48 hbt systemd-logind[780]: Got message type=method_call sender=:1.9 destination=org.freedesktop.login1 path=/org/freedesktop/login1 interface=org.freedesktop.login1.Manager member=SetWallMessage cookie=2 reply_cookie=0 signature=sb error-name=n/a error-message=n/a Mar 29 12:04:48 hbt systemd-logind[780]: Sent message type=method_return sender=n/a destination=:1.9 path=n/a interface=n/a member=n/a cookie=61 reply_cookie=2 signature=n/a error-name=n/a error-message=n/a Mar 29 12:04:48 hbt systemd-logind[780]: Got message type=method_call sender=:1.9 destination=org.freedesktop.login1 path=/org/freedesktop/login1 interface=org.freedesktop.login1.Manager member=HibernateWithFlags cookie=3 reply_cookie=0 signature=t error-name=n/a error-message=n/a Mar 29 12:04:48 hbt systemd-logind[780]: Sleep mode "disk" is supported by the kernel. Mar 29 12:04:48 hbt systemd-logind[780]: Disk sleep mode "shutdown" is supported by the kernel. Mar 29 12:04:48 hbt systemd-logind[780]: No possible swap partitions or files suitable for hibernation were found in /proc/swaps. Mar 29 12:04:48 hbt systemd-logind[780]: Sent message type=error sender=n/a destination=:1.9 path=n/a interface=n/a member=n/a cookie=62 reply_cookie=3 signature=s error-name=org.freedesktop.login1.SleepVerbNotSupported error-message=Not enough swap space for hibernation Mar 29 12:04:48 hbt systemd-logind[780]: Failed to process message type=method_call sender=:1.9 destination=org.freedesktop.login1 path=/org/freedesktop/login1 interface=org.freedesktop.login1.Manager member=HibernateWithFlags cookie=3 reply_cookie=0 signature=t error-name=n/a error-message=n/a: Not enough swap space for hibernation Does that mean it's the same problem as with the desktop environment? I.e. 
systemctl first asking logind whether hibernation is available, before even starting hibernate.target? Thanks, Chris.
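To see what logind is looking at here, the active swap areas can be listed from /proc/swaps, which has one header line followed by one line per area with the columns Filename, Type, Size, Used and Priority (sizes in KiB). The Python sketch below only parses that table. It is not logind's actual check, which, judging by the "Not enough swap space" error, also takes sizes into account, and the function names are made up for this example.

```python
def parse_swaps(text):
    """Parse /proc/swaps content into a list of dicts, skipping the header."""
    areas = []
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore blank or malformed lines
        areas.append({
            "filename": fields[0],
            "type": fields[1],
            "size_kib": int(fields[2]),
            "used_kib": int(fields[3]),
            "priority": int(fields[4]),
        })
    return areas


def free_swap_kib(areas):
    """Total unused swap in KiB across all active areas."""
    return sum(a["size_kib"] - a["used_kib"] for a in areas)


def read_swaps(path="/proc/swaps"):
    """Read and parse the swap table of the running kernel."""
    with open(path) as handle:
        return parse_swaps(handle.read())
```

An empty list from read_swaps() corresponds to the situation in the log above, where no swap area is active at the moment logind is asked.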
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
Hey. On Wed, 2023-03-29 at 10:20 +0200, Lennart Poettering wrote: > > That's a bad idea btw. I'd advise you not to do that: on modern > > systems you want swap, since it makes anonymous memory reclaimable. > > I > > am not sure where you are getting this idea from that swap was > > bad. Well I haven't said it's bad, but I guess it depends on the use case and available RAM. If one has plenty of the latter (e.g. our servers at the university have all at least 64GB or more - and even my laptop has) what else than giving you a bit more extra on top does swap give you? At the potential cost of the system going into thrashing instead, making it more or less completely unresponsive. If your memory is limited, like perhaps on IoT, and you rarely (say once a day when some extra cron jobs run) need more physical memory than is available, then sure... in that case it's good to have some relief of memory pressure. But if one generally uses more than one has, I feel it would be better to run into the oom killer soon. And even if one's scared about some precious process being killed in a short period of memory pressure... the swap may possibly just shift that to some later time. I've looked a bit around at any recommendations, but I couldn't really find many credible sources. It rather seemed, though, that the older some post/blog/etc. is the more likely they recommend swap and the more recent they rather recommend little to none. And even if it was still generally recommended/useful, looking at the comments people give (or there was e.g. that survey [0]) there still simply seems to be quite some fraction of people who want to run without. The aforementioned patchset also seems like a good indicator that there is some desire to use swap for hibernate only. And even if you argue that people should have swap, there might still be valid use cases for enabling certain swap partitions only for hibernate. E.g. 
people might want to use https://ddramdisk.store/ (which is however volatile) for swapping, but another swap on persistent storage for hibernate (only). Or one could want to use such a volatile device as "normal" swap without encryption, and an extra one for hibernation with encryption. > > > > That does work, when: > > > > # systemctl start systemd-hibernate.service > > > > but it doesn't when: > > > > # systemctl hibernate > > > > which I don't understand, since I though that would start the > > > > target, > > > > which would pull in and thus start the service, which before > > > > pulls > > > > in > > > > starts my swapfile. > > > > Provide debug logs of PID1, i.e. "systemd-analyze log-level debug" > > right before the hibernation attempt, and then the journal output > > generated that way. Uhm... sorry, I don't quite get hat: systemd-analyze log-level debug doesn't seem to be a valid command? Or do you mean something like: export SYSTEMD_LOG_LEVEL=debug systemd-analyze dump but that doesn't contain anything about "*hibernate*". Neither does it generate any output in journalctl -f or with _SYSTEMD_UNIT=systemd-hibernate.service or _SYSTEMD_UNIT=hibernate.target > > > > Also, I'm not really sure whether the above is the most > > > > systemdic > > > > way... like should I use something like BindsTo= instead of > > > > RequiredBy= > > > > + StopWhenUnneeded=true? > > > > Nah, sounds Ok to me. Thanks. > > > > Also would it be better to explicitly set > > > > DefaultDependencies=no? > > > > Probably, yeah, given that systemd-hibernate.service has that set > > too. But that would also mean that when the swap was enabled manually (and not via the RequiredBy=systemd-hibernate.service) it wouldn't get stopped on umount.target? > > > Yes, logind reports that hibernation is not supported if you have > > no > > swap. Desktops ask logind for that. > > > > Frnakly, the idea that we mount a swap partition only for > > hibernation > > appears to be a bad idea to me. 
We should drop it from the TODO > > list. If a swap partition is good for hibernation it is also good > > for > > proper swap operation, and not using it for that makes things worth > > in > > almost all ways. So I guess that since you're anyway rather against running without swap you probably wouldn't accept a feature request that asks for some method to override that auto-detection (something like AdvertiseHibernate=(auto|always|never) )? Cheers, Chris. [0] https://opensource.com/article/19/2/swap-space-poll
Re: [systemd-devel] how to let systemd hibernate start/stop the swap area?
On Wed, 29.03.23 04:43, Christoph Anton Mitterer (cales...@scientia.org) wrote:

Hi!

> I guess many people nowadays will run without any swap for the normal
> paging use, but only have it for hibernation (at least on laptops).

That's a bad idea btw. I'd advise you not to do that: on modern
systems you want swap, since it makes anonymous memory reclaimable. I
am not sure where you are getting this idea from that swap was bad.
It's *good*. Things like systemd-oomd will even warn if you don't have
swap, since the lack of swap amplifies memory pressure: swap-less
systems can only swap out file-backed memory, not anything anonymous.

> That does work, when:
> # systemctl start systemd-hibernate.service
> but it doesn't when:
> # systemctl hibernate
> which I don't understand, since I thought that would start the
> target, which would pull in and thus start the service, which before
> that pulls in and starts my swapfile.

Provide debug logs of PID1, i.e. "systemd-analyze log-level debug"
right before the hibernation attempt, and then the journal output
generated that way.

> Also, I'm not really sure whether the above is the most systemdic
> way... like should I use something like BindsTo= instead of
> RequiredBy= + StopWhenUnneeded=true?

Nah, sounds Ok to me.

> Also would it be better to explicitly set DefaultDependencies=no?

Probably, yeah, given that systemd-hibernate.service has that set too.

> 2) The whole thing seems to not work with auto-detection from desktop
> environments.
> I use Cinnamon, when clicking the shutdown icon, it shows me Suspend,
> Restart, Cancel, Shutdown.
>
> At least the Suspend seems to be somehow detected via systemd, cause
> when I set AllowSuspend=false in sleep.conf, it disappears.
>
> So I guess systemd thinks hibernation isn't possible, because there's
> no swap active, and tells that to the GUI.
> Guess I'd need some override, for that.

Yes, logind reports that hibernation is not supported if you have no
swap. Desktops ask logind for that.

Frankly, the idea that we mount a swap partition only for hibernation
appears to be a bad idea to me. We should drop it from the TODO
list. If a swap partition is good for hibernation it is also good for
proper swap operation, and not using it for that makes things worse in
almost all ways.

Lennart

--
Lennart Poettering, Berlin
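
For readers following along, the setup discussed in this thread (a swap
file that is pulled in only by systemd-hibernate.service and
deactivated again once nothing needs it) might look roughly like the
unit below. This is only a sketch under stated assumptions: the unit
actually used is not shown in this part of the thread, the swap file
path /swapfile (and hence the unit name swapfile.swap) is hypothetical,
and the explicit Before= ordering is an addition that was not discussed
above. Only RequiredBy=, StopWhenUnneeded= and DefaultDependencies= are
taken from the messages.

```ini
# /etc/systemd/system/swapfile.swap
# Hypothetical example. A swap unit must be named after the path it
# controls, so What=/swapfile implies the unit name swapfile.swap.

[Unit]
Description=Swap file activated only for hibernation (sketch)
# Suggested in the thread, since systemd-hibernate.service sets it too.
# Note: this drops the implicit Conflicts=/Before=umount.target that
# swap units otherwise get, so a manually activated swap is no longer
# deactivated through umount.target at shutdown.
DefaultDependencies=no
# Deactivate the swap again once no active unit requires it.
StopWhenUnneeded=true
# Assumption, not from the thread: RequiredBy= alone does not order the
# units, so request activation before the hibernate service runs.
Before=systemd-hibernate.service

[Swap]
What=/swapfile

[Install]
# "systemctl enable swapfile.swap" then makes
# systemd-hibernate.service require this unit.
RequiredBy=systemd-hibernate.service
```

As reported above, a unit like this was only observed to take effect
with "systemctl start systemd-hibernate.service". Since logind reports
hibernation as unsupported while no swap is active, desktops that ask
logind would not offer the hibernate option with this setup.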