On Fri, May 14, 2021 at 02:58:30PM -0900, Matthew Dillon wrote:
> SWAPMETA is used for configuring and paging to swap space, so there is no
> boot dependency.... that is, the kernel can bootstrap completely without
> ever having to use swap space. However, once the system is running,
> paging to swap typically occurs due to low-memory situations (if you think
> about it, that's why the system decides it needs to page to swap in the
> first place).
>
> Since one of the few ways the system has to free memory is to page data to
> swap, it can wind up in a no-win situation if the page-out code winds up
> having to allocate kernel memory in order to manage the swapped pages and
> blocks on said allocations. The key here is that the system has to be able
> to 'make progress' freeing memory, so as long as a few pages are available
> in the emergency free page reserve, and as long as those pages are ONLY
> needed to back actual objects (and no additional pages are needed e.g. for
> PV entries or MAP entries or page-table-page infrastructure to support the
> new mapping), then the system can make progress.
>
> This is what zalloc is able to guarantee that none of the other memory
> subsystems in the kernel are able to guarantee. zalloc() is able to
> guarantee that even a single page allocation will be sufficient to make
> progress on a stuck zalloc request. Since only one is needed, the
> emergency page reserve is sufficient for that. And then the paging code is
> able to free up a page soon after so the emergency page reserve doesn't
> become exhausted.
>
> --
>
> I believe that the kmalloc_obj subsystem (which is brand-new) can be
> adjusted to make these guarantees, primarily by ensuring that extra slabs
> are allocated ahead-of-time whenever possible. And for anything which is
> boot-time sensitive, the extra slabs could be installed at early boot prior
> to first use (kinda like how the zalloc system is initialized, except these
> slabs are 128KB each. Still, no reason why a few couldn't be declared
> statically as BSS in the kernel image). The code in question would be the
> _kmalloc_obj() path starting line 664 of kern_kmalloc.c, and its related
> slab allocation which occurs at line 821. Basically, some code would have
> to be added to attempt to maintain N (e.g. like 3) slabs on the per-zone
> ggm->empty list on any normal allocation that eats a slab out of that list,
> using non-blocking kmem_slab_alloc() calls, and falling back to blocking
> kmem_slab_alloc()'s if the list winds up being empty anyway.
>
> Probably a bit of work in kmem_slab_alloc() would also be needed to support
> M_INTNOWAIT in the slab-maintain code to allow the reserve to be used. But
> M_WAITOK would still have to be used if slabs wind up being exhausted
> anyway. Something like that. There are also additional possible
> low-memory deadlock points involved in terms of the fact that
> kmem_slab_alloc() dynamically allocates KVA space as secondary factors, but
> we would have to do a lot of low-memory / paging testing to determine what
> deadlocks might still exist.
>
> Right now the system has basically solved the low-memory deadlock issues so
> we don't want to reintroduce any.
>
> --
>
> At the moment nobody is planning on doing this work so if you would like to
> continue to review it, please do!
>
> -Matt
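As a rough illustration of the slab-maintain idea quoted above, here is a minimal userspace model in C. It is only a sketch under stated assumptions, not the kern_kmalloc.c code: struct zone, model_slab_alloc(), zone_replenish(), zone_get_slab() and the backend_low_memory flag are all hypothetical names, malloc() stands in for the real slab backend, and a "blocking" request is modelled as one that is still served under memory pressure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SLAB_SIZE    (128 * 1024)  /* slab size mentioned in the thread */
#define RESERVE_GOAL 3             /* the "N (e.g. like 3)" from the thread */

struct slab {
    struct slab *next;
};

/* Hypothetical model of one zone's list of empty (ready-to-use) slabs. */
struct zone {
    struct slab *empty;
    int          nempty;
};

/* Test knob: when true, non-blocking requests fail. */
static bool backend_low_memory = false;

/*
 * Hypothetical stand-in for the slab backend, NOT the real
 * kmem_slab_alloc(). A non-blocking request fails under memory
 * pressure; a blocking request is modelled as still being served.
 */
static struct slab *model_slab_alloc(bool can_block)
{
    if (backend_low_memory && !can_block)
        return NULL;
    return malloc(SLAB_SIZE);
}

static void zone_push(struct zone *z, struct slab *s)
{
    assert(s != NULL);
    s->next = z->empty;
    z->empty = s;
    z->nempty++;
}

static struct slab *zone_pop(struct zone *z)
{
    struct slab *s = z->empty;

    if (s != NULL) {
        z->empty = s->next;
        z->nempty--;
    }
    return s;
}

/* Top the reserve back up, using non-blocking attempts only. */
static void zone_replenish(struct zone *z)
{
    while (z->nempty < RESERVE_GOAL) {
        struct slab *s = model_slab_alloc(false);

        if (s == NULL)
            break;          /* under pressure: keep what we have */
        zone_push(z, s);
    }
}

/*
 * Take a slab for a normal allocation: prefer the pre-allocated
 * reserve, and fall back to a blocking request only if the reserve
 * is already empty. Either way, try to refill the reserve afterwards.
 */
static struct slab *zone_get_slab(struct zone *z)
{
    struct slab *s = zone_pop(z);

    if (s == NULL)
        s = model_slab_alloc(true);
    zone_replenish(z);
    return s;
}
```

In this model the reserve is filled while memory is plentiful, drained one slab at a time while non-blocking requests are failing, and the blocking path is only reached once the reserve is gone. The KVA allocation and M_INTNOWAIT questions raised in the thread are not modelled at all.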

Thanks, this is really helpful. I have a lot of code-reading to do, to
make sure I understand what is happening. I don't know if I'll produce
anything useful, but I'm certainly having fun reading.

--
James