On 07/03/2026 13:01, Michael Kelly wrote:
> On 07/03/2026 11:21, Samuel Thibault wrote:
>> Locked? Then that's the issue. Having a lock while waiting for memory
>> would indeed be a sure path to deadlock. pmap_enter notably releases its
>> PVH and pmap precisely to avoid such deadlocks.
>> I however didn't see where vm_fault/pmap_enter locks the map?
> I made a mistake with that conclusion as I can't see how it could have
> the map locked now either. It does seem like some thread has
> possession of the lock though as there are 9 threads awaiting a write
> lock and 1 for a read. I can't prove that they are all waiting for
> the same lock, although it seems likely, and I should have tried to
> find that out at the time. Similarly it might have also helped to
> record the bit flags associated with the lock.
> Anyway, I can run it again to try and supply more detail.
I made more than one mistake with the initial analysis. It is so far off
I wonder if I was even examining the correct virtual machine. Please
ignore everything that has gone before; I must have been hallucinating
too. Here is actually what is happening.
During sbuilds of haskell packages there are dependent packages
installed that have a large installed size (ghc-doc for example is
~700M). Often during the write of this data, the system seems to enter a
blocked state. Normal page allocation is suspended, and so non-vm-privileged
tasks, including the ext2fs servers, soon get blocked if they
require more memory. Any process accessing file storage is also likely
to block on pagein from the stalled servers so even the console becomes
unresponsive.
The system is not actually totally stuck. Pageout processing continues
at a low level. There is no default pager running, so only external pages
can be considered for pageout. Appropriate memory_object_data_return
requests are issued to external pagers at the rate of approximately 100
per second. The CPU load is so low that the virtual machine 'CPU usage'
graph superficially looks like it is zero. None of these m_o_d_r
messages can be handled and actually free pages steadily decline.
I added some debugging to log every 100th pageout attempt from when
vm_page_alloc_paused becomes set. In one example, free pages steadily
drop from ~67500 to ~32000 over a period of ~22 minutes. Then
suddenly the pageout processing comes across a large series of pages
(~38000) that can be trivially reclaimed which are sufficient to
terminate the pageout activity and resume normal page allocation. The
system becomes usable again.
Might it be that boralus is also behaving this way without it being
noticed? The use of sync=5 might reduce the likelihood of this
occurring, I'd guess, but I have also seen the scenario occur with
sync=5 myself.
The fundamental problem is that ext2fs, being unprivileged, cannot
allocate memory in order to allow other memory to be released. This is
well known, I believe, but we need to do something to reduce the
likelihood of this scenario, as there could be cases in which the
system never recovers. For example, if internal memory usage were
dominant and a large write quickly consumed the remaining pages (before
unprivileged allocation was suspended, and before sync could process the
written pages), there might be too few pages available to page out at all.
Regards,
Mike.