On 18/03/2026 19:57, Michael Kelly wrote:
On 18/03/2026 09:42, Michael Kelly wrote:
The mypy package is certainly a good example. It crashed my VM yesterday by running out of swap space, although admittedly I have only 4GB of swap available. I'm going to run the same build on a similarly sized Debian/Linux VM to see what memory resources are used on that OS for comparison.

I did succeed in getting the Hurd mypy build to lock up. This is perhaps what Samuel is experiencing on the buildd? In my case, several of the mach-defpager threads are stuck and the build stops. See kernel debugger output appended. I think that thread 4 is blocked because the page is 'busy' but why it's busy, given that it's a real (not fictitious) page, I don't know yet. Why is $map21 locked? I have this state preserved in a VM snapshot but I've run out of time today to look further.

I have a theory as to what is happening which will hopefully withstand analysis by those more knowledgeable than I am.

The kernel debugger confirms that the map is locked by thread 4 and has a read count of 1. The first member of 'struct vm_map' is the lock itself; lock->thread is at offset 0 and lock->read_count at offset 8, so:

db> x /x $map21,4
        df9c76c0    ffffffff    e0001        0

db> print $task21.4
ffffffffdf9c76c0

This shows that map->lock has been converted to a recursive lock held by thread 4, whose stack trace is:

[...]
thread_block(...)+0x5d
vm_fault_page(...)+0x121b
vm_fault(...)+0x4e0
vm_fault_wire(...)+0x75
vm_map_pageable_scan(...)+0x154
vm_map_pageable(...)+0x141
vm_map_copyout(...)+0x457
ipc_kmsg_copyout_body(...)+0x70
ipc_kmsg_copyout(...)+0x51
mach_msg_continue(...)+0x9c

$map21 is initially write locked by vm_map_find_entry_anywhere() within vm_map_copyout(). The final part of the copyout is to wire those pages that require it, via a call to vm_map_pageable(), which calls vm_map_pageable_scan() to do the work. This is where the write lock is downgraded to a read lock, which converts it into a recursive lock.

Each map entry requiring wiring gets a call to vm_fault_wire(). One of the pages for this map entry must fail the call to vm_wire_fast() and so ends up in vm_fault(). Normally when vm_fault() is called the map is not locked, and vm_map_lookup() locks and unlocks the map to ensure that any resulting thread blocks do not happen with the map locked. In this instance, however, the map is already read locked, so the lookup merely increments the read count and then decrements it back, leaving the lock held for the remainder of the fault handling. This is how the map lock remains held within thread_block(), which is bad news.

As to why the page is blocked on the busy state, I don't know. It's possible that this is normal behaviour and only shows up because of the map locking issue.

The recursive lock is also set in vm_fault_unwire(), so it seems this strategy is intentional, but unwiring is perhaps less hazardous than wiring, since the page is guaranteed to be available.

If this analysis is agreed, then it seems to me that the code will need rearranging so that vm_fault() is called, for those virtual addresses that cannot be wired fast, without the map lock held.

Cheers,

Mike.

