On 29/01/23(Sun) 14:36, Mark Kettenis wrote:
> > Date: Sun, 29 Jan 2023 12:31:22 +0100
> > From: Martin Pieuchot <[email protected]>
> >
> > On 23/01/23(Mon) 22:57, David Hill wrote:
> > > On 1/20/23 09:02, Martin Pieuchot wrote:
> > > > > [...]
> > > > > Ran it 20 times and all completed and passed. I was also able to
> > > > > interrupt
> > > > > it as well. no issues.
> > > > >
> > > > > Excellent!
> > > >
> > > > Here's the best fix I could come up with. We mark the VM map as "busy"
> > > > during the page fault just before the faulting thread releases the
> > > > shared
> > > > lock. This ensures no other thread will grab an exclusive lock until
> > > > the
> > > > fault is finished.
> > > >
> > > > I couldn't trigger the reproducer with this, can you?
> > >
> > > Yes, same result as before. This patch does not seem to help.
> >
> > Is it the same as before? I doubt it is. On a 4-CPU machine I can't
> > trigger the race described in this thread. On a 8-CPU one I now see all
> > threads sleeping on "thrsleep" except one in "kqread" and one in "wait".
>
> I'm also seeing bbolt.test processes sleeping on "vmmaplk", "vmmapbsy"
> and "uvn_flsh", just like without the diff :(. Well, maybe the
> "vmmapbsy" one is new...

"vmmapbsy" is new because vm_map_busy() is now being used. If you're
seeing this one, I need to understand whether the faulting thread is
being blocked, and where. Can you enter ddb and get a trace of the
threads?

I'm missing some pieces of information, so I need fresh debug data.
Thanks to anyone who can get me more information about this.
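
For anyone following along, the intent of the "busy" marking can be
modelled roughly as below. This is only a single-threaded toy in
userland C, not the diff and not the uvm code: every name in it
(toy_map, toy_busy, toy_try_lock_excl, ...) is made up for
illustration, and where a real implementation would put the caller to
sleep, the toy simply returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a map lock.  All names here are hypothetical. */
struct toy_map {
	int readers;		/* number of shared holders */
	bool writer;		/* true while held exclusively */
	const void *busy;	/* who marked the map busy, or NULL */
};

/* Take the lock shared; refused only while a writer holds it. */
bool
toy_lock_shared(struct toy_map *m)
{
	if (m->writer)
		return false;
	m->readers++;
	return true;
}

/* Drop one shared hold. */
void
toy_unlock_shared(struct toy_map *m)
{
	assert(m->readers > 0);
	m->readers--;
}

/*
 * Mark the map busy on behalf of "owner".  In this toy the caller must
 * still hold the shared lock, and only one owner can mark it at a time.
 */
bool
toy_busy(struct toy_map *m, const void *owner)
{
	if (m->readers == 0 || m->busy != NULL)
		return false;
	m->busy = owner;
	return true;
}

/* Clear the busy mark; only the owner that set it may clear it. */
void
toy_unbusy(struct toy_map *m, const void *owner)
{
	if (m->busy == owner)
		m->busy = NULL;
}

/*
 * Try to take the lock exclusively.  Refused while there are shared
 * holders, another writer, or a busy mark set by somebody else.
 */
bool
toy_try_lock_excl(struct toy_map *m, const void *self)
{
	if (m->writer || m->readers > 0)
		return false;
	if (m->busy != NULL && m->busy != self)
		return false;
	m->writer = true;
	return true;
}

/* Drop the exclusive hold. */
void
toy_unlock_excl(struct toy_map *m)
{
	m->writer = false;
}
```

The sequence that matters is: toy_lock_shared(), toy_busy(),
toy_unlock_shared(), then the fault work, then toy_unbusy(). Between
the unlock and the unbusy, toy_try_lock_excl() from any other owner is
refused even though nobody holds the lock any more. That is the
property the fix is after.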
