On 29/01/23(Sun) 14:36, Mark Kettenis wrote:
> > Date: Sun, 29 Jan 2023 12:31:22 +0100
> > From: Martin Pieuchot <[email protected]>
> > 
> > On 23/01/23(Mon) 22:57, David Hill wrote:
> > > On 1/20/23 09:02, Martin Pieuchot wrote:
> > > > > [...] 
> > > > > > Ran it 20 times and all completed and passed.  I was also able to
> > > > > > interrupt it as well.   no issues.
> > > > > 
> > > > > Excellent!
> > > > 
> > > > Here's the best fix I could come up with.  We mark the VM map as "busy"
> > > > during the page fault just before the faulting thread releases the shared
> > > > lock.  This ensures no other thread will grab an exclusive lock until the
> > > > fault is finished.
> > > > 
> > > > I couldn't trigger the reproducer with this, can you?
> > > 
> > > Yes, same result as before.  This patch does not seem to help.
> > 
> > Is it the same as before?  I doubt it is.  On a 4-CPU machine I can't
> > trigger the race described in this thread.  On an 8-CPU one I now see all
> > threads sleeping on "thrsleep" except one in "kqread" and one in "wait".
> 
> I'm also seeing bbolt.test processes sleeping on "vmmaplk", "vmmapbsy"
> and "uvn_flsh", just like without the diff :(.  Well, maybe the
> "vmmapbsy" one is new...

"vmmapbsy" is new because vm_map_busy() is now being used.  If you're
seeing this one, I need to understand whether the faulting thread is
being blocked, and if so, where.
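
To make the intent of the "busy" marking concrete, here is a rough
userland sketch.  It is a toy model only: the toy_* names and the struct
are made up for illustration and are not the uvm code or the diff
itself.  It only shows the ordering, assuming an exclusive locker has to
back off while the busy flag is set, even once the shared lock has been
released.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, NOT the kernel's struct vm_map.  All names are hypothetical. */
struct toy_map {
	pthread_mutex_t mtx;	/* protects the fields below */
	int readers;		/* threads holding the shared lock */
	bool writer;		/* exclusive lock is held */
	bool busy;		/* a fault is still in progress */
};

static void
toy_init(struct toy_map *m)
{
	pthread_mutex_init(&m->mtx, NULL);
	m->readers = 0;
	m->writer = false;
	m->busy = false;
}

/* Try to take the shared lock; fails only if a writer holds the map. */
static bool
toy_try_lock_shared(struct toy_map *m)
{
	bool ok;

	pthread_mutex_lock(&m->mtx);
	ok = !m->writer;
	if (ok)
		m->readers++;
	pthread_mutex_unlock(&m->mtx);
	return ok;
}

static void
toy_unlock_shared(struct toy_map *m)
{
	pthread_mutex_lock(&m->mtx);
	m->readers--;
	pthread_mutex_unlock(&m->mtx);
}

/* Set or clear the busy flag (set while still holding the shared lock). */
static void
toy_set_busy(struct toy_map *m, bool busy)
{
	pthread_mutex_lock(&m->mtx);
	m->busy = busy;
	pthread_mutex_unlock(&m->mtx);
}

/* Exclusive lock is refused while readers exist OR the map is busy. */
static bool
toy_try_lock_exclusive(struct toy_map *m)
{
	bool ok;

	pthread_mutex_lock(&m->mtx);
	ok = !m->writer && m->readers == 0 && !m->busy;
	if (ok)
		m->writer = true;
	pthread_mutex_unlock(&m->mtx);
	return ok;
}

/* Walk through the fault ordering; returns the number of surprises. */
int
toy_demo(void)
{
	struct toy_map m;
	int bad = 0;

	toy_init(&m);
	if (!toy_try_lock_shared(&m))	/* fault starts under the shared lock */
		bad++;
	toy_set_busy(&m, true);		/* mark busy before dropping the lock */
	toy_unlock_shared(&m);
	if (toy_try_lock_exclusive(&m))	/* must fail: fault not finished */
		bad++;
	toy_set_busy(&m, false);	/* fault finished */
	if (!toy_try_lock_exclusive(&m))	/* now the writer may proceed */
		bad++;
	return bad;
}
```

In this model a would-be exclusive locker simply gets refused; a real
implementation would presumably sleep instead, which would be consistent
with threads showing up on a "vmmapbsy" wait channel.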

Can you enter ddb and get a trace of the threads?  I'm missing some
pieces of information, so I need fresh debug data.

Thanks to anyone who can get me more information about this.
