On Tue, 21 Jul 2020 19:23:44 +0100
Julian Smith <[email protected]> wrote:

> On Mon, 20 Jul 2020 17:18:19 +0100
> Julian Smith <[email protected]> wrote:
> 
> > On Mon, 20 Jul 2020 15:26:11 +0000
> > Visa Hankala <[email protected]> wrote:
> >   
> > > On Mon, Jul 20, 2020 at 04:35:12AM +0000, Visa Hankala wrote:    
> > > > On Sun, Jul 19, 2020 at 09:47:54PM +0100, Julian Smith wrote:
> > > >    
> > > > > I've been finding egdb and gdb rather easily get stuck in an
> > > > > uninterruptible wait, e.g. when running the 'next' command
> > > > > after hitting a breakpoint.    
> > 
> > [...]
> >   
> > > > The single-thread check done by wait4() is non-interruptible.
> > > > When the debugger gets stuck, is it blocked in "suspend" state?
> > > >    
> > 
> > ps reports it to be in state 'D'.
> >   
> > > > 
> > > > However, I think there is a bug in the single-thread switch
> > > > code. It looks like ps_singlecount can be decremented too much.
> > > > This probably is a regression from making ps_singlecount unsigned
> > > > and letting single_thread_check() run without the kernel lock.
> > > > 
> > > > The bug might go away if single_thread_check() made sure that
> > > > P_SUSPSINGLE is set before the thread suspends.       
> > > 
> > > Below is an updated patch for testing. It extends the scope of
> > > SCHED_LOCK() so that there are fewer chances of interleaving of
> > > single_thread_set() and single_thread_check().    
> > 
> > Many thanks for these patches. I'll try to test in the next couple
> > of days. Though the last time i built an OpenBSD kernel was well
> > over a decade ago, so it might take me a little longer.  
> 
> I managed to build a patched kernel, and it seems to fix the problem -
> i haven't been able to get egdb into an uninterruptible wait state.
> 
> Also, i've been running the patched kernel all day now and it doesn't
> seem to be causing any problems elsewhere.

Unfortunately the same problem has just occurred again. I've run egdb
quite a few times since i updated the kernel, so the patch has
definitely reduced the problem, but it doesn't seem to have eliminated
it.

Let me know if there is anything i could do to find out more information.

Thanks,

- Jules

-- 
http://op59.net

