On Mon, 20 Jul 2020 15:26:11 +0000
Visa Hankala <[email protected]> wrote:
> On Mon, Jul 20, 2020 at 04:35:12AM +0000, Visa Hankala wrote:
> > On Sun, Jul 19, 2020 at 09:47:54PM +0100, Julian Smith wrote:
> > > I've been finding egdb and gdb rather easily get stuck in an
> > > uninterruptible wait, e.g. when running the 'next' command after
> > > hitting a breakpoint.
[...]
> > The single-thread check done by wait4() is non-interruptible.
> > When the debugger gets stuck, is it blocked in "suspend" state?
ps reports it to be in state 'D'.
> >
> > However, I think there is a bug in the single-thread switch code.
> > It looks that ps_singlecount can be decremented too much. This
> > probably is a regression of making ps_singlecount unsigned and
> > letting single_thread_check() run without the kernel lock.
> >
> > The bug might go away if single_thread_check() made sure that
> > P_SUSPSINGLE is set before the thread suspends.
>
> Below is an updated patch for testing. It extends the scope of
> SCHED_LOCK() so that there are fewer chances of interleaving of
> single_thread_set() and single_thread_check().
Many thanks for these patches. I'll try to test in the next couple of
days. Though the last time i built an OpenBSD kernel was well over a
decade ago, so it might take me a little longer.
Thanks,
- Jules
>
> One problem is that once single_thread_set() sets ps_single, the other
> threads can enter the suspension code in single_thread_check(). The
> extended lock scope prevents the threads from taking action before
> single_thread_set() has finished with the state updates.
>
> Index: kern/kern_sig.c
> ===================================================================
> RCS file: src/sys/kern/kern_sig.c,v
> retrieving revision 1.258
> diff -u -p -r1.258 kern_sig.c
> --- kern/kern_sig.c 15 Jun 2020 13:18:33 -0000 1.258
> +++ kern/kern_sig.c 20 Jul 2020 13:29:50 -0000
> @@ -1915,16 +1915,17 @@ single_thread_check(struct proc *p, int
> return (EINTR);
> }
>
> + SCHED_LOCK(s);
> if (atomic_dec_int_nv(&pr->ps_singlecount)
> == 0) wakeup(&pr->ps_singlecount);
> if (pr->ps_flags & PS_SINGLEEXIT) {
> + SCHED_UNLOCK(s);
> KERNEL_LOCK();
> exit1(p, 0, 0, EXIT_THREAD_NOCHECK);
> - KERNEL_UNLOCK();
> + /* NOTREACHED */
> }
>
> /* not exiting and don't need to unwind, so
> suspend */
> - SCHED_LOCK(s);
> p->p_stat = SSTOP;
> mi_switch();
> SCHED_UNLOCK(s);
> @@ -1950,7 +1951,7 @@ single_thread_set(struct proc *p, enum s
> {
> struct process *pr = p->p_p;
> struct proc *q;
> - int error;
> + int error, s;
>
> KERNEL_ASSERT_LOCKED();
> KASSERT(curproc == p);
> @@ -1974,26 +1975,22 @@ single_thread_set(struct proc *p, enum s
> panic("single_thread_mode = %d", mode);
> #endif
> }
> + SCHED_LOCK(s);
> pr->ps_singlecount = 0;
> membar_producer();
> pr->ps_single = p;
> TAILQ_FOREACH(q, &pr->ps_threads, p_thr_link) {
> - int s;
> -
> if (q == p)
> continue;
> if (q->p_flag & P_WEXIT) {
> if (mode == SINGLE_EXIT) {
> - SCHED_LOCK(s);
> if (q->p_stat == SSTOP) {
> setrunnable(q);
> atomic_inc_int(&pr->ps_singlecount);
> }
> - SCHED_UNLOCK(s);
> }
> continue;
> }
> - SCHED_LOCK(s);
> atomic_setbits_int(&q->p_flag, P_SUSPSINGLE);
> switch (q->p_stat) {
> case SIDL:
> @@ -2027,8 +2024,8 @@ single_thread_set(struct proc *p, enum s
> signotify(q);
> break;
> }
> - SCHED_UNLOCK(s);
> }
> + SCHED_UNLOCK(s);
>
> if (mode != SINGLE_PTRACE)
> single_thread_wait(pr, 1);
>
>
--
http://op59.net