On 29/01/18(Mon) 21:25, Artturi Alm wrote:
> On Mon, Jan 29, 2018 at 08:03:38PM +0100, Martin Pieuchot wrote:
> > On 29/01/18(Mon) 20:38, Artturi Alm wrote:
> > > On Mon, Jan 29, 2018 at 10:42:20AM +0100, Martin Pieuchot wrote:
> > > > Hello Artturi,
> > > >
> > > > On 28/01/18(Sun) 09:08, Artturi Alm wrote:
> > > > > >Synopsis: stuck in netlock
> > > > > >Category: amd64
> > > > > >Environment:
> > > > > System : OpenBSD 6.2
> > > > > Details : OpenBSD 6.2-current (GENERIC.MP) #333: Sun Jan 7
> > > > > 09:13:00 MST 2018
> > > > >
> > > > > [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > > >
> > > > > Architecture: OpenBSD.amd64
> > > > > Machine : amd64
> > > > > >Description:
> > > > > processes getting stuck w/STATE=netlock, kill has no effect.
> > > > > >How-To-Repeat:
> > > > > using the desktop normally, until trying to restart chrome ends
> > > > > up failing.
> > > >
> > > > What do you mean with "using the desktop normally"? Which applications
> > > > are you using? Which browser plugins? Can you find out the minimum
> > > > setup to reproduce this deadlock?
> > > >
> > > > > > I've had this happen to me at least twice in the last few
> > > > > > weeks.
> > > >
> > > > Do you know how to reproduce it easily?
> > > >
> > >
> > > this time i had less than 10 tabs open, so i guess it can be narrowed
> > > down even further.
> > >
> > > > > > The first time, i noticed that trying to launch chrome locked up
> > > > > > all the other processes in netlock, and "pkill chrome" allowed
> > > > > > the system to recover. i was unable to figure out what was wrong;
> > > > > > rebooting made everything work again, while e.g.
> > > > > > removing ~/.cache & ~/.config did not.
> > > >
> > > > So the deadlock is related to your chrome usage?
> > > >
> > >
> > > now it does feel like so. i'll upgrade tonight.
> > >
> > > > > long before running the "ps cl" below, i had already killed all
> > > > > the xterm-windows those processes were in. cwm(1) was unable to
> > > > > kill some of those, but xkill did not.
> > > >
> > > > > Well, killing processes waiting for the 'netlock' won't help. What has
> > > > > to be found is which process is holding it. For that we need the full
> > > > > ps output, including kernel and userland threads.
> > > > >
> > > > > after exiting X w/ctrl+alt+backspace(iirc?) i didn't get back to
> > > > > $-prompt, and ^T did show xauth stuck in netlock..
> > > > > i guess it's obvious where it was heading; so i got pics of
> > > > > "# reboot -nq" failing because stuck in the fckng netlock -_-
> > > > >
> > > > > i do have ddb.{panic,console,log}=1, but
> > > > > "# sysctl ddb.trigger=1" ==
> > > > > "sysctl: ddb.trigger: Operation not supported by device"
> > > >
> > > > Not having DDB access will limit the debugging experience. Are you sure
> > > > you tried to enter it on your console?
> > > >
> > >
> > > so this requires ttyC0, right?
> > > this time it was ifconfig in [netlock], that prevented using ttyC0.
> > > i got there from X by running "virsh shutdown <domain>" from the kvm host,
> > > i guess it emulates what pressing actual power button would(acpi?).
> > >
> > > > > ?? so i had no option but "virsh reset <domain>"...
> > > >
> > > > Did you try top(1)? What were the kernel processes doing?
> > >
> > > see below, if "top -bCHS -d 1 999" should do.
> > > anything else i could do? anyway, thanks in advance:)
> >
> > This is where the problem comes from:
> >
> > > 33315 443734 -6 0 141M 102M idle viowait 0:00 0.00%
> > > chrome:
> >
> > I don't understand how chrome can end up sleeping in vio_ioctl() and why
> > it is sleeping forever. But this thread is holding the NET_LOCK() and
> > prevents the rest of the kernel from making progress.
> >
> > Could you try a virtual interface different from vio(4) and see if you
> > can reproduce the problem?
>
> Will try with 'e1000', but this does seem to me like it would have
> something to do with routing too(?), as vio0 is only for reaching
> the host, and there is a separate physical interface to which the
> default route belongs.

Here's a diff to fix vio(4), could you give it a go?

Index: dev/pv/if_vio.c
===================================================================
RCS file: /cvs/src/sys/dev/pv/if_vio.c,v
retrieving revision 1.4
diff -u -p -r1.4 if_vio.c
--- dev/pv/if_vio.c 10 Aug 2017 18:03:51 -0000 1.4
+++ dev/pv/if_vio.c 23 Feb 2018 09:14:29 -0000
@@ -1276,7 +1276,8 @@ vio_wait_ctrl(struct vio_softc *sc)
 	int r = 0;
 
 	while (sc->sc_ctrl_inuse != FREE) {
-		r = tsleep(&sc->sc_ctrl_inuse, PRIBIO|PCATCH, "viowait", 0);
+		r = rwsleep(&sc->sc_ctrl_inuse, &netlock, PRIBIO|PCATCH,
+		    "viowait", 0);
 		if (r == EINTR)
 			return r;
 	}
@@ -1295,7 +1296,8 @@ vio_wait_ctrl_done(struct vio_softc *sc)
 			r = 1;
 			break;
 		}
-		r = tsleep(&sc->sc_ctrl_inuse, PRIBIO|PCATCH, "viodone", 0);
+		r = rwsleep(&sc->sc_ctrl_inuse, &netlock, PRIBIO|PCATCH,
+		    "viodone", 0);
 		if (r == EINTR)
 			break;
 	}
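
For illustration only, here is a userland analogue of the idea behind the
diff: rwsleep(9) drops the given rwlock while the thread sleeps and takes it
again on wakeup, whereas tsleep(9) sleeps with the lock still held. This is
a sketch using POSIX threads, not the driver code. The names big_lock,
ctrl_cv, ctrl_inuse, completer() and wait_ctrl() are hypothetical stand-ins,
and having the completing thread need the same lock is an assumption made so
the effect is visible, not a statement about how vio(4) completes its
control requests.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins: big_lock plays the role of the netlock,
 * ctrl_inuse the role of sc->sc_ctrl_inuse (1 = busy, 0 = free). */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ctrl_cv = PTHREAD_COND_INITIALIZER;
static int ctrl_inuse = 1;

/* Marks the resource free. It needs big_lock to do so (an assumption of
 * this sketch), so it can only run once the waiter has let go of it. */
static void *
completer(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&big_lock);
	ctrl_inuse = 0;
	pthread_cond_signal(&ctrl_cv);
	pthread_mutex_unlock(&big_lock);
	return NULL;
}

/* Waits for the resource while holding big_lock on entry to the sleep.
 * Returns 0 once the resource is free, -1 if the thread cannot be created. */
int
wait_ctrl(void)
{
	pthread_t t;

	pthread_mutex_lock(&big_lock);
	ctrl_inuse = 1;
	if (pthread_create(&t, NULL, completer, NULL) != 0) {
		pthread_mutex_unlock(&big_lock);
		return -1;
	}
	/* pthread_cond_wait() atomically releases big_lock while sleeping
	 * and reacquires it before returning. Sleeping here without
	 * releasing the lock would block completer() forever. */
	while (ctrl_inuse != 0)
		pthread_cond_wait(&ctrl_cv, &big_lock);
	assert(ctrl_inuse == 0);
	pthread_mutex_unlock(&big_lock);
	pthread_join(t, NULL);
	return 0;
}
```

Calling wait_ctrl() from a small main() and building with -lpthread should
return 0. The loop around the wait mirrors the while loop in vio_wait_ctrl():
the condition is rechecked after every wakeup.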