On Sun, Oct 20, 2019 at 02:33:31AM +0200, Alexandr Nedvedicky wrote: > Hello, > > On Sat, Oct 19, 2019 at 10:30:07AM +0200, Anton Lindqvist wrote: > > On Wed, Jul 31, 2019 at 06:11:33PM +0100, Stuart Henderson wrote: > > > July 29 amd64 snap. I had just tested something in a vm (not very > > > common for me) and did "halt -p" in the guest. Immediately afterwards > > > I hit this: > > > > > > uvm_fault(0xfffffd84031d4ef0, 0x8, 0, 1) -> e > > > kernel: page fault trap, code=0 > > > Stopped at filt_bpfrdetach+0x43: movq 0x8(%rax),%rax > > > > I've been looking into a similar bpf panic reported by syzkaller[1], > > which looks somewhat related. The one reported by syzkaller is caused > > by issuing ioctl(SIOCIFDESTROY) on the interface which the packet filter > > is attached to. This will in turn invoke the following functions > > expressed as an inverted stacktrace: > > > > 1. bpfsdetach() > > 2. vdevgone() > > 3. VOP_REVOKE() > > 4. vop_generic_revoke() > > 5. vgonel() > > 6. vclean(DOCLOSE) > > 7. VOP_CLOSE() > > 8. bpfclose() > > > > Note that bpfclose() is called before changing the vnode type. In > > bpfclose(), the `struct bpf_d` is immediately removed from the global > > bpf_d_list list and might end up sleeping inside taskq_barrier(systq). > > Since the bpf file descriptor (fd) is still present and valid, another > > thread could perform an ioctl() on the fd only to fault since > > bpfilter_lookup() will return NULL. The vnode is not locked in this path > > either so it won't end up waiting on the ongoing vclean(). > > > > Even if the `struct bpf_d' was removed from the bpf_d_list after calling > > taskq_barrier() I assume it would need to be flagged as dying or > > something similar. In order to prevent other threads from operating on > > it. After calling bpfilter_lookup(), one would need to check the current > > state and punt if dying: > > > > d = bpfilter_lookup(minor); > > if (d->bd_state == BPFD_STATE_DYING) > > return (ENXIO); > > > > I assume another potential problem would be if the same > > ioctl(SIOCIFDESTROY) path is executed while another thread is sleeping > > in bpfioctl(), the `bpfilter_lookup() != NULL' condition would need to > > checked after sleeping. Alternatively, reference counting would help > > here but the bd_state would need to be inspected after sleeping. > > > > Thoughts? > > > > [1] > > https://syzkaller.appspot.com/bug?id=0fb0ea958b706f72818cc47b182e3ac1dc5d355a > </snip> > Thanks for detailed explanation. I'd like to make sure I got the story > right. > Does it mean the syzkaller is multithreaded application, which both thread > share the same file descriptor? Like thread 1 calls close(bpf_fd), > while thread 2 does ioctl(bpf_fd, ...)? If this is true, then the story > does make sense. Looks like VFS can protect us if there are two competing > processes, but not threads. This is my understanding, not sure if it is > correct. > > Another option would be to let bpfioctl() fail when no bpf descriptor is > associated with file. > --------8<---------------8<---------------8<------------------8<-------- > diff --git a/sys/net/bpf.c b/sys/net/bpf.c > index 48def8a787e..184e57c1969 100644 > --- a/sys/net/bpf.c > +++ b/sys/net/bpf.c > @@ -669,6 +669,9 @@ bpfioctl(dev_t dev, u_long cmd, caddr_t addr, int flag, > struct proc *p) > int error = 0; > > d = bpfilter_lookup(minor(dev)); > + if (d == NULL) > + return (ENXIO); > + > if (d->bd_locked && suser(p) != 0) { > /* list of allowed ioctls when locked and not root */ > switch (cmd) { > --------8<---------------8<---------------8<------------------8<-------- > > We can also backout my commit below [1]: > $OpenBSD: bpf.c,v 1.175 2019/05/18 12:59:32 sashan Exp $ > Diff below puts reference counting for bpf descriptors back. So we will be > removing bpf descriptor from the list, when last reference dies.
I would say let's restore the reference counting. The code has proven to be trickier than expected. OK visa@ for the backout
