Re: filt_bpfrdetach uvm_fault after vmd vm was shutdown

Visa Hankala Sun, 20 Oct 2019 03:29:04 -0700

On Sun, Oct 20, 2019 at 02:33:31AM +0200, Alexandr Nedvedicky wrote:
> Hello,
> 
> On Sat, Oct 19, 2019 at 10:30:07AM +0200, Anton Lindqvist wrote:
> > On Wed, Jul 31, 2019 at 06:11:33PM +0100, Stuart Henderson wrote:
> > > July 29 amd64 snap. I had just tested something in a vm (not very
> > > common for me) and did "halt -p" in the guest. Immediately afterwards
> > > I hit this:
> > > 
> > > uvm_fault(0xfffffd84031d4ef0, 0x8, 0, 1) -> e
> > > kernel: page fault trap, code=0
> > > Stopped at      filt_bpfrdetach+0x43:   movq    0x8(%rax),%rax
> > 
> > I've been looking into a similar bpf panic reported by syzkaller[1],
> > which looks somewhat related. The one reported by syzkaller is caused
> > by issuing ioctl(SIOCIFDESTROY) on the interface which the packet filter
> > is attached to. This will in turn invoke the following functions
> > expressed as an inverted stacktrace:
> > 
> >   1. bpfsdetach()
> >   2. vdevgone()
> >   3. VOP_REVOKE()
> >   4. vop_generic_revoke()
> >   5. vgonel()
> >   6. vclean(DOCLOSE)
> >   7. VOP_CLOSE()
> >   8. bpfclose()
> > 
> > Note that bpfclose() is called before changing the vnode type. In
> > bpfclose(), the `struct bpf_d` is immediately removed from the global
> > bpf_d_list list and might end up sleeping inside taskq_barrier(systq).
> > Since the bpf file descriptor (fd) is still present and valid, another
> > thread could perform an ioctl() on the fd only to fault since
> > bpfilter_lookup() will return NULL. The vnode is not locked in this path
> > either so it won't end up waiting on the ongoing vclean().
> > 
> > Even if the `struct bpf_d' was removed from the bpf_d_list after calling
> > taskq_barrier() I assume it would need to be flagged as dying or
> > something similar. In order to prevent other threads from operating on
> > it. After calling bpfilter_lookup(), one would need to check the current
> > state and punt if dying:
> > 
> >   d = bpfilter_lookup(minor);
> >   if (d->bd_state == BPFD_STATE_DYING)
> >     return (ENXIO);
> > 
> > I assume another potential problem would be if the same
> > ioctl(SIOCIFDESTROY) path is executed while another thread is sleeping
> > in bpfioctl(), the `bpfilter_lookup() != NULL' condition would need to
> > checked after sleeping. Alternatively, reference counting would help
> > here but the bd_state would need to be inspected after sleeping.
> > 
> > Thoughts?
> > 
> > [1] 
> > https://syzkaller.appspot.com/bug?id=0fb0ea958b706f72818cc47b182e3ac1dc5d355a
> </snip>
>     Thanks for detailed explanation. I'd like to make sure I got the story 
> right.
>     Does it mean the syzkaller is multithreaded application, which both thread
>     share the same file descriptor? Like thread 1 calls close(bpf_fd),
>     while thread 2 does ioctl(bpf_fd, ...)? If this is true, then the story
>     does make sense. Looks like VFS can protect us if there are two competing
>     processes, but not threads. This is my understanding, not sure if it is
>     correct.
> 
>     Another option would be to let bpfioctl() fail when no bpf descriptor is
>     associated with file.
> --------8<---------------8<---------------8<------------------8<--------
> diff --git a/sys/net/bpf.c b/sys/net/bpf.c
> index 48def8a787e..184e57c1969 100644
> --- a/sys/net/bpf.c
> +++ b/sys/net/bpf.c
> @@ -669,6 +669,9 @@ bpfioctl(dev_t dev, u_long cmd, caddr_t addr, int flag, 
> struct proc *p)
>       int error = 0;
>  
>       d = bpfilter_lookup(minor(dev));
> +     if (d == NULL)
> +             return (ENXIO);
> +
>       if (d->bd_locked && suser(p) != 0) {
>               /* list of allowed ioctls when locked and not root */
>               switch (cmd) {
> --------8<---------------8<---------------8<------------------8<--------
> 
>     We can also backout my commit below [1]:
>        $OpenBSD: bpf.c,v 1.175 2019/05/18 12:59:32 sashan Exp $
>     Diff below puts reference counting for bpf descriptors back. So we will be
>     removing bpf descriptor from the list, when last reference dies.


I would say let's restore the reference counting. The code has proven
to be trickier than expected.

OK visa@ for the backout

Re: filt_bpfrdetach uvm_fault after vmd vm was shutdown

Reply via email to