Re: filt_bpfrdetach uvm_fault after vmd vm was shutdown

Martin Pieuchot Mon, 21 Oct 2019 02:36:00 -0700

On 19/10/19(Sat) 10:30, Anton Lindqvist wrote:
> On Wed, Jul 31, 2019 at 06:11:33PM +0100, Stuart Henderson wrote:
> > July 29 amd64 snap. I had just tested something in a vm (not very
> > common for me) and did "halt -p" in the guest. Immediately afterwards
> > I hit this:
> > 
> > uvm_fault(0xfffffd84031d4ef0, 0x8, 0, 1) -> e
> > kernel: page fault trap, code=0
> > Stopped at      filt_bpfrdetach+0x43:   movq    0x8(%rax),%rax
> 
> I've been looking into a similar bpf panic reported by syzkaller[1],
> which looks somewhat related. The one reported by syzkaller is caused
> by issuing ioctl(SIOCIFDESTROY) on the interface which the packet filter
> is attached to. This will in turn invoke the following functions
> expressed as an inverted stacktrace:
> 
>   1. bpfsdetach()
>   2. vdevgone()
>   3. VOP_REVOKE()
>   4. vop_generic_revoke()
>   5. vgonel()
>   6. vclean(DOCLOSE)
>   7. VOP_CLOSE()
>   8. bpfclose()
> 
> Note that bpfclose() is called before changing the vnode type. In
> bpfclose(), the `struct bpf_d` is immediately removed from the global
> bpf_d_list list and might end up sleeping inside taskq_barrier(systq).
> Since the bpf file descriptor (fd) is still present and valid, another
> thread could perform an ioctl() on the fd only to fault since
> bpfilter_lookup() will return NULL.


Are you saying that the assumption that a descriptor will be found and
valid as long as a vnode has been open is not true?  In other words the
so called "USB race" as described in bpfpoll() exists because of sleeping
points during detach.  The introduction of taskq_barrier() introduced a
new sleeping point, hence more changes to trigger this race.

>                                     The vnode is not locked in this path
> either so it won't end up waiting on the ongoing vclean().

That leads to an interesting question: should we serialize device access
at the vnode level or should we generalize the existing "solution" and
assume that the descriptor can be NULL/invalid at the "pseudo-driver"
level?

This is a common problem for all our device drivers, so a generic solution
might save a lot of per-driver fixes.
 
> Even if the `struct bpf_d' was removed from the bpf_d_list after calling
> taskq_barrier() I assume it would need to be flagged as dying or
> something similar. In order to prevent other threads from operating on
> it. After calling bpfilter_lookup(), one would need to check the current
> state and punt if dying:
> 
>   d = bpfilter_lookup(minor);
>   if (d->bd_state == BPFD_STATE_DYING)
>     return (ENXIO);

That's another flavor of per-driver fix.

> I assume another potential problem would be if the same
> ioctl(SIOCIFDESTROY) path is executed while another thread is sleeping
> in bpfioctl(), the `bpfilter_lookup() != NULL' condition would need to
> checked after sleeping. Alternatively, reference counting would help
> here but the bd_state would need to be inspected after sleeping.

Indeed, we see this problem when detaching network pseudo-interfaces as
the `ifp' might become invalid after every sleeping point in the ioctl
path of the network stack.

Re: filt_bpfrdetach uvm_fault after vmd vm was shutdown

Reply via email to