On 19/10/19(Sat) 10:30, Anton Lindqvist wrote: > On Wed, Jul 31, 2019 at 06:11:33PM +0100, Stuart Henderson wrote: > > July 29 amd64 snap. I had just tested something in a vm (not very > > common for me) and did "halt -p" in the guest. Immediately afterwards > > I hit this: > > > > uvm_fault(0xfffffd84031d4ef0, 0x8, 0, 1) -> e > > kernel: page fault trap, code=0 > > Stopped at filt_bpfrdetach+0x43: movq 0x8(%rax),%rax > > I've been looking into a similar bpf panic reported by syzkaller[1], > which looks somewhat related. The one reported by syzkaller is caused > by issuing ioctl(SIOCIFDESTROY) on the interface which the packet filter > is attached to. This will in turn invoke the following functions > expressed as an inverted stacktrace: > > 1. bpfsdetach() > 2. vdevgone() > 3. VOP_REVOKE() > 4. vop_generic_revoke() > 5. vgonel() > 6. vclean(DOCLOSE) > 7. VOP_CLOSE() > 8. bpfclose() > > Note that bpfclose() is called before changing the vnode type. In > bpfclose(), the `struct bpf_d` is immediately removed from the global > bpf_d_list list and might end up sleeping inside taskq_barrier(systq). > Since the bpf file descriptor (fd) is still present and valid, another > thread could perform an ioctl() on the fd only to fault since > bpfilter_lookup() will return NULL.
Are you saying that the assumption that a descriptor will be found and valid as long as a vnode has been open is not true? In other words the so called "USB race" as described in bpfpoll() exists because of sleeping points during detach. The introduction of taskq_barrier() introduced a new sleeping point, hence more changes to trigger this race. > The vnode is not locked in this path > either so it won't end up waiting on the ongoing vclean(). That leads to an interesting question: should we serialize device access at the vnode level or should we generalize the existing "solution" and assume that the descriptor can be NULL/invalid at the "pseudo-driver" level? This is a common problem for all our device drivers, so a generic solution might save a lot of per-driver fixes. > Even if the `struct bpf_d' was removed from the bpf_d_list after calling > taskq_barrier() I assume it would need to be flagged as dying or > something similar. In order to prevent other threads from operating on > it. After calling bpfilter_lookup(), one would need to check the current > state and punt if dying: > > d = bpfilter_lookup(minor); > if (d->bd_state == BPFD_STATE_DYING) > return (ENXIO); That's another flavor of per-driver fix. > I assume another potential problem would be if the same > ioctl(SIOCIFDESTROY) path is executed while another thread is sleeping > in bpfioctl(), the `bpfilter_lookup() != NULL' condition would need to > checked after sleeping. Alternatively, reference counting would help > here but the bd_state would need to be inspected after sleeping. Indeed, we see this problem when detaching network pseudo-interfaces as the `ifp' might become invalid after every sleeping point in the ioctl path of the network stack.
