On Wed, Feb 7, 2018 at 4:28 PM, Björn Töpel <bjorn.to...@gmail.com> wrote: > 2018-02-07 16:54 GMT+01:00 Willem de Bruijn <willemdebruijn.ker...@gmail.com>: >>> We realized, a bit late maybe, that 24 patches is a bit mouthful, so >>> let me try to make it more palatable. >> >> Overall, this approach looks great to me. >> > > Yay! :-) > >> The patch set incorporates all the feedback from AF_PACKET V4. >> At this point I don't have additional high-level interface comments. >> > > I have a thought on the socket API. Now, we're registering buffer > memory *to* the kernel, but mmap:ing the Rx/Tx rings *from* the > kernel. I'm leaning towards removing the mmap call, in favor of > registering the rings to kernel analogous to the XDP_MEM_REG socket > option. We wont guarantee physical contiguous memory for the rings, > but I think we can live with that. Thoughts? > >> As you point out, 24 patches and nearly 6000 changed lines is >> quite a bit to ingest. Splitting up in smaller patch sets will help >> give more detailed implementation feedback. >> >> The frame pool and device driver changes are largely independent >> from AF_XDP and probably should be resolved first (esp. the >> observed regresssion even without AF_XDP). >> > > Yeah, the regression is unacceptable. > > Another way is starting with the patches without zero-copy first > (i.e. the copy path), and later add the driver modifications. That > would be the first 7 patches. > >> As you suggest, it would be great if the need for a separate >> xsk_packet_array data structure can be avoided. >> > > Yes, we'll address that! > >> Since frames from the same frame pool can be forwarded between >> multiple device ports and thus AF_XDP sockets, that should perhaps >> be a separate object independent from the sockets. This comment >> hints at the awkward situation if tied to a descriptor pair: >> >>> + /* Check if umem is from this socket, if so do not make >>> + * circular references. >>> + */ >> >> Since this is in principle just a large shared memory area, could >> it reuse existing BPF map logic? >> > > Hmm, care to elaborate on your thinking here?
On second thought, that is not workable. I was thinking of reusing existing mmap support for maps, but that is limited to the perf ring buffer. >> More extreme, and perhaps unrealistic, is if the descriptor ring >> could similarly be a BPF map and the Rx XDP program directly >> writes the descriptor, instead of triggering xdp_do_xsk_redirect. >> As we discussed before, this would avoid the need to specify a >> descriptor format upfront. > > Having the XDP program writeback the descriptor to user space ring is > really something that would be useful (writing a virtio-net > descriptors...). Yes, that's a great use case. This ties in with Jason Wang's presentation on XDP with tap and virtio, too. https://www.netdevconf.org/2.2/slides/wang-vmperformance-talk.pdf > I need to think a bit more about this. :-) Please > share your ideas! > > Thanks for looking into the patches! > > > Björn