2017-11-03 3:29 GMT+01:00 Willem de Bruijn <willemdebruijn.ker...@gmail.com>:
>>>> +/*
>>>> + * struct tpacket_memreg_req is used in conjunction with PACKET_MEMREG
>>>> + * to register user memory which should be used to store the packet
>>>> + * data.
>>>> + *
>>>> + * There are some constraints for the memory being registered:
>>>> + * - The memory area has to be memory page size aligned.
>>>> + * - The frame size has to be a power of 2.
>>>> + * - The frame size cannot be smaller than 2048B.
>>>> + * - The frame size cannot be larger than the memory page size.
>>>> + *
>>>> + * Corollary: The number of frames that can be stored is
>>>> + * len / frame_size.
>>>> + *
>>>> + */
>>>> +struct tpacket_memreg_req {
>>>> +       unsigned long   addr;           /* Start of packet data area */
>>>> +       unsigned long   len;            /* Length of packet data area */
>>>> +       unsigned int    frame_size;     /* Frame size */
>>>> +       unsigned int    data_headroom;  /* Frame head room */
>>>> +};
>>> Existing packet sockets take a tpacket_req, allocate memory and let the
>>> user process mmap this. I understand that TPACKET_V4 distinguishes
>>> the descriptor from packet pools, but could both use the existing structs
>>> and logic (packet_mmap)? That would avoid introducing a lot of new code
>>> just for granting user pages to the kernel.
>> We could certainly pass the "tpacket_memreg_req" fields as part of
>> descriptor ring setup ("tpacket_req4"), but we went with having the
>> memory register as a new separate setsockopt. Having it separated,
>> makes it easier to compare regions at the kernel side of things. "Is
>> this the same umem as another one?" If we go the path of passing the
>> range at descriptor ring setup, we need to handle all kind of
>> overlapping ranges to determine when a copy is needed or not, in those
>> cases where the packet buffer (i.e. umem) is shared between processes.
> That's not what I meant. Both descriptor rings and packet pools are
> memory regions. Packet sockets already have logic to allocate regions
> and make them available to userspace with mmap(). Packet v4 reuses
> that logic for its descriptor rings. Can it use the same for its packet
> pool? Why does the kernel map user memory, instead? That is a lot of
> non-trivial new logic.

Ah, got it. So, why do we register packet pool memory instead of
allocating it in the kernel and mapping *that* memory?

Actually, we started out with that approach, where the packet_mmap
call mapped both the Tx/Rx descriptor rings and the packet buffer
region. We later moved to this (register umem) approach because it is
more flexible for user space: applications are not tied to an
AF_PACKET-specific allocator and can keep using regular malloc, huge
pages and so on.

I agree that the memory register code adds a lot of new logic, but I
believe the flexibility it gives user space is worth it. I'm looking
into whether I can reuse the memory registration logic from the
Infiniband/verbs subsystem (drivers/infiniband/core/umem.c).
