On Wed, Sep 7, 2016 at 11:55 PM, Or Gerlitz via iovisor-dev
<iovisor-...@lists.iovisor.org> wrote:
> On Wed, Sep 7, 2016 at 3:42 PM, Saeed Mahameed <sae...@mellanox.com> wrote:
>> From: Rana Shahout <ra...@mellanox.com>
>>
>> Add support for the BPF_PROG_TYPE_PHYS_DEV hook in mlx5e driver.
>>
>> When XDP is on, we make sure to change the channels' RQ type to
>> MLX5_WQ_TYPE_LINKED_LIST rather than the "striding RQ" type, to
>> ensure "page per packet".
>>
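For reference, the RQ type selection boils down to something like the
sketch below. This is only an illustration, not the exact patch code;
apart from MLX5_WQ_TYPE_LINKED_LIST, the names here (priv->xdp_prog,
params.rq_wq_type, MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ) are assumptions
based on the mlx5e driver.

/* Sketch only: choose the RQ work-queue type based on whether an XDP
 * program is attached.  XDP needs one page per packet, so we avoid the
 * striding RQ in that case.  Names other than MLX5_WQ_TYPE_LINKED_LIST
 * are assumed. */
static void mlx5e_set_rq_wq_type(struct mlx5e_priv *priv)
{
        if (priv->xdp_prog)
                priv->params.rq_wq_type = MLX5_WQ_TYPE_LINKED_LIST;
        else
                priv->params.rq_wq_type =
                        MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ;
}
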
>> On XDP set, we fail if HW LRO is enabled and ask the user to turn it
>> off.  Since HW LRO is always on by default on ConnectX4-LX, this will be
>> annoying, but we prefer not to force LRO off from the XDP set function.
>>
>> A full channel reset (close/open) is required only when turning XDP
>> on/off.
>>
>> When XDP set is called just to exchange programs, we update each
>> RQ's xdp program on the fly. To synchronize with the current RX
>> data path activity of that RQ, we temporarily disable the RQ, make
>> sure the RX path is not running, quickly update it and re-enable it.
>> For that we do:
>>         - rq.state = disabled
>>         - napi_synchronize
>>         - xchg(rq->xdp_prog)
>>         - rq.state = enabled
>>         - napi_schedule // Just in case we've missed an IRQ
>>
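To make that sequence concrete, the swap looks roughly like the sketch
below. It is only a sketch: the state bit name, the napi pointer and
the exact struct fields are illustrative, not necessarily what the
patch uses.

/* Illustrative only -- exchanging the XDP program on a live RQ.
 * MLX5E_RQ_STATE_ENABLED and the field layout are assumed names. */
static void mlx5e_xdp_swap_prog(struct mlx5e_rq *rq,
                                struct napi_struct *napi,
                                struct bpf_prog *new_prog)
{
        struct bpf_prog *old_prog;

        clear_bit(MLX5E_RQ_STATE_ENABLED, &rq->state);  /* rq.state = disabled */
        napi_synchronize(napi);                 /* wait for in-flight RX to finish */

        old_prog = xchg(&rq->xdp_prog, new_prog);       /* swap the program */
        if (old_prog)
                bpf_prog_put(old_prog);

        set_bit(MLX5E_RQ_STATE_ENABLED, &rq->state);    /* rq.state = enabled */
        napi_schedule(napi);                    /* in case we've missed an IRQ */
}
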
>> Packet rate performance testing was done with pktgen sending 64B
>> packets on the TX side, comparing a TC drop action on the RX side
>> against XDP fast drop.
>>
>> CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
>>
>> Comparison is done between:
>>         1. Baseline, Before this patch with TC drop action
>>         2. This patch with TC drop action
>>         3. This patch with XDP RX fast drop
>>
>> Streams    Baseline(TC drop)    TC drop    XDP fast Drop
>> --------------------------------------------------------------
>> 1           5.51Mpps            5.14Mpps     13.5Mpps
>
> This (13.5 Mpps) is less than 50% of the result we presented at the
> XDP summit, which was obtained by Rana. Please see if/how much this
> grows if you use more sender threads, but have all of them xmit the
> same stream/flows, so we're on one ring. That (XDP with a single RX
> ring getting packets from N remote TX rings) would be your canonical
> baseline for any further numbers.
>

I used N TX senders sending 48Mpps to a single RX core.
The single RX core could handle only 13.5Mpps.

The implementation here is different from the one we presented at the
summit. Before, it was with striding RQ; now it is a regular linked-list
RQ. (A striding RQ ring can handle 32K 64B packets, while a regular RQ
ring handles only 1K.)

With striding RQ we register only 16 HW descriptors for every 32K
packets, i.e. for every 32K packets we access the HW only 16 times.
A regular RQ, on the other hand, accesses the HW (registers a
descriptor) once per packet, i.e. we write to the HW 1K times for 1K
packets. I think this explains the difference.

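Back-of-the-envelope, using the numbers above (a stand-alone snippet
just to illustrate the ratio):

/* Rough count of HW descriptor writes per 32K 64B packets. */
#include <stdio.h>

int main(void)
{
        const unsigned int packets = 32 * 1024;         /* 32K packets */
        const unsigned int striding_writes = 16;        /* 16 descriptors per 32K packets */
        const unsigned int regular_writes = packets;    /* one descriptor per packet */

        printf("striding RQ: %u HW writes, regular RQ: %u HW writes (x%u)\n",
               striding_writes, regular_writes, regular_writes / striding_writes);
        return 0;
}
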
The catch here is that we can't use striding RQ for XDP, bummer!

As I said, we will have the full and final performance results in V1.
This is just an RFC with only quick and dirty testing.


> _______________________________________________
> iovisor-dev mailing list
> iovisor-...@lists.iovisor.org
> https://lists.iovisor.org/mailman/listinfo/iovisor-dev
