On 08/04/16 17:24, Ben RUBSON wrote:

On 04 Aug 2016, at 11:40, Ben RUBSON <ben.rub...@gmail.com> wrote:

On 02 Aug 2016, at 22:11, Ben RUBSON <ben.rub...@gmail.com> wrote:

On 02 Aug 2016, at 21:35, Hans Petter Selasky <h...@selasky.org> wrote:

The CX-3 driver doesn't bind the worker threads to specific CPU cores by default, so if 
your system has more than one NUMA domain, the bottleneck ends up being the 
high-speed link between the CPUs rather than the card. A quick and dirty workaround is 
to "cpuset" iperf and the interrupt and taskqueue threads to specific CPU cores.
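A minimal sketch of that workaround, written as a Python helper that only builds the cpuset(1) command lines and does not run them. The sample `vmstat -i` text, the IRQ numbers, the `mlx4` name filter, and the CPU list `0-5` are illustrative assumptions, not values from this thread; check `vmstat -i` and the CPU topology on the actual machine.

```python
import re

def parse_irqs(vmstat_output, driver="mlx4"):
    """Return IRQ numbers whose name mentions `driver` in `vmstat -i`-style text."""
    irqs = []
    for line in vmstat_output.splitlines():
        m = re.match(r"\s*irq(\d+):\s+(\S+)", line)
        if m and driver in m.group(2):
            irqs.append(int(m.group(1)))
    return irqs

def pin_commands(irqs, cpu_list, iperf_args):
    """Build cpuset(1) command lines; nothing is executed here."""
    # cpuset -l <cpu-list> -x <irq> restricts one interrupt to the given CPUs.
    cmds = [f"cpuset -l {cpu_list} -x {irq}" for irq in irqs]
    # cpuset -l <cpu-list> <cmd> starts a command restricted to the same CPUs.
    cmds.append(f"cpuset -l {cpu_list} iperf {iperf_args}")
    return cmds

# Hypothetical `vmstat -i` excerpt (assumed format and numbers).
SAMPLE = """\
interrupt                          total       rate
irq264: mlx4_core0                  1000         10
irq265: mlx4_core0                  2000         20
irq270: igb0:que 0                   300          3
"""

if __name__ == "__main__":
    # "0-5" is an assumed CPU list for one NUMA domain.
    for cmd in pin_commands(parse_irqs(SAMPLE), "0-5", "-s"):
        print(cmd)
```

The taskqueue threads are not covered by this sketch; cpuset(1) also accepts `-t <tid>` for pinning individual threads.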

My CPUs : 2x E5-2620v3 with DDR4@1866.

OK, so I cpuset all Mellanox interrupts to one NUMA domain, as well as the iPerf 
processes, and I'm able to reach max bandwidth.
Choosing the wrong NUMA domain (or both, or one for interrupts and the other for 
iPerf, etc.) totally kills throughput.

However, full-duplex throughput is still limited: I can't manage to reach 
2x40Gb/s; it tops out at about 45Gb/s.
I tried many different cpuset layouts, but I never went above 45Gb/s.
(Linux allowed me to reach 2x40Gb/s so hardware is not a bottleneck)

Are you using "options RSS" and "options PCBGROUP" in your kernel config?

I will then give RSS a try.

Without RSS :
A ---> B : 40Gbps (unidirectional)
A <--> B : 45Gbps (bidirectional)

With RSS :
A ---> B : 28Gbps (unidirectional)
A <--> B : 28Gbps (bidirectional)

Sounds like RSS does not help :/

Why, without RSS, do I have difficulty reaching 2x40Gbps (full-duplex)?


Hi,

Possibly because the packets are arriving at the wrong CPU compared to what RSS expects. Then RSS will invoke a taskqueue to process the packets on the correct CPU, if I'm not mistaken.
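A toy model of the behaviour described above, assuming it works as stated: each flow hashes to a bucket, each bucket belongs to one CPU, and a packet arriving on any other CPU has to be handed over before it is processed. The function names and the bucket table are hypothetical, and `zlib.crc32` is only a stand-in for the Toeplitz hash that real RSS uses.

```python
import zlib

def rss_bucket(src_ip, src_port, dst_ip, dst_port, n_buckets):
    """Map a flow 4-tuple to a bucket (crc32 stands in for the Toeplitz hash)."""
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % n_buckets

def dispatch(flow, arrival_cpu, bucket_to_cpu):
    """Return ("direct", cpu) if the packet arrived on the CPU that owns its
    bucket, else ("requeue", cpu) to model the extra hand-off to that CPU."""
    expected = bucket_to_cpu[rss_bucket(*flow, len(bucket_to_cpu))]
    if arrival_cpu == expected:
        return ("direct", expected)
    return ("requeue", expected)

if __name__ == "__main__":
    bucket_to_cpu = [0, 1, 2, 3]          # assumed: 4 buckets, one per CPU
    flow = ("10.0.0.1", 5001, "10.0.0.2", 40000)
    for cpu in range(4):
        print(cpu, dispatch(flow, cpu, bucket_to_cpu))
```

In this model only one arrival CPU per flow avoids the hand-off, which is one way to picture why RSS with a driver that steers packets differently could cost throughput rather than add it.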

The mlx4 driver does not fully support RSS. The mlx5 driver does.

--HPS
_______________________________________________
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
