Hello, I'm doing some network performance testing and looking for
information on how to tune a 10GbE NIC (Intel 82599EB controller).

Currently I'm doing this on Solaris 11.4 CBE, and later on I will use
the same setup with Tribblix and OpenBSD.

I have two identical SPARC T4 servers. Each server has a dual-port
10GbE NIC, and both ports are connected to a switch. I'm running a
bidirectional send/receive performance test between the two servers,
utilising all ports at the same time. There is no link aggregation or
anything similar; each port is assigned a unique IP address and I'm
running multiple concurrent TCP data streams.
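
For illustration, the traffic pattern is roughly what you would get
from something like the following (iperf3 and the addresses and stream
counts here are just placeholders for whatever generates several
concurrent bidirectional TCP streams per port):

# server A: one client per local port, 8 bidirectional streams each
iperf3 -c 192.168.10.2 -B 192.168.10.1 -P 8 --bidir -t 60 &
iperf3 -c 192.168.11.2 -B 192.168.11.1 -P 8 --bidir -t 60 &

# server B: one listener bound to each of its port addresses
iperf3 -s -B 192.168.10.2 &
iperf3 -s -B 192.168.11.2 &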

dladm show-phys
...
LINK            MEDIA         STATE      SPEED  DUPLEX    DEVICE
net14           Ethernet      up         10000  full      ixgbe1
net15           Ethernet      up         10000  full      ixgbe0

Each port is capable of 10Gb/sec send + 10Gb/sec receive data rates.
With dual ports on each server, maximum combined throughput should be
around 20Gb/sec + 20Gb/sec = 40Gb/sec. However, I'm getting around
30Gb/sec.

I enabled 9000-byte jumbo frames on both servers and on the switch,
and verified it with "ping -s -D -i <src_iface> <dest_ip> 8900"; all
looks OK.
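
For reference, on the hosts this was just the usual MTU link property
change, something along these lines (plus the matching MTU on the
switch ports):

dladm set-linkprop -p mtu=9000 net14
dladm set-linkprop -p mtu=9000 net15
dladm show-linkprop -p mtu net14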

So I think the issue may be imbalanced IRQ processing. If I look at
the output of "mpstat -A core 1", I can see that socket0 is processing
all the IRQs and socket1 none of them. The following confirms this:

I set each port to use CPUs across both sockets (cpus=0-127), but the
"EFFECTIVE" column shows that the system selected CPUs 0-63, which are
all on socket0:

root@t4-node2:~# dladm show-linkprop -p cpus net14
LINK     PROPERTY        PERM VALUE        EFFECTIVE    DEFAULT   POSSIBLE
net14    cpus            rw   0-127        0-63         --        -- 

root@t4-node2:~# dladm show-linkprop -p cpus net15
LINK     PROPERTY        PERM VALUE        EFFECTIVE    DEFAULT   POSSIBLE
net15    cpus            rw   0-127        0-63         --        -- 
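
(For reference, the CPU-to-socket mapping can be confirmed with

psrinfo -pv

which on these T4s lists virtual processors 0-63 under one physical
processor and 64-127 under the other.)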

Interrupts are spread across various cores, but all are on socket0:

root@t4-node2:~# echo ::interrupts | mdb -k | grep ixgbe
PX#/Device       Type    MSG #   State INO    EQ Share    Pil    CPU Core chip
 0/ixgbe#1       MSI-X    468    enbl  0x39  0x33    1    6       14   1  0
 0/ixgbe#1       MSI-X    469    enbl  0x38  0x32    1    6       21   2  0
 0/ixgbe#1       MSI-X    470    enbl  0x37  0x31    1    6       38   4  0
 0/ixgbe#1       MSI-X    471    enbl  0x36  0x30    1    6       62   7  0
 0/ixgbe#1       MSI-X    472    enbl  0x35  0x2f    1    6       31   3  0
 0/ixgbe#0       MSI-X    460    enbl  0x34  0x2e    1    6       46   5  0
 0/ixgbe#0       MSI-X    461    enbl  0x33  0x2d    1    6       61   7  0
 0/ixgbe#0       MSI-X    458    enbl  0x32  0x2c    1    6       30   3  0
 0/ixgbe#0       MSI-X    459    enbl  0x31  0x2b    1    6       54   6  0
 0/ixgbe#0       MSI-X    457    enbl  0x30  0x2a    1    6       39   4  0
 0/ixgbe#1       MSI-X    477    enbl  0x28  0x22    2    6       20   2  0
 0/ixgbe#0       MSI-X    446    enbl  0x27  0x21    2    6       12   1  0
 0/ixgbe#1       MSI-X    473    enbl  0x24  0x1e    2    6       51   6  0
 0/ixgbe#1       MSI-X    474    enbl  0x23  0x1d    2    6       43   5  0
 0/ixgbe#1       MSI-X    475    enbl  0x22  0x1c    2    6       35   4  0
 0/ixgbe#0       MSI-X    455    enbl  0x1f  0x19    1    6       15   1  0
 0/ixgbe#1       MSI-X    463    enbl  0x1d  0x17    2    6       58   7  0
 0/ixgbe#1       MSI-X    476    enbl  0x17  0x11    2    6       10   1  0
 0/ixgbe#0       MSI-X    453    enbl  0x16  0x10    2    6        3   0  0
 0/ixgbe#0       MSI-X    450    enbl  0x14  0x0e    2    6       49   6  0
 0/ixgbe#0       MSI-X    452    enbl  0x13  0x0d    2    6       41   5  0
 0/ixgbe#0       MSI-X    447    enbl  0x12  0x0c    2    6       33   4  0
 0/ixgbe#0       MSI-X    451    enbl  0x11  0x0b    2    6       25   3  0
 0/ixgbe#0       MSI-X    448    enbl  0x10  0x0a    2    6       17   2  0
 0/ixgbe#0       MSI-X    454    enbl  0x0d  0x07    2    6       56   7  0
 0/ixgbe#1       MSI-X    465    enbl  0x0c  0x06    2    6       48   6  0
 0/ixgbe#1       MSI-X    466    enbl  0x0b  0x05    2    6       40   5  0
 0/ixgbe#1       MSI-X    464    enbl  0x0a  0x04    2    6       32   4  0
 0/ixgbe#1       MSI-X    467    enbl  0x09  0x03    2    6       24   3  0
 0/ixgbe#0       MSI-X    456    enbl  0x08  0x02    2    6       16   2  0
 0/ixgbe#1       MSI-X    462    enbl  0x07  0x01    2    6        8   1  0
 0/ixgbe#0       MSI-X    449    enbl  0x06  0x00    2    6        1   0  0
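
As an aside, the live per-CPU interrupt load (rather than just the
static routing above) can also be watched with intrstat:

intrstat 1

It reports per-device interrupt counts and the percentage of time each
CPU spends in interrupt handlers, so it's another way of seeing the
same imbalance that mpstat shows.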

I've read some articles on the Internet suggesting that the Solaris
"pcitool" utility can be used to manually re-route interrupts.
However, I'm not sure this will work here. Maybe it is a design
limitation, where each PCIe card can only be serviced by interrupts
from a specific socket, in which case pcitool may not be effective on
multi-socket SMP systems?
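
If I do get around to pcitool, my understanding (untested, so treat
this as a rough sketch and check pcitool(8) for the exact syntax) is
that it works per root-complex nexus and per INO, something like:

pcitool /pci@400 -i all -r -v     # dump current INO-to-CPU routing
pcitool /pci@400 -i 0x39 -w 64    # re-route INO 0x39 to CPU 64

where /pci@400 is a placeholder for whichever root complex the ixgbe
ports actually sit under. I also believe the interrupt rebalancing
daemon (intrd), if it's running, may undo manual bindings, so it might
need to be disabled during the test.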

I've also tried the following commands, in order to spread the IRQ
load so that each port is handled by a different socket:

dladm set-linkprop -p cpus=0-63 net14
dladm set-linkprop -p cpus=64-127 net15

But this doesn't seem to be very effective: the "net15" device is
still using a large number of CPUs on socket0, and the throughput is
actually much worse in this case.

Does anyone know if there is a way to pin the net14 IRQs to socket0
(CPUs 0-63) and the net15 IRQs to socket1 (CPUs 64-127)? That would
spread the load more evenly and might improve data throughput.
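
One alternative I'm considering, instead of cpus=, is binding each
link to a resource pool whose processor set lives entirely on one
socket. This is untested, and I'd appreciate confirmation that the
"pool" link property behaves this way here, but roughly:

pooladm -e        # enable the pools facility
pooladm -s        # save the current config to /etc/pooladm.conf
poolcfg -c 'create pset socket1_pset (uint pset.min = 64; uint pset.max = 64)'
poolcfg -c 'create pool socket1_pool'
poolcfg -c 'associate pool socket1_pool (pset socket1_pset)'
pooladm -c        # commit the edited configuration
dladm set-linkprop -p pool=socket1_pool net15

The pool/pset names are made up, and I would still have to make sure
the pset is actually populated from CPUs 64-127 (poolcfg's "transfer
to pset" looks like the way to do that), so this is only an outline.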

I will try pcitool when I get some time in the next few days, but
I'm not very optimistic...

Thanks.

------------------------------------------
illumos: illumos-discuss
Permalink: 
https://illumos.topicbox.com/groups/discuss/Tffd21e95916338a2-M6e21c7749a0a33b05e027137