Hello, I'm doing some network performance testing and looking for information on how to tune a 10GbE NIC (Intel 82599EB controller).
Currently I'm doing this on Solaris 11.4 CBE, and later on I will use the same setup with Tribblix and OpenBSD. I have two identical SPARC T4 servers. Each server has a dual-port 10GbE NIC, and both ports are connected to a switch. I'm running a bidirectional send/receive performance test between the two servers, utilising all ports at the same time. There is no link aggregation or similar; each port is assigned a unique IP address and I'm running multiple concurrent TCP data streams.

dladm show-phys
...
LINK        MEDIA         STATE      SPEED  DUPLEX    DEVICE
net14       Ethernet      up         10000  full      ixgbe1
net15       Ethernet      up         10000  full      ixgbe0

Each port is capable of 10Gb/sec send + 10Gb/sec receive data rates. With dual ports on each server, the maximum combined throughput should be around 20Gb/sec + 20Gb/sec = 40Gb/sec. However, I'm getting around 30Gb/sec.

I enabled 9000-byte jumbo frames on both servers and on the switch, and verified this with "ping -s -D -i <src_iface> <dest_ip> 8900"; all looks OK. So I think the issue may be imbalanced IRQ processing: in the output of "mpstat -A core 1" I can see that socket0 is processing all the IRQs and socket1 is processing none.

The following confirms this. I set each port to use CPUs across both sockets, but the "EFFECTIVE" column shows the system selected CPUs 0-63, which are all on socket0:

root@t4-node2:~# dladm show-linkprop -p cpus net14
LINK     PROPERTY   PERM  VALUE   EFFECTIVE  DEFAULT  POSSIBLE
net14    cpus       rw    0-127   0-63       --       --
root@t4-node2:~# dladm show-linkprop -p cpus net15
LINK     PROPERTY   PERM  VALUE   EFFECTIVE  DEFAULT  POSSIBLE
net15    cpus       rw    0-127   0-63       --       --

Interrupts are spread across various cores, but all of them are on socket0:

root@t4-node2:~# echo ::interrupts | mdb -k | grep ixgbe
PX#/Device  Type   MSG #  State  INO   EQ    Share  Pil  CPU  Core  chip
0/ixgbe#1   MSI-X  468    enbl   0x39  0x33  1      6    14   1     0
0/ixgbe#1   MSI-X  469    enbl   0x38  0x32  1      6    21   2     0
0/ixgbe#1   MSI-X  470    enbl   0x37  0x31  1      6    38   4     0
0/ixgbe#1   MSI-X  471    enbl   0x36  0x30  1      6    62   7     0
0/ixgbe#1   MSI-X  472    enbl   0x35  0x2f  1      6    31   3     0
0/ixgbe#0   MSI-X  460    enbl   0x34  0x2e  1      6    46   5     0
0/ixgbe#0   MSI-X  461    enbl   0x33  0x2d  1      6    61   7     0
0/ixgbe#0   MSI-X  458    enbl   0x32  0x2c  1      6    30   3     0
0/ixgbe#0   MSI-X  459    enbl   0x31  0x2b  1      6    54   6     0
0/ixgbe#0   MSI-X  457    enbl   0x30  0x2a  1      6    39   4     0
0/ixgbe#1   MSI-X  477    enbl   0x28  0x22  2      6    20   2     0
0/ixgbe#0   MSI-X  446    enbl   0x27  0x21  2      6    12   1     0
0/ixgbe#1   MSI-X  473    enbl   0x24  0x1e  2      6    51   6     0
0/ixgbe#1   MSI-X  474    enbl   0x23  0x1d  2      6    43   5     0
0/ixgbe#1   MSI-X  475    enbl   0x22  0x1c  2      6    35   4     0
0/ixgbe#0   MSI-X  455    enbl   0x1f  0x19  1      6    15   1     0
0/ixgbe#1   MSI-X  463    enbl   0x1d  0x17  2      6    58   7     0
0/ixgbe#1   MSI-X  476    enbl   0x17  0x11  2      6    10   1     0
0/ixgbe#0   MSI-X  453    enbl   0x16  0x10  2      6    3    0     0
0/ixgbe#0   MSI-X  450    enbl   0x14  0x0e  2      6    49   6     0
0/ixgbe#0   MSI-X  452    enbl   0x13  0x0d  2      6    41   5     0
0/ixgbe#0   MSI-X  447    enbl   0x12  0x0c  2      6    33   4     0
0/ixgbe#0   MSI-X  451    enbl   0x11  0x0b  2      6    25   3     0
0/ixgbe#0   MSI-X  448    enbl   0x10  0x0a  2      6    17   2     0
0/ixgbe#0   MSI-X  454    enbl   0x0d  0x07  2      6    56   7     0
0/ixgbe#1   MSI-X  465    enbl   0x0c  0x06  2      6    48   6     0
0/ixgbe#1   MSI-X  466    enbl   0x0b  0x05  2      6    40   5     0
0/ixgbe#1   MSI-X  464    enbl   0x0a  0x04  2      6    32   4     0
0/ixgbe#1   MSI-X  467    enbl   0x09  0x03  2      6    24   3     0
0/ixgbe#0   MSI-X  456    enbl   0x08  0x02  2      6    16   2     0
0/ixgbe#1   MSI-X  462    enbl   0x07  0x01  2      6    8    1     0
0/ixgbe#0   MSI-X  449    enbl   0x06  0x00  2      6    1    0     0

I've read some articles on the Internet which suggest the Solaris "pcitool" utility can be used to manually re-route interrupts. However, I'm not sure whether this will work here. Maybe it is a design limitation, where each PCIe card can only be serviced by interrupts from a specific socket.
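In case it helps anyone reading along, this is roughly what I expect the pcitool attempt to look like, going by the pcitool man page. The nexus node is a placeholder (I've written /pci@<unit-address>; the real path for px 0 should be visible in prtconf -v or under /devices), the INO and CPU values are just examples taken from the ::interrupts output above, and I'm not sure whether MSI-X vectors on sun4v should be addressed via the INO (-i) or via the MSI number (-m), so treat this as an untested sketch:

    # read the current CPU routing of INO 0x39 (one of the ixgbe1 vectors above)
    pcitool /pci@<unit-address> -i 0x39 -r
    # attempt to retarget INO 0x39 to CPU 0x40 (64 decimal, the first strand on socket1)
    pcitool /pci@<unit-address> -i 0x39 -w 0x40

If retargeting to a socket1 CPU is rejected or silently ignored, that would at least confirm the "interrupts tied to one socket" theory.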
So on SMP systems with multiple sockets, pcitool may not be effective?

I've also tried the following, to spread the IRQ load for each port onto a different socket:

dladm set-linkprop -p cpus=0-63 net14
dladm set-linkprop -p cpus=64-127 net15

But this doesn't seem to be very effective: the "net15" device is still using a large number of CPUs on socket0, and throughput is much worse in this case.

Does anyone know if there is a way to have the net14 IRQs on socket0 (CPUs 0-63) and the net15 IRQs on socket1 (CPUs 64-127)? That would spread the load more evenly and may improve data throughput.

I will try pcitool when I get some time in the next few days, but I'm not very optimistic...

Thanks.
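PS: another thing I may test is the resource-pool route instead of the cpus property, i.e. binding net15 to a pool whose pset contains only socket1 CPUs. The pool/pset names below are just examples, the cpu transfer list is abbreviated, and I don't know yet whether the interrupts (as opposed to the MAC worker threads) actually follow the pool, so this is only a sketch of what I have in mind:

    # enable the resource pools facility and save the current config to /etc/pooladm.conf
    pooladm -e
    pooladm -s
    # create a pset sized for the 64 strands of socket1, plus a pool associated with it
    poolcfg -c 'create pset pset_sock1 (uint pset.min = 64; uint pset.max = 64)'
    poolcfg -c 'transfer to pset pset_sock1 (cpu 64; cpu 65; cpu 66; cpu 67)'   # ...and so on up to cpu 127
    poolcfg -c 'create pool pool_sock1'
    poolcfg -c 'associate pool pool_sock1 (pset pset_sock1)'
    # instantiate the configuration and bind the datalink to the pool
    pooladm -c
    dladm set-linkprop -p pool=pool_sock1 net15

If the EFFECTIVE CPUs for net15 still end up on socket0 after that, then I suppose it really is the PX/interrupt side that is pinned to socket0 rather than the MAC layer.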