Dear all,
 
I am trying to get to the bottom of a performance bottleneck that seems to be 
related to packet processing on my Linux router. Sending TCP traffic from 
Windows or macOS through my Ubuntu 16.04 router causes ksoftirqd on the router 
to eat 100% of a CPU core and caps the throughput at around 400 Mbit/s. The 
reverse direction, or a Linux sender on the same PC, yields 900 Mbit/s with 
little load on the router. Using RSS and multiple TCP streams I could push 
800 Mbit/s through with two cores fully utilized; that was as much as I could 
do on the router to improve throughput, and I understand the i210/i211 only 
supports two receive queues. The load also seems to be tied to the TCP window 
size: below an iperf3 window of 48k the CPU load stays low, above that it 
jumps to 100%.
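For reference, the iperf3 runs that show the window-size threshold were along 
these lines (nas.lan stands in for my NAS's address):

iperf3 -c nas.lan -w 32K        # below ~48k: router CPU load stays low
iperf3 -c nas.lan -w 64K        # above ~48k: ksoftirqd jumps to 100%
iperf3 -c nas.lan -P 4          # parallel streams, exercises RSS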
 
What is causing the ksoftirqd CPU load in the forwarding direction when the 
sender is macOS/Windows?
I would assume some re-ordering of large packets, but that is only a guess.
 
Any pointers are very welcome!
I can provide pcaps, but I haven’t been able to spot any anomalies.
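If someone wants to double-check the captures: out-of-order and retransmitted 
segments can be filtered out of a pcap with tshark's TCP analysis flags, e.g. 
(capture.pcap is a placeholder):

tshark -r capture.pcap -Y "tcp.analysis.out_of_order || tcp.analysis.retransmission"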
 
My test setup is as follows:
Mac / PC (Windows, Ubuntu Live) <--direct--> Linux router (3x Intel i210) 
<---direct cable---> NAS
The Linux router is a PC Engines APU (AMD G series GX-412TC, 4x 1 GHz Jaguar 
cores, 4 GB RAM).
 
I have run multiple performance tests with iperf3 in different setups and 
found that the high ksoftirqd load on the Linux router only appears when the 
sender is Windows or macOS.
 
I have further successfully tested:
- Direct connection Mac/PC <-switch-> NAS yields 900 Mbit/s in both 
  directions.
- PC (Ubuntu Live) <-> Linux router <-> NAS yields 900 Mbit/s in both 
  directions.
- Speedtest.net from Mac/PC -> Linux router -> Internet (1 Gbit/s) yields 
  500 Mbit/s upload with ksoftirqd on the Linux router at 100% CPU load; the 
  Mac achieves 900 Mbit/s up/down when connected directly.
- UDP Mac/PC <-switch-> NAS never exceeded 500 Mbit/s, so I have not tested 
  UDP further due to the high error rate.
- FreeBSD on the same router hardware sustains 900 Mbit/s up/down with little 
  load.

The following had no effect on the unidirectional load:
- turning LRO/GRO on and off
- disabling Energy Efficient Ethernet
- different kernels (4.4.0-66 stock; tried various 3.6.x and 4.8.x, now 
  running 4.10.2)
- Intel igb driver 5.3.0-k (srcversion: 90ABA603B1D2A1415F2D301)
- different Linux congestion control algorithms
- InterruptThrottleRate on igb enabled/disabled
- increased net.core.netdev_budget (see the sysctl example below)
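The budget change was done via sysctl along these lines (600 is just an 
example value; the default is 300):

sysctl net.core.netdev_budget            # show the current value
sysctl -w net.core.netdev_budget=600     # raise the NET_RX softirq packet budget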
 
lspci -v:
02:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection 
(rev 03)
        Subsystem: Intel Corporation I210 Gigabit Network Connection
        Flags: bus master, fast devsel, latency 0, IRQ 33
        Memory at fe600000 (32-bit, non-prefetchable) [size=128K]
        I/O ports at 1000 [size=32]
        Memory at fe620000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
        Capabilities: [a0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 00-0d-b9-xx-xx-xx-xx-xx
        Capabilities: [1a0] Transaction Processing Hints
        Kernel driver in use: igb
 
 
cat /proc/softirqs
                    CPU0       CPU1       CPU2       CPU3
          HI:          0          0          0          0
       TIMER:    4141656    3676729    8708615    3009349
      NET_TX:       8874      27600       5285       2013
      NET_RX:   11019938    9055919    2988197    3129611
       BLOCK:      45739      45952      46838      47832
    IRQ_POLL:          0          0          0          0
     TASKLET:    6303077    3902509    3918287    3987525
       SCHED:    1545958    1264065    4493394     798169
     HRTIMER:          0          0          0          0
         RCU:    2350434    2287515    3839434    1842554
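
The snapshot above is static; during a test run the per-CPU NET_RX growth can 
be watched live with, e.g.:

watch -d -n 1 cat /proc/softirqs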
 
cat /proc/interrupts
            CPU0       CPU1       CPU2       CPU3
  34:          1          1          0          0   PCI-MSI 1048576-edge      
enp2s0
  35:    9556594          4          4          3   PCI-MSI 1048577-edge      
enp2s0-TxRx-0
  36:          2    7584066          2          2   PCI-MSI 1048578-edge      
enp2s0-TxRx-1
  37:          0          2    1499447          3   PCI-MSI 1048579-edge      
enp2s0-TxRx-2
  38:          3          3          3    1657668   PCI-MSI 1048580-edge      
enp2s0-TxRx-3
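
In case specific IRQ placements are worth testing: the queue interrupts can be 
pinned manually through smp_affinity (IRQ numbers as in the table above, the 
value is a CPU bitmask):

echo 1 > /proc/irq/35/smp_affinity     # TxRx-0 -> CPU0
echo 2 > /proc/irq/36/smp_affinity     # TxRx-1 -> CPU1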
 
ethtool -x enp2s0
RX flow hash indirection table for enp2s0 with 4 RX ring(s):
    0:      0     1     0     1     0     1     0     1
    8:      0     1     0     1     0     1     0     1
   16:      0     1     0     1     0     1     0     1
   24:      0     1     0     1     0     1     0     1
   32:      0     1     0     1     0     1     0     1
   40:      0     1     0     1     0     1     0     1
   48:      0     1     0     1     0     1     0     1
   56:      0     1     0     1     0     1     0     1
   64:      0     1     0     1     0     1     0     1
   72:      0     1     0     1     0     1     0     1
   80:      0     1     0     1     0     1     0     1
   88:      0     1     0     1     0     1     0     1
   96:      0     1     0     1     0     1     0     1
  104:      0     1     0     1     0     1     0     1
  112:      0     1     0     1     0     1     0     1
  120:      0     1     0     1     0     1     0     1
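
Note that the indirection table only ever points at rings 0 and 1. Spreading 
it across all four rings can be attempted with ethtool, although this may be 
rejected if RSS is limited to two queues in hardware:

ethtool -X enp2s0 equal 4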
