On Wed, Sep 12, 2012 at 05:10:44PM -0700, Jesse Brandeburg wrote:

> On Wed, 12 Sep 2012 22:47:55 +0200
> Dick Snippe <[email protected]> wrote:
> 
> > On Wed, Sep 12, 2012 at 04:05:02PM +0000, Brandeburg, Jesse wrote:
> > 
> > > Hi Dick, we need to know exactly what you are expecting to happen
> > > here.
> > 
> > I'm surprised by the large increase in latency (from <1ms to >100ms).
> > In our production environment we see this phenomenon even on "moderate"
> > load, transmitting 1.5-2Gbit.
> 
> I believe maybe you could be (I'd equivocate more if I could) seeing a
> bit of the "bufferbloat" effect maybe from the large queues available by
> default on the 10G interface.
> 
> can you try running with smaller transmit descriptor rings?
> ethtool -G ethx tx 128

Not much difference:
|1000 packets transmitted, 1000 received, 0% packet loss, time 7386ms
|rtt min/avg/max/mdev = 48.522/76.642/93.488/6.404 ms, pipe 14
|Transfer rate:          168162.02 [Kbytes/sec] received

However, if I retry with "ifconfig ethx txqueuelen 10", latency
(though not throughput) looks better:
|1000 packets transmitted, 987 received, 1% packet loss, time 5905ms
|rtt min/avg/max/mdev = 0.443/17.018/42.106/8.075 ms, pipe 7
|Transfer rate:          132776.78 [Kbytes/sec] received
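
For what it's worth, I assume the difference is because txqueuelen limits the
qdisc in front of the driver, rather than the hardware descriptor ring that
"ethtool -G" resizes; if that's right, the backlog sitting in the qdisc should
be visible with something like:

$ tc -s qdisc show dev eth1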


> can you separately try running without other offloads?  like:
> ethtool -K ethx lro off tso off gro off

Worse:
|1000 packets transmitted, 1000 received, 0% packet loss, time 7254ms
|rtt min/avg/max/mdev = 221.309/271.187/320.978/20.035 ms, pipe 44
|Transfer rate:          117764.62 [Kbytes/sec] received

> > This effect on 10G infrastructure appears to be much more pronounced 
> > compared to 1G. When testing on 1G nics latency also increases, but much
> > less so; from <1ms to ~10ms. A difference is that the 1G nics are
> > saturated but the 10G ones are "only" transmitting ~1.5 Gbit.
> 
> that is a very interesting data point.  Are your 1G nics multi-queue?

We have both flavours. When testing on older, non-multiqueue nics
things look good (Broadcom Corporation NetXtreme II BCM5708S Gigabit
Ethernet (rev 12), bnx2 driver):

|--- igor06.omroep.nl ping statistics ---
|1000 packets transmitted, 1000 received, 0% packet loss, time 998ms
|rtt min/avg/max/mdev = 0.097/0.144/0.437/0.032 ms

Newer multiqueue 1G nics appear to have the same problem
(Broadcom Corporation NetXtreme II BCM5709S Gigabit Ethernet (rev 20),
bnx2 driver):

|--- dltest-intern ping statistics ---
|1000 packets transmitted, 1000 received, 0% packet loss, time 6503ms
|rtt min/avg/max/mdev = 82.486/96.982/105.990/5.457 ms, pipe 18

> > > There is a simple test you can do, try to disable TSO using
> > > ethtool. ethtool -K ethx tso off
> > 
> > I just tried that. The results are very similar.
> 
> hm, you aren't getting any flow control in your network are you?  (see
> ethtool -S ethx)

No, I don't think so:
$ sudo ethtool -S eth1|grep flow_control
     tx_flow_control_xon: 0
     rx_flow_control_xon: 0
     tx_flow_control_xoff: 0
     rx_flow_control_xoff: 0


> and take a look at the other stats while you are there.

Output is attached (see the NIC statistics below); however, it looks pretty unremarkable to me.


> it also might be interesting to sniff the ethx interface to see
> the outbound traffic patterns and delays between ping request/response.
> 
> start your test
> start the ping
> tcpdump -i ethx -s 128 -w snippetx.cap -c 1000
> bzip2 snippetx.cap
> <put on pastebin or some other web site and email us link>

http://download.omroep.nl/gurus/dick/ixgbe/snippet1.cap.gz 

Explanation:
morsa01: host running the webserver, although the actual webserver
        uses a different IP address: dltest.omroep.nl
morsa02: host running ab ("the client")
morsa03: host running ping to dltest.omroep.nl
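
The ICMP packets shown below were pulled out of the capture with, presumably,
something like:

$ tcpdump -r snippet1.cap icmp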

The longest ping RTTs in this dump appear to be ICMP seq 201, 173 and 181,
each ~120 ms:
13:38:45.302927 IP morsa03.omroep.nl > dltest1afp.omroep.nl: ICMP echo request, id 55334, seq 173, length 64
13:38:45.367699 IP morsa03.omroep.nl > dltest1afp.omroep.nl: ICMP echo request, id 55334, seq 181, length 64
13:38:45.424599 IP dltest1afp.omroep.nl > morsa03.omroep.nl: ICMP echo reply, id 55334, seq 173, length 64
13:38:45.488640 IP dltest1afp.omroep.nl > morsa03.omroep.nl: ICMP echo reply, id 55334, seq 181, length 64
13:38:45.516720 IP morsa03.omroep.nl > dltest1afp.omroep.nl: ICMP echo request, id 55334, seq 201, length 64
13:38:45.639179 IP dltest1afp.omroep.nl > morsa03.omroep.nl: ICMP echo reply, id 55334, seq 201, length 64
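
Computing the request-to-reply deltas from the timestamps above gives the same
picture:

seq 173: 45.424599 - 45.302927 = 0.121672 s  (~122 ms)
seq 181: 45.488640 - 45.367699 = 0.120941 s  (~121 ms)
seq 201: 45.639179 - 45.516720 = 0.122459 s  (~122 ms)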

> > > If that helps then we know that we need to pursue ways to get
> > > your high priority traffic onto its own queue, which btw is why the
> > > single thread iperf works. Ping goes to a different queue (by luck)
> > > and gets out sooner due to not being behind other traffic
> > 
> > Interestingly multi threaded iperf (iperf -P 50) manages to do +/-
> > 7.5Gbit while ping latency is still around 0.1 - 0.3 ms.
> 
> That's only interesting if you're using all 16 queues, were you?

I'm not sure. How can I check how many queues I'm using?
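
(I assume the per-queue counters from "ethtool -S" give a rough indication;
a sketch, counting tx queues with a non-zero packet count:

$ sudo ethtool -S eth1 | awk -F': ' '/tx_queue_.*_packets/ && $2 > 0 {n++} END {print n}'

In the statistics attached below, all 16 tx queues show traffic.)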

> There are some games here with the scheduler and NIC irq affinity as
> well that might be impacting us.  Can you please make sure you killall
> irqbalance, and run set_irq_affinity.sh ethx ethy.  The goal here is to
> start eliminating latency causes.

irqbalance is not running on our servers.
The set_irq_affinity.sh script sets the affinity identically to our default
setup, in which we set the affinity according to /proc/irq/XX/affinity_hint;
a sketch of what that amounts to is below.
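
Roughly, assuming eth1 and run as root, that boils down to something like:

# copy each eth1 queue's affinity hint into its effective IRQ affinity
for irq in $(awk -F: '/eth1/ {print $1}' /proc/interrupts); do
    cat /proc/irq/$irq/affinity_hint > /proc/irq/$irq/smp_affinity
done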

>  I'd also be curious what your
> interrupts per second per queue are during your workload.

$ awk '/eth1/ {print $1,$19}' /proc/interrupts
83: eth1-TxRx-0
84: eth1-TxRx-1
85: eth1-TxRx-2
86: eth1-TxRx-3
87: eth1-TxRx-4
88: eth1-TxRx-5
89: eth1-TxRx-6
90: eth1-TxRx-7
91: eth1-TxRx-8
92: eth1-TxRx-9
93: eth1-TxRx-10
94: eth1-TxRx-11
95: eth1-TxRx-12
96: eth1-TxRx-13
97: eth1-TxRx-14
98: eth1-TxRx-15
99: eth1

$ sar -I 83,84,85,86,97,88,89,90,91,92,93,94,95,96,97,98 1 11111
14:09:19         INTR    intr/s
14:09:20           83   3431.00
14:09:20           84   3387.00
14:09:20           85   3392.00
14:09:20           86   3352.00
14:09:20           88   3403.00
14:09:20           89   3380.00
14:09:20           90   3408.00
14:09:20           91   3418.00
14:09:20           92   3380.00
14:09:20           93   3388.00
14:09:20           94   3379.00
14:09:20           95   3435.00
14:09:20           96   3412.00
14:09:20           97   3359.00
14:09:20           98   3429.00

> Lastly, I'm headed out on vacation tonight and won't be available for a
> while.  I hope that someone else on my team will continue to work with
> you to debug what is going on.

Have a nice vacation!
If someone else could help me with this issue, that would be great.
 
> Maybe someone here can reproduce the issue and we will make much more
> progress.  Any testing details like kernel version, driver version, etc
> will be helpful.

$ uname -r
3.5.3-2POi-x86_64               (we compile our own kernels, this is a vanilla
                                kernel.org kernel; /proc/config.gz attached)
$ sudo ethtool -i eth1
driver: ixgbe
version: 3.9.15-k
firmware-version: 0x613e0001
bus-info: 0000:15:00.1

-- 
Dick Snippe, internetbeheerder     \ fight war
[email protected], +31 35 677 3555   \ not wars
NPO ICT, Sumatralaan 45, 1217 GP Hilversum, NPO Gebouw A
NIC statistics:
     rx_packets: 4954207
     tx_packets: 14411754
     rx_bytes: 297725278
     tx_bytes: 21803837466
     rx_pkts_nic: 4954200
     tx_pkts_nic: 14411748
     rx_bytes_nic: 317541462
     tx_bytes_nic: 21861493872
     lsc_int: 0
     tx_busy: 0
     non_eop_descs: 0
     rx_errors: 0
     tx_errors: 0
     rx_dropped: 13
     tx_dropped: 0
     multicast: 0
     broadcast: 749
     rx_no_buffer_count: 0
     collisions: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     hw_rsc_aggregated: 0
     hw_rsc_flushed: 0
     fdir_match: 4948157
     fdir_miss: 5413
     fdir_overflow: 0
     rx_fifo_errors: 0
     rx_missed_errors: 0
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_timeout_count: 0
     tx_restart_queue: 13470
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     tx_flow_control_xon: 0
     rx_flow_control_xon: 0
     tx_flow_control_xoff: 0
     rx_flow_control_xoff: 0
     rx_csum_offload_errors: 0
     alloc_rx_page_failed: 0
     alloc_rx_buff_failed: 0
     rx_no_dma_resources: 0
     os2bmc_rx_by_bmc: 0
     os2bmc_tx_by_bmc: 0
     os2bmc_tx_by_host: 0
     os2bmc_rx_by_host: 0
     tx_queue_0_packets: 1107735
     tx_queue_0_bytes: 1676513980
     tx_queue_1_packets: 804253
     tx_queue_1_bytes: 1217059504
     tx_queue_2_packets: 869850
     tx_queue_2_bytes: 1316611613
     tx_queue_3_packets: 968752
     tx_queue_3_bytes: 1464607596
     tx_queue_4_packets: 1045184
     tx_queue_4_bytes: 1582011203
     tx_queue_5_packets: 916924
     tx_queue_5_bytes: 1387753105
     tx_queue_6_packets: 656152
     tx_queue_6_bytes: 992836741
     tx_queue_7_packets: 1181526
     tx_queue_7_bytes: 1788412206
     tx_queue_8_packets: 721098
     tx_queue_8_bytes: 1090527918
     tx_queue_9_packets: 796443
     tx_queue_9_bytes: 1205448460
     tx_queue_10_packets: 856687
     tx_queue_10_bytes: 1296162339
     tx_queue_11_packets: 939584
     tx_queue_11_bytes: 1422224966
     tx_queue_12_packets: 936446
     tx_queue_12_bytes: 1417411339
     tx_queue_13_packets: 796627
     tx_queue_13_bytes: 1205727747
     tx_queue_14_packets: 1032075
     tx_queue_14_bytes: 1556381023
     tx_queue_15_packets: 782418
     tx_queue_15_bytes: 1184147726
     tx_queue_16_packets: 0
     tx_queue_16_bytes: 0
     tx_queue_17_packets: 0
     tx_queue_17_bytes: 0
     tx_queue_18_packets: 0
     tx_queue_18_bytes: 0
     tx_queue_19_packets: 0
     tx_queue_19_bytes: 0
     tx_queue_20_packets: 0
     tx_queue_20_bytes: 0
     tx_queue_21_packets: 0
     tx_queue_21_bytes: 0
     tx_queue_22_packets: 0
     tx_queue_22_bytes: 0
     tx_queue_23_packets: 0
     tx_queue_23_bytes: 0
     rx_queue_0_packets: 355318
     rx_queue_0_bytes: 21331740
     rx_queue_1_packets: 288498
     rx_queue_1_bytes: 17321991
     rx_queue_2_packets: 316116
     rx_queue_2_bytes: 18980362
     rx_queue_3_packets: 313977
     rx_queue_3_bytes: 18869388
     rx_queue_4_packets: 326359
     rx_queue_4_bytes: 19598065
     rx_queue_5_packets: 295157
     rx_queue_5_bytes: 17722555
     rx_queue_6_packets: 230605
     rx_queue_6_bytes: 13848645
     rx_queue_7_packets: 335184
     rx_queue_7_bytes: 20124361
     rx_queue_8_packets: 283886
     rx_queue_8_bytes: 17112428
     rx_queue_9_packets: 316700
     rx_queue_9_bytes: 19016728
     rx_queue_10_packets: 316911
     rx_queue_10_bytes: 19055380
     rx_queue_11_packets: 323517
     rx_queue_11_bytes: 19421215
     rx_queue_12_packets: 332939
     rx_queue_12_bytes: 19986568
     rx_queue_13_packets: 282696
     rx_queue_13_bytes: 16978334
     rx_queue_14_packets: 354728
     rx_queue_14_bytes: 21451622
     rx_queue_15_packets: 281616
     rx_queue_15_bytes: 16905896
     rx_queue_16_packets: 0
     rx_queue_16_bytes: 0
     rx_queue_17_packets: 0
     rx_queue_17_bytes: 0
     rx_queue_18_packets: 0
     rx_queue_18_bytes: 0
     rx_queue_19_packets: 0
     rx_queue_19_bytes: 0
     rx_queue_20_packets: 0
     rx_queue_20_bytes: 0
     rx_queue_21_packets: 0
     rx_queue_21_bytes: 0
     rx_queue_22_packets: 0
     rx_queue_22_bytes: 0
     rx_queue_23_packets: 0
     rx_queue_23_bytes: 0
     tx_pb_0_pxon: 0
     tx_pb_0_pxoff: 0
     tx_pb_1_pxon: 0
     tx_pb_1_pxoff: 0
     tx_pb_2_pxon: 0
     tx_pb_2_pxoff: 0
     tx_pb_3_pxon: 0
     tx_pb_3_pxoff: 0
     tx_pb_4_pxon: 0
     tx_pb_4_pxoff: 0
     tx_pb_5_pxon: 0
     tx_pb_5_pxoff: 0
     tx_pb_6_pxon: 0
     tx_pb_6_pxoff: 0
     tx_pb_7_pxon: 0
     tx_pb_7_pxoff: 0
     rx_pb_0_pxon: 0
     rx_pb_0_pxoff: 0
     rx_pb_1_pxon: 0
     rx_pb_1_pxoff: 0
     rx_pb_2_pxon: 0
     rx_pb_2_pxoff: 0
     rx_pb_3_pxon: 0
     rx_pb_3_pxoff: 0
     rx_pb_4_pxon: 0
     rx_pb_4_pxoff: 0
     rx_pb_5_pxon: 0
     rx_pb_5_pxoff: 0
     rx_pb_6_pxon: 0
     rx_pb_6_pxoff: 0
     rx_pb_7_pxon: 0
     rx_pb_7_pxoff: 0