On Wed, Sep 12, 2012 at 05:10:44PM -0700, Jesse Brandeburg wrote:
> On Wed, 12 Sep 2012 22:47:55 +0200
> Dick Snippe <[email protected]> wrote:
>
> > On Wed, Sep 12, 2012 at 04:05:02PM +0000, Brandeburg, Jesse wrote:
> >
> > > Hi Dick, we need to know exactly what you are expecting to happen
> > > here.
> >
> > I'm surprised by the large increase in latency (from <1ms to >100ms).
> > In our production environment we see this phenomenon even on "moderate"
> > load, transmitting 1.5-2Gbit.
>
> I believe maybe you could be (I'd equivocate more if I could) seeing a
> bit of the "bufferbloat" effect maybe from the large queues available by
> default on the 10G interface.
>
> can you try running with smaller transmit descriptor rings?
> ethtool -G ethx tx 128
Not much difference:
|1000 packets transmitted, 1000 received, 0% packet loss, time 7386ms
|rtt min/avg/max/mdev = 48.522/76.642/93.488/6.404 ms, pipe 14
|Transfer rate: 168162.02 [Kbytes/sec] received
However, if I retry with "ifconfig ethx txqueuelen 10", latency
(though not throughput) looks better:
|1000 packets transmitted, 987 received, 1% packet loss, time 5905ms
|rtt min/avg/max/mdev = 0.443/17.018/42.106/8.075 ms, pipe 7
|Transfer rate: 132776.78 [Kbytes/sec] received
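Presumably a small txqueuelen just caps the qdisc backlog that builds up
in front of the driver ring. A rough way to watch that backlog while the
test runs (assuming the default pfifo_fast qdisc) would be:
$ watch -n1 'tc -s qdisc show dev eth1'
The "backlog" counter there should show how much is queued ahead of the
NIC.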
> can you separately try running without other offloads? like:
> ethtool -K ethx lro off tso off gro off
Worse:
|1000 packets transmitted, 1000 received, 0% packet loss, time 7254ms
|rtt min/avg/max/mdev = 221.309/271.187/320.978/20.035 ms, pipe 44
|Transfer rate: 117764.62 [Kbytes/sec] received
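For what it's worth, the actual offload state after that command can be
double-checked with something like:
$ ethtool -k eth1 | egrep 'segmentation|receive-offload'
(ethtool -k, lowercase, prints the current on/off state of each offload.)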
> > This effect on 10G infrastructure appears to be much more pronounced
> > compared to 1G. When testing on 1G nics latency also increases, but much
> > less so; from <1ms to ~10ms. A difference is that the 1G nics are
> > saturated but the 10G ones are "only" transmitting ~1.5 Gbit.
>
> that is a very interesting data point. Are your 1G nics multi-queue?
We have both flavours. When testing on older, non-multiqueue nics
things look good (Broadcom Corporation NetXtreme II BCM5708S Gigabit
Ethernet (rev 12), bnx2 driver):
|--- igor06.omroep.nl ping statistics ---
|1000 packets transmitted, 1000 received, 0% packet loss, time 998ms
|rtt min/avg/max/mdev = 0.097/0.144/0.437/0.032 ms
Newer multiqueue 1G nics appear to have the same problem
(Broadcom Corporation NetXtreme II BCM5709S Gigabit Ethernet (rev 20),
bnx2 driver):
|--- dltest-intern ping statistics ---
|1000 packets transmitted, 1000 received, 0% packet loss, time 6503ms
|rtt min/avg/max/mdev = 82.486/96.982/105.990/5.457 ms, pipe 18
> > > There is a simple test you can do, try to disable TSO using
> > > ethtool. ethtool -K ethx tso off
> >
> > I just tried that. The results are very similar.
>
> hm, you aren't getting any flow control in your network are you? (see
> ethtool -S ethx)
No, I don't think so:
$ sudo ethtool -S eth1|grep flow_control
tx_flow_control_xon: 0
rx_flow_control_xon: 0
tx_flow_control_xoff: 0
rx_flow_control_xoff: 0
> and take a look at the other stats while you are there.
Output is attached below; however, it looks pretty unremarkable to me.
> it also might be interesting to sniff the ethx interface to see
> the outbound traffic patterns and delays between ping request/response.
>
> start your test
> start the ping
> tcpdump -i ethx -s 128 -w snippetx.cap -c 1000
> bzip2 snippetx.cap
> <put on pastebin or some other web site and email us link>
http://download.omroep.nl/gurus/dick/ixgbe/snippet1.cap.gz
Explanation:
morsa01: host running the webserver, although the actual webserver
uses a different IP address: dltest.omroep.nl
morsa02: host running ab ("the client")
morsa03: host running ping to dltest.omroep.nl
The longest ping RTTs in this dump appear to be ICMP seq 201, 173 and
181, each ~120ms:
13:38:45.302927 IP morsa03.omroep.nl > dltest1afp.omroep.nl: ICMP echo request,
id 55334, seq 173, length 64
13:38:45.367699 IP morsa03.omroep.nl > dltest1afp.omroep.nl: ICMP echo request,
id 55334, seq 181, length 64
13:38:45.424599 IP dltest1afp.omroep.nl > morsa03.omroep.nl: ICMP echo reply,
id 55334, seq 173, length 64
13:38:45.488640 IP dltest1afp.omroep.nl > morsa03.omroep.nl: ICMP echo reply,
id 55334, seq 181, length 64
13:38:45.516720 IP morsa03.omroep.nl > dltest1afp.omroep.nl: ICMP echo request,
id 55334, seq 201, length 64
13:38:45.639179 IP dltest1afp.omroep.nl > morsa03.omroep.nl: ICMP echo reply,
id 55334, seq 201, length 64
> > > If that helps then we know that we need to pursue ways to get
> > > your high priority traffic onto its own queue, which btw is why the
> > > single thread iperf works. Ping goes to a different queue (by luck)
> > > and gets out sooner due to not being behind other traffic
> >
> > Interestingly, multi-threaded iperf (iperf -P 50) manages to do +/-
> > 7.5Gbit while ping latency is still around 0.1 - 0.3 ms.
>
> That's only interesting if you're using all 16 queues; were you?
I'm not sure. How can I check how many queues I'm using?
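(My best guess would be to count the per-queue interrupt vectors and
watch the per-queue packet counters move during the iperf run, e.g.:
$ grep -c 'eth1-TxRx' /proc/interrupts
$ watch -d "ethtool -S eth1 | grep 'tx_queue_.*_packets'"
but maybe there is a better way.)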
> There are some games here with the scheduler and NIC irq affinity as
> well that might be impacting us. Can you please make sure you killall
> irqbalance, and run set_irq_affinity.sh ethx ethy. The goal here is to
> start eliminating latency causes.
irqbalance is not running on our servers.
The set_irq_affinity.sh script sets the affinity identically to our
default setup, in which we set the affinity according to
/proc/irq/XX/affinity_hint.
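What our startup script does boils down to roughly this (run as root;
a sketch):
for irq in $(awk -F: '/eth1-TxRx/ {print $1}' /proc/interrupts); do
    cat /proc/irq/$irq/affinity_hint > /proc/irq/$irq/smp_affinity
done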
> I'd also be curious what your
> interrupts per second per queue are during your workload.
$ awk '/eth1/ {print $1,$19}' /proc/interrupts
83: eth1-TxRx-0
84: eth1-TxRx-1
85: eth1-TxRx-2
86: eth1-TxRx-3
87: eth1-TxRx-4
88: eth1-TxRx-5
89: eth1-TxRx-6
90: eth1-TxRx-7
91: eth1-TxRx-8
92: eth1-TxRx-9
93: eth1-TxRx-10
94: eth1-TxRx-11
95: eth1-TxRx-12
96: eth1-TxRx-13
97: eth1-TxRx-14
98: eth1-TxRx-15
99: eth1
$ sar -I 83,84,85,86,97,88,89,90,91,92,93,94,95,96,97,98 1 11111
14:09:19 INTR intr/s
14:09:20 83 3431.00
14:09:20 84 3387.00
14:09:20 85 3392.00
14:09:20 86 3352.00
14:09:20 88 3403.00
14:09:20 89 3380.00
14:09:20 90 3408.00
14:09:20 91 3418.00
14:09:20 92 3380.00
14:09:20 93 3388.00
14:09:20 94 3379.00
14:09:20 95 3435.00
14:09:20 96 3412.00
14:09:20 97 3359.00
14:09:20 98 3429.00
> Lastly, I'm headed out on vacation tonight and won't be available for a
> while. I hope that someone else on my team will continue to work with
> you to debug what is going on.
Have a nice vacation!
If someone else could help me with this issue, that would be great.
> Maybe someone here can reproduce the issue and we will make much more
> progress. Any testing details like kernel version, driver version, etc
> will be helpful.
$ uname -r
3.5.3-2POi-x86_64 (we compile our own kernels; this is a vanilla
kernel.org kernel; /proc/config.gz attached)
$ sudo ethtool -i eth1
driver: ixgbe
version: 3.9.15-k
firmware-version: 0x613e0001
bus-info: 0000:15:00.1
--
Dick Snippe, internetbeheerder \ fight war
[email protected], +31 35 677 3555 \ not wars
NPO ICT, Sumatralaan 45, 1217 GP Hilversum, NPO Gebouw A
NIC statistics:
rx_packets: 4954207
tx_packets: 14411754
rx_bytes: 297725278
tx_bytes: 21803837466
rx_pkts_nic: 4954200
tx_pkts_nic: 14411748
rx_bytes_nic: 317541462
tx_bytes_nic: 21861493872
lsc_int: 0
tx_busy: 0
non_eop_descs: 0
rx_errors: 0
tx_errors: 0
rx_dropped: 13
tx_dropped: 0
multicast: 0
broadcast: 749
rx_no_buffer_count: 0
collisions: 0
rx_over_errors: 0
rx_crc_errors: 0
rx_frame_errors: 0
hw_rsc_aggregated: 0
hw_rsc_flushed: 0
fdir_match: 4948157
fdir_miss: 5413
fdir_overflow: 0
rx_fifo_errors: 0
rx_missed_errors: 0
tx_aborted_errors: 0
tx_carrier_errors: 0
tx_fifo_errors: 0
tx_heartbeat_errors: 0
tx_timeout_count: 0
tx_restart_queue: 13470
rx_long_length_errors: 0
rx_short_length_errors: 0
tx_flow_control_xon: 0
rx_flow_control_xon: 0
tx_flow_control_xoff: 0
rx_flow_control_xoff: 0
rx_csum_offload_errors: 0
alloc_rx_page_failed: 0
alloc_rx_buff_failed: 0
rx_no_dma_resources: 0
os2bmc_rx_by_bmc: 0
os2bmc_tx_by_bmc: 0
os2bmc_tx_by_host: 0
os2bmc_rx_by_host: 0
tx_queue_0_packets: 1107735
tx_queue_0_bytes: 1676513980
tx_queue_1_packets: 804253
tx_queue_1_bytes: 1217059504
tx_queue_2_packets: 869850
tx_queue_2_bytes: 1316611613
tx_queue_3_packets: 968752
tx_queue_3_bytes: 1464607596
tx_queue_4_packets: 1045184
tx_queue_4_bytes: 1582011203
tx_queue_5_packets: 916924
tx_queue_5_bytes: 1387753105
tx_queue_6_packets: 656152
tx_queue_6_bytes: 992836741
tx_queue_7_packets: 1181526
tx_queue_7_bytes: 1788412206
tx_queue_8_packets: 721098
tx_queue_8_bytes: 1090527918
tx_queue_9_packets: 796443
tx_queue_9_bytes: 1205448460
tx_queue_10_packets: 856687
tx_queue_10_bytes: 1296162339
tx_queue_11_packets: 939584
tx_queue_11_bytes: 1422224966
tx_queue_12_packets: 936446
tx_queue_12_bytes: 1417411339
tx_queue_13_packets: 796627
tx_queue_13_bytes: 1205727747
tx_queue_14_packets: 1032075
tx_queue_14_bytes: 1556381023
tx_queue_15_packets: 782418
tx_queue_15_bytes: 1184147726
tx_queue_16_packets: 0
tx_queue_16_bytes: 0
tx_queue_17_packets: 0
tx_queue_17_bytes: 0
tx_queue_18_packets: 0
tx_queue_18_bytes: 0
tx_queue_19_packets: 0
tx_queue_19_bytes: 0
tx_queue_20_packets: 0
tx_queue_20_bytes: 0
tx_queue_21_packets: 0
tx_queue_21_bytes: 0
tx_queue_22_packets: 0
tx_queue_22_bytes: 0
tx_queue_23_packets: 0
tx_queue_23_bytes: 0
rx_queue_0_packets: 355318
rx_queue_0_bytes: 21331740
rx_queue_1_packets: 288498
rx_queue_1_bytes: 17321991
rx_queue_2_packets: 316116
rx_queue_2_bytes: 18980362
rx_queue_3_packets: 313977
rx_queue_3_bytes: 18869388
rx_queue_4_packets: 326359
rx_queue_4_bytes: 19598065
rx_queue_5_packets: 295157
rx_queue_5_bytes: 17722555
rx_queue_6_packets: 230605
rx_queue_6_bytes: 13848645
rx_queue_7_packets: 335184
rx_queue_7_bytes: 20124361
rx_queue_8_packets: 283886
rx_queue_8_bytes: 17112428
rx_queue_9_packets: 316700
rx_queue_9_bytes: 19016728
rx_queue_10_packets: 316911
rx_queue_10_bytes: 19055380
rx_queue_11_packets: 323517
rx_queue_11_bytes: 19421215
rx_queue_12_packets: 332939
rx_queue_12_bytes: 19986568
rx_queue_13_packets: 282696
rx_queue_13_bytes: 16978334
rx_queue_14_packets: 354728
rx_queue_14_bytes: 21451622
rx_queue_15_packets: 281616
rx_queue_15_bytes: 16905896
rx_queue_16_packets: 0
rx_queue_16_bytes: 0
rx_queue_17_packets: 0
rx_queue_17_bytes: 0
rx_queue_18_packets: 0
rx_queue_18_bytes: 0
rx_queue_19_packets: 0
rx_queue_19_bytes: 0
rx_queue_20_packets: 0
rx_queue_20_bytes: 0
rx_queue_21_packets: 0
rx_queue_21_bytes: 0
rx_queue_22_packets: 0
rx_queue_22_bytes: 0
rx_queue_23_packets: 0
rx_queue_23_bytes: 0
tx_pb_0_pxon: 0
tx_pb_0_pxoff: 0
tx_pb_1_pxon: 0
tx_pb_1_pxoff: 0
tx_pb_2_pxon: 0
tx_pb_2_pxoff: 0
tx_pb_3_pxon: 0
tx_pb_3_pxoff: 0
tx_pb_4_pxon: 0
tx_pb_4_pxoff: 0
tx_pb_5_pxon: 0
tx_pb_5_pxoff: 0
tx_pb_6_pxon: 0
tx_pb_6_pxoff: 0
tx_pb_7_pxon: 0
tx_pb_7_pxoff: 0
rx_pb_0_pxon: 0
rx_pb_0_pxoff: 0
rx_pb_1_pxon: 0
rx_pb_1_pxoff: 0
rx_pb_2_pxon: 0
rx_pb_2_pxoff: 0
rx_pb_3_pxon: 0
rx_pb_3_pxoff: 0
rx_pb_4_pxon: 0
rx_pb_4_pxoff: 0
rx_pb_5_pxon: 0
rx_pb_5_pxoff: 0
rx_pb_6_pxon: 0
rx_pb_6_pxoff: 0
rx_pb_7_pxon: 0
rx_pb_7_pxoff: 0