Re: [E1000-devel] I350 high interrupts with a lot of traffic on 3.10 or 3.14 kernel

Alexander Duyck Sat, 15 Nov 2014 12:34:01 -0800

There could be a few causes for the number of interrupts to change. 
Either there was a change in the interrupt moderation scheme in use, or
the driver is simply processing packets faster and exiting polling more
frequently.


To test for a difference in interrupt moderation I would recommend using
ethtool -C <iface> rx-usecs 400.  That should lock the interface in at
2500 interrupts per second.  You should be able to do this on either
kernel to determine if the difference is interrupt moderation.  Other
than that you might try using "perf top" like I mentioned to see where
the hot spots are in the old kernel versus the new one.
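
As a rough illustration of the arithmetic behind that figure (a sketch only, not
driver code): a fixed rx-usecs value sets the minimum gap between interrupts, so
the ceiling on the rate is one second divided by that gap. The function name is
made up for this example, and it assumes plain fixed moderation, ignoring any
small rx-usecs values a driver may treat as special modes.

```python
def max_interrupts_per_second(rx_usecs: int) -> float:
    """Upper bound on interrupts/s for a fixed moderation interval.

    rx_usecs is the minimum gap between interrupts in microseconds,
    as passed to `ethtool -C <iface> rx-usecs N`.
    """
    if rx_usecs <= 0:
        raise ValueError("rx_usecs must be positive for fixed moderation")
    return 1_000_000 / rx_usecs


if __name__ == "__main__":
    # 400 us between interrupts gives the 2500/s ceiling quoted above.
    print(max_interrupts_per_second(400))
```

If a queue vector still reports well above this ceiling after the setting is
applied, interrupt moderation is probably not what differs between the kernels.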

- Alex

On 11/15/2014 06:03 AM, Mike Zupan wrote:
> Alexander
>
> Thanks for the reply and for clearing that up. It looks like I’m doing 4-5x
> the number of interrupts.
>
> For example, 130 is eth2-TxRx-0 and 112 is eth0-TxRx-0.
>
> There are 2 bonds on this host, with a total of 4 NICs. One bond is for the
> external network and the other is for the internal network.
>
> 3.10 kernel
>
> 05:58:42 AM      INTR    intr/s
> 05:58:46 AM       104      0.25
> 05:58:46 AM       105      0.25
> 05:58:46 AM       106      0.25
> 05:58:46 AM       107      0.25
> 05:58:46 AM       108      0.25
> 05:58:46 AM       112   4866.25
> 05:58:46 AM       113   5007.50
> 05:58:46 AM       114   4891.75
> 05:58:46 AM       115   5171.00
> 05:58:46 AM       116   4894.00
> 05:58:46 AM       118   5253.75
> 05:58:46 AM       119   4986.00
> 05:58:46 AM       121      3.50
> 05:58:46 AM       122      6.00
> 05:58:46 AM       123      3.75
> 05:58:46 AM       124      1.25
> 05:58:46 AM       125      2.25
> 05:58:46 AM       126      2.00
> 05:58:46 AM       127      1.00
> 05:58:46 AM       128      1.25
> 05:58:46 AM       130   8547.25
> 05:58:46 AM       131   8671.50
> 05:58:46 AM       132   8620.50
> 05:58:46 AM       133   8864.00
> 05:58:46 AM       134   8508.25
> 05:58:46 AM       135   8597.25
> 05:58:46 AM       136   8742.75
> 05:58:46 AM       137   8536.25
> 05:58:46 AM       139      6.00
> 05:58:46 AM       140      6.25
> 05:58:46 AM       141      6.50
> 05:58:46 AM       142      1.75
> 05:58:46 AM       143      2.75
> 05:58:46 AM       144      1.50
> 05:58:46 AM       145      2.00
> 05:58:46 AM       146      6.25
>
>
> 2.6 kernel
>
> 05:58:38 AM      INTR    intr/s
> 05:58:42 AM        50    203.27
> 05:58:42 AM        82   2505.54
> 05:58:42 AM        83   2731.99
> 05:58:42 AM        84   2586.65
> 05:58:42 AM        85   2565.99
> 05:58:42 AM        86   2078.34
> 05:58:42 AM        87   2351.89
> 05:58:42 AM        88   2270.03
> 05:58:42 AM        89   2579.09
> 05:58:42 AM        91     94.71
> 05:58:42 AM        92     31.49
> 05:58:42 AM        93     37.28
> 05:58:42 AM        94     42.32
> 05:58:42 AM        95     32.24
> 05:58:42 AM        96     30.73
> 05:58:42 AM        97     39.04
> 05:58:42 AM        98     48.61
> 05:58:42 AM       100   2949.87
> 05:58:42 AM       101   3349.12
> 05:58:42 AM       102   3233.00
> 05:58:42 AM       103   2839.55
> 05:58:42 AM       105   2912.09
> 05:58:42 AM       106   2672.29
> 05:58:42 AM       107   2996.98
> 05:58:42 AM       109     91.69
> 05:58:42 AM       110     48.11
> 05:58:42 AM       111     42.32
> 05:58:42 AM       112     46.60
> 05:58:42 AM       113     46.35
> 05:58:42 AM       114     53.15
> 05:58:42 AM       115     52.90
> 05:58:42 AM       116     43.83
>
>
>
> -- 
> Mike Zupan
>
> On Friday, November 14, 2014 at 8:54 PM, Alexander Duyck wrote:
>
>> On 11/13/2014 11:13 AM, Mike Zupan wrote:
>>> I’m having a strange issue going on with the 3.10 or 3.17 kernel that
>>> I’m not seeing with 2.6. We are seeing a lot of softirq requests for
>>> network cards even on a mostly idle system. It happens on any server
>>> in the cluster if I deploy the 3.10 or 3.17 kernel.
>>>
>>> Using top we noticed this process using a lot of CPU. As soon as I
>>> give the server traffic, the load spikes to well over 200 for the
>>> 1-minute average.
>>>
>>> [kworker/u66:2]
>>>
>>> That led us to install `powertop`, which showed this:
>>>
>>> Usage Events/s Category Description
>>> 1110 ms/s 2045.2 Process php-fpm: pool www
>>> 36.0 ms/s 2165.4 Timer tick_sched_timer
>>> 57.7 ms/s 1285.0 Process nginx: worker process
>>> 13.3 ms/s 416.0 Timer hrtimer_wakeup
>>> 39.1 ms/s 350.7 Interrupt [3] net_rx(softirq)
>>>
>>> This is the same output on a 2.6 series kernel getting the same amount of traffic:
>>>
>>> Usage Events/s Category Description
>>> 1795 ms/s 1654.0 Process php-fpm: pool www
>>> 45.3 ms/s 1110.4 Process nginx: worker process
>>> 562.8 µs/s 122.4 Process /usr/bin/java -Xms200m -Xmx2000m -Xss256k
>>> -XX:MaxDirectMemorySize=516m -XX:+UseParNewGC
>>> -XX:+UseConcMarkSweepGC -Dage
>>> 497.1 µs/s 59.3 Process /usr/sbin/gmond
>>> 16.0 ms/s 30.2 Process /usr/bin/redis-server 127.0.0.1:6379
>>> 4.7 ms/s 32.8 Process python /usr/bin/statsd-relay.py
>>> 81.7 ms/s 0.00 Timer tcp_delack_timer
>>> 24.8 ms/s 0.00 Timer tick_sched_timer
>>> 549.4 µs/s 9.2 Process java -Xmx6g -server -Dfile.encoding=utf-8
>>> -XX:OnOutOfMemoryError=kill -9 %p -XX:+HeapDumpOnOutOfMemoryError
>>> -XX:HeapD
>>> 15.2 ms/s 0.00 Interrupt [3] net_rx(softirq)
>>>
>>>
>>> As you can see, net_rx is 0 on 2.6 but we get as many as 4k/s on
>>> 3.10. The server specs are the same and I removed all sysctl settings.
>>> I can replicate the issue just by installing 3.10 on a server.
>>>
>>> The NICs we have installed are:
>>>
>>> 06:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network
>>> Connection (rev 01)
>>> 06:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network
>>> Connection (rev 01)
>>> 06:00.2 Ethernet controller: Intel Corporation I350 Gigabit Network
>>> Connection (rev 01)
>>> 06:00.3 Ethernet controller: Intel Corporation I350 Gigabit Network
>>> Connection (rev 01)
>>>
>>> -- 
>>> Mike Zupan
>>
>> Mike,
>>
>> I would recommend installing the "perf" tool and running "perf top"
>> instead of "powertop" to try and determine what is running on your
>> system. The powertop tool is meant to determine what is waking you up
>> out of sleep states, not what is actually making use of the system. As
>> such, with powertop you could see 0 events per second, and all that would
>> mean is that the system isn't getting to sleep because it is too busy, while
>> a high count could actually mean your system is going idle, resulting in
>> a significant number of wake-ups.
>>
>> For interrupt information, you might try watching the rate at which
>> /proc/interrupts increases, or you could install sysstat and then run
>> "sar -I XALL 4 500 | grep -v 0.00" to watch for the nonzero interrupt
>> rates after figuring out which interrupts belong to your network adapter.
>>
>> Thanks,
>>
>> Alex
>
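
The /proc/interrupts approach suggested in the quoted message could be sketched
roughly as follows: take two snapshots, sum the per-CPU counters for each IRQ,
and divide the difference by the interval. This is only an illustrative sketch;
the function names, the 4-second interval, and the "TxRx" filter are assumptions
chosen to match the queue names and sar interval seen earlier in the thread.

```python
import time


def parse_interrupts(text: str) -> dict[str, tuple[int, str]]:
    """Map IRQ label -> (count summed over CPUs, trailing description)."""
    lines = text.splitlines()
    if not lines:
        return {}
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        for field in fields[:n_cpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        result[label.strip()] = (sum(counts), description)
    return result


def irq_rates(before, after, seconds: float) -> dict[str, float]:
    """Interrupts per second for every IRQ present in both snapshots."""
    return {
        irq: (after[irq][0] - before[irq][0]) / seconds
        for irq in after
        if irq in before
    }


if __name__ == "__main__":
    interval = 4.0  # assumed, mirrors the 4-second sar interval
    with open("/proc/interrupts") as f:
        first = parse_interrupts(f.read())
    time.sleep(interval)
    with open("/proc/interrupts") as f:
        second = parse_interrupts(f.read())
    for irq, rate in irq_rates(first, second, interval).items():
        if "TxRx" in second[irq][1]:  # assumed queue-name filter
            print(f"{irq:>5} {second[irq][1]:<40} {rate:10.2f}")
```

Running this on both kernels under the same traffic should give per-queue
numbers comparable to the sar output above.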


_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired
