Alexander,
Thanks for the reply and for clearing that up. It looks like I’m seeing 4-5x the
number of interrupts on 3.10.
For example, IRQ 130 is eth2-TxRx-0 and IRQ 112 is eth0-TxRx-0.
There are 2 bonds on this host, one to the external network and the other to
the internal network, for a total of 4 NICs.
3.10 kernel
05:58:42 AM INTR intr/s
05:58:46 AM 104 0.25
05:58:46 AM 105 0.25
05:58:46 AM 106 0.25
05:58:46 AM 107 0.25
05:58:46 AM 108 0.25
05:58:46 AM 112 4866.25
05:58:46 AM 113 5007.50
05:58:46 AM 114 4891.75
05:58:46 AM 115 5171.00
05:58:46 AM 116 4894.00
05:58:46 AM 118 5253.75
05:58:46 AM 119 4986.00
05:58:46 AM 121 3.50
05:58:46 AM 122 6.00
05:58:46 AM 123 3.75
05:58:46 AM 124 1.25
05:58:46 AM 125 2.25
05:58:46 AM 126 2.00
05:58:46 AM 127 1.00
05:58:46 AM 128 1.25
05:58:46 AM 130 8547.25
05:58:46 AM 131 8671.50
05:58:46 AM 132 8620.50
05:58:46 AM 133 8864.00
05:58:46 AM 134 8508.25
05:58:46 AM 135 8597.25
05:58:46 AM 136 8742.75
05:58:46 AM 137 8536.25
05:58:46 AM 139 6.00
05:58:46 AM 140 6.25
05:58:46 AM 141 6.50
05:58:46 AM 142 1.75
05:58:46 AM 143 2.75
05:58:46 AM 144 1.50
05:58:46 AM 145 2.00
05:58:46 AM 146 6.25
2.6 kernel
05:58:38 AM INTR intr/s
05:58:42 AM 50 203.27
05:58:42 AM 82 2505.54
05:58:42 AM 83 2731.99
05:58:42 AM 84 2586.65
05:58:42 AM 85 2565.99
05:58:42 AM 86 2078.34
05:58:42 AM 87 2351.89
05:58:42 AM 88 2270.03
05:58:42 AM 89 2579.09
05:58:42 AM 91 94.71
05:58:42 AM 92 31.49
05:58:42 AM 93 37.28
05:58:42 AM 94 42.32
05:58:42 AM 95 32.24
05:58:42 AM 96 30.73
05:58:42 AM 97 39.04
05:58:42 AM 98 48.61
05:58:42 AM 100 2949.87
05:58:42 AM 101 3349.12
05:58:42 AM 102 3233.00
05:58:42 AM 103 2839.55
05:58:42 AM 105 2912.09
05:58:42 AM 106 2672.29
05:58:42 AM 107 2996.98
05:58:42 AM 109 91.69
05:58:42 AM 110 48.11
05:58:42 AM 111 42.32
05:58:42 AM 112 46.60
05:58:42 AM 113 46.35
05:58:42 AM 114 53.15
05:58:42 AM 115 52.90
05:58:42 AM 116 43.83
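
For anyone who wants to cross-check these per-IRQ rates without sysstat, here is a minimal sketch of the other approach Alex mentions: sampling /proc/interrupts twice and dividing the difference by the interval. It is only an illustration, not something I ran to produce the numbers above. The function names (parse_interrupts, interrupt_rates, watch) are made up for this example, it assumes the usual /proc/interrupts layout (a header row of CPU columns, then one row per IRQ), and the "TxRx" filter is just a guess based on the queue names on this host.

```python
import time


def parse_interrupts(text):
    """Map each IRQ label to (total count across CPUs, description)."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        if not line.strip():
            continue
        label, _, rest = line.partition(":")
        fields = rest.split()
        # Per-CPU counters come first; rows such as ERR have fewer columns.
        counts = []
        for field in fields[:ncpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        desc = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), desc)
    return totals


def interrupt_rates(before, after, interval):
    """Return {label: (interrupts per second, description)} between two samples."""
    rates = {}
    for label, (count, desc) in after.items():
        if label in before:
            rates[label] = ((count - before[label][0]) / interval, desc)
    return rates


def watch(path="/proc/interrupts", interval=4.0, match="TxRx"):
    """Sample the file twice and print rates for rows whose description matches."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_interrupts(f.read())
    for label, (rate, desc) in sorted(interrupt_rates(before, after, interval).items()):
        if match in desc:
            print(f"{label:>5} {rate:10.2f}/s  {desc}")
```

Calling watch() on each host should give numbers comparable to the intr/s column above, with the queue name printed next to each IRQ so the eth0 and eth2 vectors are easy to tell apart.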
--
Mike Zupan
On Friday, November 14, 2014 at 8:54 PM, Alexander Duyck wrote:
> On 11/13/2014 11:13 AM, Mike Zupan wrote:
> > I’m having a strange issue with the 3.10 and 3.17 kernels that I’m not
> > seeing with 2.6. We are seeing a lot of softirq requests for network cards
> > even on a mostly idle system. It happens on any server in the cluster if I
> > deploy the 3.10 or 3.17 kernel.
> >
> > Using top we noticed this process using a lot of CPU. As soon as I give the
> > server traffic, the load spikes to well over 200 for the 1-minute average.
> >
> > [kworker/u66:2]
> >
> > That led us to install `powertop`, where we saw this:
> >
> > Usage Events/s Category Description
> > 1110 ms/s 2045.2 Process php-fpm: pool www
> > 36.0 ms/s 2165.4 Timer tick_sched_timer
> > 57.7 ms/s 1285.0 Process nginx: worker process
> > 13.3 ms/s 416.0 Timer hrtimer_wakeup
> > 39.1 ms/s 350.7 Interrupt [3] net_rx(softirq)
> >
> > This is the same view on a 2.6-series kernel getting the same amount of traffic:
> >
> > Usage Events/s Category Description
> > 1795 ms/s 1654.0 Process php-fpm: pool www
> > 45.3 ms/s 1110.4 Process nginx: worker process
> > 562.8 µs/s 122.4 Process /usr/bin/java -Xms200m -Xmx2000m -Xss256k
> > -XX:MaxDirectMemorySize=516m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Dage
> > 497.1 µs/s 59.3 Process /usr/sbin/gmond
> > 16.0 ms/s 30.2 Process /usr/bin/redis-server 127.0.0.1:6379
> > 4.7 ms/s 32.8 Process python /usr/bin/statsd-relay.py
> > 81.7 ms/s 0.00 Timer tcp_delack_timer
> > 24.8 ms/s 0.00 Timer tick_sched_timer
> > 549.4 µs/s 9.2 Process java -Xmx6g -server -Dfile.encoding=utf-8
> > -XX:OnOutOfMemoryError=kill -9 %p -XX:+HeapDumpOnOutOfMemoryError -XX:HeapD
> > 15.2 ms/s 0.00 Interrupt [3] net_rx(softirq)
> >
> >
> > As you can see, net_rx is 0 on 2.6 but we get as many as 4k/s on 3.10.
> > The server specs are the same and I removed all sysctl settings. I can
> > replicate the issue just by installing 3.10 on a server.
> >
> > The NICs we have installed are:
> >
> > 06:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network
> > Connection (rev 01)
> > 06:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network
> > Connection (rev 01)
> > 06:00.2 Ethernet controller: Intel Corporation I350 Gigabit Network
> > Connection (rev 01)
> > 06:00.3 Ethernet controller: Intel Corporation I350 Gigabit Network
> > Connection (rev 01)
> >
> > --
> > Mike Zupan
> >
>
>
> Mike,
>
> I would recommend installing the "perf" tool and running "perf top"
> instead of "powertop" to try and determine what is running on your
> system. The powertop tool is meant to determine what is waking you up
> out of sleep states, not what is actually making use of the system. As
> such, with powertop you could see 0 events per second and all that would
> mean is that the system isn't getting to sleep because it is too busy, while
> a high count could actually mean your system is going idle, resulting in
> a significant number of wake-ups.
>
> For interrupt information you might try watching the rate at which
> /proc/interrupts increases, or you could install sysstat and then run
> "sar -I XALL 4 500 | grep -v 0.00" to watch for the non-zero interrupt
> rates after figuring out which interrupts belong to your network adapter.
>
> Thanks,
>
> Alex
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired