(Top-posting)
Nishit,
Thank you for getting some numbers for comparison. As I told you before, 82574
is simply a more efficient part in this regard. To see if we can help you tune
your performance, there are a few more things we can try. Firstly, I'd suggest
comparing 82574 to 82572 (I believe you said you have one; please disregard this
if I'm mistaken). While 82572 and 82574 still have different interrupt
mechanisms, they are both x1 parts and will be a fairer comparison.
Also, can you run perf and collect the data? You can use
# perf record -a -- <command>
to record the activity in the kernel while executing a command. In your case,
you could feed it netperf. Once this has completed, you can run
# perf report
to generate a human-readable report. Please gather the perf report for 82572
(or 82571, if necessary) and 82574 and we'll go from there.
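The two steps above can be wrapped so that each NIC gets its own data file and a plain-text report that is easy to paste into a reply. This is only a sketch: the helper name profile_nic, the PERF override variable, and the perf-<label>.data file naming are assumptions made for illustration. The -o, -i, and --stdio options are standard perf options.

```shell
#!/bin/sh
# Sketch only: profile_nic, PERF, and the file naming are hypothetical.
# PERF can be overridden to point at a specific perf binary.
PERF="${PERF:-perf}"

profile_nic() {
    label="$1"; shift
    data="perf-${label}.data"
    # -a: sample all CPUs; -o: write samples to a per-NIC file.
    # Everything after "--" is the workload to run while recording.
    "$PERF" record -a -o "$data" -- "$@" || return 1
    # -i: read the per-NIC file; --stdio: print a plain-text report.
    "$PERF" report -i "$data" --stdio
}
```

A possible invocation, with <server ip> replaced by a real address, would be:

    profile_nic 82574 netperf -t UDP_STREAM -N -l 3600 -H <server ip> -- -m 64 > perf-82574.txt

A shorter -l value would keep the data file smaller. If the NIC under test is on the receiving side, a placeholder workload such as "sleep 30", run while traffic is flowing, should serve the same purpose.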
Cheers,
Matthew
From: Nishit Shah [mailto:[email protected]]
Sent: Saturday, July 28, 2012 12:48 AM
To: Vick, Matthew
Cc: [email protected]
Subject: Re: [E1000-devel] problem with simplified balancing on 82574 chips
Hi,
Below are the CPU utilization results for the 82571 and 82574 (MSI-X)
chips.
Test Setup
netperf client --> Server with 82571 and 82574 chip.
Current PPS: 450,000
Server config
Kernel: Vanilla 2.6.39.4
CPU: Affinity is bound to a single core
of an Intel quad-core Q9400 2.66 GHz processor.
e1000e Driver: 1.9.5
(netperf -t UDP_STREAM -N -l 3600 -H <server ip> -- -m 64)
1.) 82571 and 82574 with InterruptThrottleRate=3
        interrupts   idle cpu
82571   21100        52-54
82574   21100        62-64
2.) 82571 and 82574 with InterruptThrottleRate=3000
        interrupts   idle cpu
82571   4050         62-64 (10% gain compared to InterruptThrottleRate=3)
82574   4050         63-65 (1% gain compared to InterruptThrottleRate=3)
3.) 82571 and 82574 with InterruptThrottleRate=8000
        interrupts   idle cpu
82571   9050         60-62 (8% gain compared to InterruptThrottleRate=3)
82574   9050         63-65 (1% gain compared to InterruptThrottleRate=3)
I see fewer interrupts with MSI-X, but CPU utilization is almost the same.
Is it possible to reduce CPU utilization further by tuning or changing any
other parameters?
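One way to read the tables above is to work out how many packets each interrupt covers. The sketch below assumes the "interrupts" column is a per-second rate and that the 450,000 PPS figure from the test setup applies to every run; neither is stated explicitly. pkts_per_irq is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: average packets handled per interrupt.
# Assumes both arguments are per-second rates; integer division truncates.
pkts_per_irq() {
    pps="$1"
    irqs="$2"
    echo $(( pps / irqs ))
}

PPS=450000   # packet rate quoted in the test setup
for rate in 21100 9050 4050; do
    echo "${rate} interrupts/s: $(pkts_per_irq "$PPS" "$rate") packets per interrupt"
done
```

Under those assumptions each interrupt already covers roughly 21 packets at InterruptThrottleRate=3 and more than 100 at InterruptThrottleRate=3000, so per-packet processing rather than per-interrupt overhead may account for most of the CPU cost. That would be consistent with the small gains seen on 82574.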
Rgds,
Nishit Shah.
On 6/26/2012 2:32 PM, Nishit Shah wrote:
Hi Matthew,
It is working fine for both the reported issues.
Thanks once again for all your help.
I have prepared a box with two 82571 ports and two 82574 ports.
I will let you know the CPU utilization results of simplified mode for both
chips.
Rgds,
Nishit Shah.
On 6/23/2012 5:53 AM, Vick, Matthew wrote:
(Top-posting)
I sent you a tar ball of a patched 1.9.5 e1000e that should resolve both of the
issues you're seeing. Please let me know if you have any problems with it.
Cheers,
Matthew
From: Nishit Shah [mailto:[email protected]]
Sent: Tuesday, June 19, 2012 10:33 PM
To: Vick, Matthew
Cc: [email protected]
Subject: Re: [E1000-devel] problem with simplified balancing on 82574 chips
Thanks Matthew,
It really helps a lot in understanding the workings.
Rgds,
Nishit Shah.
On 6/20/2012 6:27 AM, Vick, Matthew wrote:
(Top-posting)
Thanks for the additional data. 82574, being a newer part, has a more efficient
interrupt mechanism for the driver to use than previous parts. The restriction
on interrupts with InterruptThrottleRate obviously helps CPU utilization, but
it isn't going to be as dramatic a change as on other parts, because the 82574's
interrupts are already more efficient.
I'm still working to create a finalized patch that resolves both issues you've
raised.
Cheers,
Matthew
From: Nishit Shah [mailto:[email protected]]
Sent: Tuesday, June 19, 2012 5:47 AM
To: Vick, Matthew
Cc: [email protected]
Subject: Re: [E1000-devel] problem with simplified balancing on 82574 chips
Hi Matthew,
1.) vmstat and top output with ethtool -C <nic> rx-usecs 0 as well as
ethtool -C <nic> rx-usecs 3. (In my case both are giving me the same results)
# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd    free   buff  cache   si   so    bi    bo    in   cs us sy id wa st
 0  0      0 2013040  11748   5736    0    0     0     0 40644   18  0 23 77  0  0
 0  0      0 2013040  11748   5736    0    0     0     0 40736   28  0 23 77  0  0
 0  0      0 2013040  11748   5736    0    0     0     0 40730   10  0 24 76  0  0
 0  0      0 2013040  11748   5736    0    0     0     0 40731   14  0 24 76  0  0
 0  0      0 2013040  11748   5736    0    0     0     0 40735   24  0 24 76  0  0
 0  0      0 2013040  11748   5736    0    0     0     0 40732   12  0 24 76  0  0
# top output
Tasks: 41 total, 1 running, 40 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.0%us, 0.0%sy, 0.0%ni, 52.8%id, 0.0%wa, 1.3%hi, 45.8%si, 0.0%st
2.) vmstat and top output with ethtool -C <nic> rx-usecs 4
# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd    free   buff  cache   si   so    bi    bo    in   cs us sy id wa st
 0  0      0 2012916  11804   5740    0    0     0     0 18599   20  0 23 77  0  0
 0  0      0 2012916  11804   5740    0    0     0     0 18588   14  0 22 78  0  0
 0  0      0 2012916  11804   5740    0    0     0     0 18592   12  0 22 78  0  0
 0  0      0 2012916  11804   5740    0    0     0     0 18591   12  0 22 78  0  0
 0  0      0 2012916  11804   5740    0    0     0     0 18593   11  0 22 78  0  0
 0  0      0 2012916  11804   5740    0    0     0     0 18594   12  0 23 77  0  0
# top output
Tasks: 41 total, 1 running, 40 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.0%us, 0.0%sy, 0.0%ni, 53.3%id, 0.0%wa, 0.3%hi, 46.3%si, 0.0%st
I can see a good drop in interrupts in vmstat (i.e., from about 40600 to 18600),
but I don't see much improvement in the top output in terms of CPU utilization.
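The comparison above can be made less dependent on eyeballing by averaging the relevant vmstat columns. The following is a minimal sketch, assuming the 17-column vmstat layout shown in this message, where "in" is field 11 and "id" is field 15. vmstat_avg is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: average the "in" (interrupts/s) and "id" (idle %)
# columns of `vmstat 1` output read from stdin.
# Assumes the 17-column layout shown in this thread.
vmstat_avg() {
    awk '
        $1 ~ /^[0-9]+$/ {    # data rows only; both header lines are skipped
            in_sum += $11
            id_sum += $15
            n++
        }
        END {
            if (n > 0)
                printf "%.0f %.1f\n", in_sum / n, id_sum / n
        }
    '
}
```

For example, the following prints the mean interrupt rate and mean idle percentage over six samples. The tail command drops the two header lines and the first sample, which vmstat reports as an average since boot:

    vmstat 1 7 | tail -n +4 | vmstat_avg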
Rgds,
Nishit Shah.
_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired