Hi Luca,
What sort of results do you get running numademo on your usual test box, the
Supermicro server?
It would be good to compare...
Andrew
From: Luca Deri [mailto:d...@ntop.org]
Sent: Thursday, September 15, 2011 4:35 PM
To: LEHANE,ANDREW (A-Scotland,ex1)
Cc: donald.c.skidm...@intel.com; e1000-devel@lists.sourceforge.net
Subject: Re: Problems with Dell R810.
Andrew
on the other 710 box, the other user's numbers (the user with the problems) are:
[khobdeh@wrshrk-do4 Scripts]$ numademo 128M memcpy
2 nodes available
memory with no policy memcpy Avg 5762.69 MB/s Max 5769.58 MB/s Min 5758.19 MB/s
local memory memcpy Avg 5564.86 MB/s Max 5571.51 MB/s Min 5558.13 MB/s
memory interleaved on all nodes memcpy Avg 5542.73 MB/s Max 5551.23 MB/s Min 5537.03 MB/s
memory on node 0 memcpy Avg 11055.01 MB/s Max 11065.85 MB/s Min 11044.91 MB/s
memory on node 1 memcpy Avg 5416.99 MB/s Max 5421.19 MB/s Min 5411.35 MB/s
memory interleaved on 0 1 memcpy Avg 6657.30 MB/s Max 6666.55 MB/s Min 6648.72 MB/s
setting preferred node to 0
memory without policy memcpy Avg 6712.97 MB/s Max 6722.31 MB/s Min 6702.84 MB/s
setting preferred node to 1
memory without policy memcpy Avg 5619.31 MB/s Max 5625.93 MB/s Min 5614.63 MB/s
manual interleaving to all nodes memcpy Avg 6615.03 MB/s Max 6641.81 MB/s Min 6473.63 MB/s
manual interleaving on node 0/1 memcpy Avg 6633.18 MB/s Max 6636.56 MB/s Min 6628.36 MB/s
current interleave node 0
running on node 0, preferred node 0
local memory memcpy Avg 6654.09 MB/s Max 6660.27 MB/s Min 6649.05 MB/s
memory interleaved on all nodes memcpy Avg 6657.69 MB/s Max 6664.57 MB/s Min 6651.69 MB/s
memory interleaved on node 0/1 memcpy Avg 6633.05 MB/s Max 6639.51 MB/s Min 6624.44 MB/s
alloc on node 1 memcpy Avg 5420.20 MB/s Max 5423.82 MB/s Min 5417.03 MB/s
local allocation memcpy Avg 6641.88 MB/s Max 6649.71 MB/s Min 6635.24 MB/s
setting wrong preferred node memcpy Avg 5623.92 MB/s Max 5627.58 MB/s Min 5620.27 MB/s
setting correct preferred node memcpy Avg 6653.27 MB/s Max 6663.24 MB/s Min 6642.80 MB/s
running on node 1, preferred node 0
local memory memcpy Avg 7789.23 MB/s Max 7801.09 MB/s Min 7769.48 MB/s
memory interleaved on all nodes memcpy Avg 6357.53 MB/s Max 6361.03 MB/s Min 6352.60 MB/s
memory interleaved on node 0/1 memcpy Avg 6430.76 MB/s Max 6434.83 MB/s Min 6427.74 MB/s
alloc on node 0 memcpy Avg 5422.83 MB/s Max 5427.54 MB/s Min 5417.90 MB/s
local allocation memcpy Avg 7782.09 MB/s Max 7788.87 MB/s Min 7771.28 MB/s
setting wrong preferred node memcpy Avg 5436.43 MB/s Max 5441.85 MB/s Min 5432.16 MB/s
setting correct preferred node memcpy Avg 7779.38 MB/s Max 7788.87 MB/s Min 7763.64 MB/s
So my conclusions are:
- these two Dell 710s are very different;
- on the boxes with low MB/s we see issues that we don't have on the faster boxes.
Donald: what do you think?
Luca
On Sep 15, 2011, at 5:30 PM,
<andrew_leh...@agilent.com> wrote:
Hi Donald, Luca
OK, could this be it???
R810 first
# numademo 128M memcpy
4 nodes available
memory with no policy memcpy Avg 6103.61 MB/s Max 6114.70 MB/s Min 6020.62 MB/s
local memory memcpy Avg 6112.45 MB/s Max 6113.59 MB/s Min 6109.69 MB/s
memory interleaved on all nodes memcpy Avg 4596.04 MB/s Max 4598.86 MB/s Min 4592.72 MB/s
memory on node 0 memcpy Avg 4298.76 MB/s Max 4299.65 MB/s Min 4297.44 MB/s
memory on node 1 memcpy Avg 4311.93 MB/s Max 4318.46 MB/s Min 4263.59 MB/s
memory on node 2 memcpy Avg 4224.10 MB/s Max 4230.39 MB/s Min 4174.73 MB/s
memory on node 3 memcpy Avg 6103.11 MB/s Max 6115.82 MB/s Min 6009.03 MB/s
memory interleaved on 0 1 memcpy Avg 4272.03 MB/s Max 4274.59 MB/s Min 4270.51 MB/s
memory interleaved on 0 2 memcpy Avg 4229.02 MB/s Max 4232.00 MB/s Min 4227.33 MB/s
memory interleaved on 1 2 memcpy Avg 4238.95 MB/s Max 4241.36 MB/s Min 4235.47 MB/s
memory interleaved on 0 1 2 memcpy Avg 4254.17 MB/s Max 4255.88 MB/s Min 4251.84 MB/s
memory interleaved on 0 3 memcpy Avg 5007.41 MB/s Max 5008.31 MB/s Min 5006.44 MB/s
memory interleaved on 1 3 memcpy Avg 5015.63 MB/s Max 5017.49 MB/s Min 5014.11 MB/s
memory interleaved on 0 1 3 memcpy Avg 4737.35 MB/s Max 4746.87 MB/s Min 4677.06 MB/s
memory interleaved on 2 3 memcpy Avg 4966.48 MB/s Max 4967.53 MB/s Min 4965.69 MB/s
memory interleaved on 0 2 3 memcpy Avg 4693.85 MB/s Max 4710.88 MB/s Min 4636.51 MB/s
memory interleaved on 1 2 3 memcpy Avg 4716.13 MB/s Max 4718.17 MB/s Min 4714.19 MB/s
memory interleaved on 0 1 2 3 memcpy Avg 4583.39 MB/s Max 4597.28 MB/s Min 4530.56 MB/s
setting preferred node to 0
memory without policy memcpy Avg 4294.01 MB/s Max 4300.47 MB/s Min 4243.50 MB/s
setting preferred node to 1
memory without policy memcpy Avg 4318.67 MB/s Max 4319.43 MB/s Min 4315.13 MB/s
setting preferred node to 2
memory without policy memcpy Avg 4225.77 MB/s Max 4231.19 MB/s Min 4180.33 MB/s
setting preferred node to 3
memory without policy memcpy Avg 6103.52 MB/s Max 6115.82 MB/s Min 6007.69 MB/s
manual interleaving to all nodes memcpy Avg 4590.92 MB/s Max 4599.65 MB/s Min 4531.93 MB/s
manual interleaving on node 0/1 memcpy Avg 4267.89 MB/s Max 4275.27 MB/s Min 4216.57 MB/s
current interleave node 0
running on node 0, preferred node 0
local memory memcpy Avg 6055.75 MB/s Max 6058.12 MB/s Min 6052.39 MB/s
memory interleaved on all nodes memcpy Avg 4620.07 MB/s Max 4634.59 MB/s Min 4576.44 MB/s
memory interleaved on node 0/1 memcpy Avg 4991.07 MB/s Max 5009.43 MB/s Min 4869.84 MB/s
alloc on node 1 memcpy Avg 4328.33 MB/s Max 4329.74 MB/s Min 4319.57 MB/s
alloc on node 2 memcpy Avg 4338.42 MB/s Max 4356.30 MB/s Min 4269.15 MB/s
alloc on node 3 memcpy Avg 4318.31 MB/s Max 4326.39 MB/s Min 4302.68 MB/s
local allocation memcpy Avg 6057.60 MB/s Max 6058.94 MB/s Min 6055.12 MB/s
setting wrong preferred node memcpy Avg 4326.59 MB/s Max 4329.60 MB/s Min 4317.35 MB/s
setting correct preferred node memcpy Avg 6045.33 MB/s Max 6058.67 MB/s Min 5944.89 MB/s
running on node 1, preferred node 0
local memory memcpy Avg 6069.85 MB/s Max 6070.73 MB/s Min 6068.53 MB/s
memory interleaved on all nodes memcpy Avg 4621.28 MB/s Max 4624.05 MB/s Min 4618.80 MB/s
memory interleaved on node 0/1 memcpy Avg 5005.25 MB/s Max 5006.82 MB/s Min 4996.56 MB/s
alloc on node 0 memcpy Avg 4314.46 MB/s Max 4321.52 MB/s Min 4256.83 MB/s
alloc on node 2 memcpy Avg 4291.12 MB/s Max 4291.81 MB/s Min 4290.44 MB/s
alloc on node 3 memcpy Avg 4336.58 MB/s Max 4342.77 MB/s Min 4288.24 MB/s
local allocation memcpy Avg 6070.04 MB/s Max 6072.10 MB/s Min 6068.53 MB/s
setting wrong preferred node memcpy Avg 4285.92 MB/s Max 4291.40 MB/s Min 4239.75 MB/s
setting correct preferred node memcpy Avg 6069.96 MB/s Max 6071.00 MB/s Min 6068.81 MB/s
running on node 2, preferred node 0
local memory memcpy Avg 6109.00 MB/s Max 6122.51 MB/s Min 5995.61 MB/s
memory interleaved on all nodes memcpy Avg 4588.44 MB/s Max 4592.41 MB/s Min 4585.66 MB/s
memory interleaved on node 0/1 memcpy Avg 4257.87 MB/s Max 4258.99 MB/s Min 4255.07 MB/s
alloc on node 0 memcpy Avg 4314.82 MB/s Max 4321.66 MB/s Min 4259.66 MB/s
alloc on node 1 memcpy Avg 4262.60 MB/s Max 4269.01 MB/s Min 4216.84 MB/s
alloc on node 3 memcpy Avg 4228.30 MB/s Max 4229.86 MB/s Min 4224.93 MB/s
local allocation memcpy Avg 6109.92 MB/s Max 6122.79 MB/s Min 6011.72 MB/s
setting wrong preferred node memcpy Avg 4223.70 MB/s Max 4229.73 MB/s Min 4175.90 MB/s
setting correct preferred node memcpy Avg 6109.78 MB/s Max 6122.51 MB/s Min 6002.58 MB/s
running on node 3, preferred node 0
local memory memcpy Avg 6113.31 MB/s Max 6114.15 MB/s Min 6111.36 MB/s
memory interleaved on all nodes memcpy Avg 4595.63 MB/s Max 4597.44 MB/s Min 4594.45 MB/s
memory interleaved on node 0/1 memcpy Avg 4269.70 MB/s Max 4271.32 MB/s Min 4262.50 MB/s
alloc on node 0 memcpy Avg 4292.58 MB/s Max 4299.23 MB/s Min 4236.14 MB/s
alloc on node 1 memcpy Avg 4312.16 MB/s Max 4318.60 MB/s Min 4263.45 MB/s
alloc on node 2 memcpy Avg 4224.20 MB/s Max 4230.26 MB/s Min 4178.37 MB/s
local allocation memcpy Avg 6113.42 MB/s Max 6114.42 MB/s Min 6112.20 MB/s
setting wrong preferred node memcpy Avg 4292.88 MB/s Max 4300.75 MB/s Min 4228.53 MB/s
setting correct preferred node memcpy Avg 6102.86 MB/s Max 6116.10 MB/s Min 5999.90 MB/s
Now R710
# numademo 128M memcpy
2 nodes available
memory with no policy memcpy Avg 16900.16 MB/s Max 17843.36 MB/s Min 13960.65 MB/s
local memory memcpy Avg 17831.27 MB/s Max 17840.98 MB/s Min 17772.47 MB/s
memory interleaved on all nodes memcpy Avg 13256.20 MB/s Max 13335.09 MB/s Min 12613.26 MB/s
memory on node 0 memcpy Avg 17838.38 MB/s Max 17843.36 MB/s Min 17831.50 MB/s
memory on node 1 memcpy Avg 10849.47 MB/s Max 10855.53 MB/s Min 10843.25 MB/s
memory interleaved on 0 1 memcpy Avg 13330.99 MB/s Max 13333.77 MB/s Min 13324.50 MB/s
setting preferred node to 0
memory without policy memcpy Avg 17717.58 MB/s Max 17840.98 MB/s Min 16712.46 MB/s
setting preferred node to 1
memory without policy memcpy Avg 10852.45 MB/s Max 10856.40 MB/s Min 10846.75 MB/s
manual interleaving to all nodes memcpy Avg 13331.78 MB/s Max 13333.77 MB/s Min 13329.80 MB/s
manual interleaving on node 0/1 memcpy Avg 13306.01 MB/s Max 13333.77 MB/s Min 13082.93 MB/s
current interleave node 0
running on node 0, preferred node 0
local memory memcpy Avg 17603.71 MB/s Max 17840.98 MB/s Min 16708.29 MB/s
memory interleaved on all nodes memcpy Avg 13327.68 MB/s Max 13333.77 MB/s Min 13295.47 MB/s
memory interleaved on node 0/1 memcpy Avg 13331.92 MB/s Max 13333.77 MB/s Min 13329.80 MB/s
alloc on node 1 memcpy Avg 10734.41 MB/s Max 10855.53 MB/s Min 10188.85 MB/s
local allocation memcpy Avg 17838.14 MB/s Max 17840.98 MB/s Min 17836.24 MB/s
setting wrong preferred node memcpy Avg 10467.28 MB/s Max 10855.53 MB/s Min 7928.27 MB/s
setting correct preferred node memcpy Avg 17836.95 MB/s Max 17840.98 MB/s Min 17831.50 MB/s
running on node 1, preferred node 0
local memory memcpy Avg 17358.28 MB/s Max 17843.36 MB/s Min 13969.37 MB/s
memory interleaved on all nodes memcpy Avg 13332.18 MB/s Max 13335.09 MB/s Min 13313.93 MB/s
memory interleaved on node 0/1 memcpy Avg 13334.56 MB/s Max 13336.42 MB/s Min 13332.45 MB/s
alloc on node 0 memcpy Avg 10852.10 MB/s Max 10854.65 MB/s Min 10851.14 MB/s
local allocation memcpy Avg 17837.43 MB/s Max 17843.36 MB/s Min 17833.87 MB/s
setting wrong preferred node memcpy Avg 10853.24 MB/s Max 10855.53 MB/s Min 10850.26 MB/s
setting correct preferred node memcpy Avg 17839.09 MB/s Max 17840.98 MB/s Min 17833.87 MB/s
That's quite a difference!
Luca, could you run the same on your test box, so we can compare?
Andrew
-----Original Message-----
From: Skidmore, Donald C [mailto:donald.c.skidm...@intel.com]
Sent: Thursday, September 15, 2011 4:19 PM
To: Luca Deri; LEHANE,ANDREW (A-Scotland,ex1)
Cc: e1000-devel@lists.sourceforge.net
Subject: RE: Problems with Dell R810.
Hey Luca,
Sounds like your memory may be a fair amount slower on the larger system.
This isn't unusual, as these systems also support higher memory limits. One
quick way to test would be to run numademo -
numademo 128M memcpy
to see the differences between the two systems.
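(For anyone digging through the archives later: below is a minimal, hand-rolled sketch of roughly what that numademo test measures - allocate a buffer on a chosen NUMA node with libnuma, memcpy it repeatedly, and report MB/s. It is not numademo itself, and the node number, buffer size and iteration count are just placeholders. Build with: gcc -O2 numa_memcpy.c -o numa_memcpy -lnuma -lrt)

#include <stdio.h>
#include <string.h>
#include <time.h>
#include <numa.h>

#define BUF_SIZE (128UL * 1024 * 1024)   /* 128M, as in the numademo runs */
#define ITERATIONS 10

int main(void)
{
    int node = 0;                        /* placeholder: node under test */
    char *src, *dst;
    struct timespec t0, t1;
    double secs, mb;
    int i;

    if (numa_available() < 0) {
        fprintf(stderr, "libnuma not available on this system\n");
        return 1;
    }

    /* Allocate both buffers on the chosen node and fault the pages in. */
    src = numa_alloc_onnode(BUF_SIZE, node);
    dst = numa_alloc_onnode(BUF_SIZE, node);
    if (!src || !dst) {
        fprintf(stderr, "allocation on node %d failed\n", node);
        return 1;
    }
    memset(src, 0xA5, BUF_SIZE);
    memset(dst, 0x5A, BUF_SIZE);

    /* Time repeated copies and report bandwidth in MB/s. */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < ITERATIONS; i++)
        memcpy(dst, src, BUF_SIZE);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    mb   = (double)BUF_SIZE * ITERATIONS / (1024.0 * 1024.0);
    printf("node %d: %.2f MB/s\n", node, mb / secs);

    numa_free(src, BUF_SIZE);
    numa_free(dst, BUF_SIZE);
    return 0;
}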
Thanks,
-Don Skidmore <donald.c.skidm...@intel.com>
-----Original Message-----
From: Luca Deri [mailto:d...@ntop.org]
Sent: Thursday, September 15, 2011 7:52 AM
To: andrew_leh...@agilent.com
Cc: Skidmore, Donald C; e1000-devel@lists.sourceforge.net
Subject: Re: Problems with Dell R810.
Andrew
just to be precise (I don't want to tease you, of course), on an X3440 we
can send 14.88 Mpps (~26 Mpps on two ports), so we're quite close now.
As for the 710 problem I reported, I will ask the 710 user who
reported the issue.
Now the question is: where are all these issues coming from? Why does an 810
(more powerful than a 710) report much poorer performance? Would you have
the chance to read the BIOS revision of your 710, so I can compare it
with the one of the other user who has issues?
This said: great news.
Cheers Luca
On Sep 15, 2011, at 4:45 PM,
<andrew_leh...@agilent.com> wrote:
Hi Donald and Luca,
I have managed to obtain the loan of an R710, and using the Silicom card
and Luca's code I can send in excess of 14 million packets per second, so
whatever the problem with the R710 Luca has reported, it is not the same
as my issue with the R810! Of course, unless my R810 has suffered the
same fault as the R710 listed below and both are now broken in the same
way. Does a reboot clear your other user's problem, Luca, or is it
permanent?
Luca, here are the details...
./pfsend -i dna:eth4 -g 1 -l 60 -n 0 -r 10
TX rate: [current 14'238'148.23 pps/9.57 Gbps][average 14'223'555.75 pps/9.56 Gbps][total 2'147'799'248.00 pkts]
TX rate: [current 14'240'502.43 pps/9.57 Gbps][average 14'223'667.24 pps/9.56 Gbps][total 2'162'040'021.00 pkts]
TX rate: [current 14'239'155.21 pps/9.57 Gbps][average 14'223'768.47 pps/9.56 Gbps][total 2'176'279'461.00 pkts]
TX rate: [current 14'238'531.22 pps/9.57 Gbps][average 14'223'864.33 pps/9.56 Gbps][total 2'190'518'277.00 pkts]
Thanks
Andrew
-----Original Message-----
From: Luca Deri [mailto:d...@ntop.org]
Sent: Thursday, September 15, 2011 3:05 PM
To: Skidmore, Donald C
Cc: LEHANE,ANDREW (A-Scotland,ex1);
e1000-devel@lists.sourceforge.net
Subject: Re: Problems with Dell R810.
Donald
Another PF_RING user has reported the following problem to me
(Dell 710 and Intel 82576):
Wed Sep 14 2011 06:00:11 An OEM diagnostic event has occurred.
Critical 0.000009 Wed Sep 14 2011 06:00:11 A bus fatal error was detected on a component at bus 0 device 6 function 0.
Critical 0.000008 Wed Sep 14 2011 06:00:11 A bus fatal error was detected on a component at slot 1.
Normal 0.000007 Wed Sep 14 2011 06:00:11 An OEM diagnostic event has occurred.
Critical 0.000006 Wed Sep 14 2011 06:00:11 A bus fatal error was detected on a component at bus 0 device 5 function 0.
Critical 0.000005 Wed Sep 14 2011 06:00:10 A bus fatal error was detected on a component at slot 2.
Normal 0.000004 Wed Sep 14 2011 06:00:08 An OEM diagnostic event has occurred.
Critical 0.000003 Wed Sep 14 2011 06:00:08 A bus fatal error was detected on a component at bus 0 device 6 function 0.
Critical 0.000002 Wed Sep 14 2011 06:00:08 A bus fatal error was detected on a component at slot 1.
Normal 0.000001 Wed Sep 14 2011 06:00:08 An OEM diagnostic event has occurred.
Additionally, we captured the following logs as well:
alloc kstat_irqs on node -1
pcieport 0000:00:09.0: irq 62 for MSI/MSI-X
pcieport 0000:00:09.0: setting latency timer to 64
aer 0000:00:01.0:pcie02: PCIe errors handled by platform firmware.
aer 0000:00:03.0:pcie02: PCIe errors handled by platform firmware.
aer 0000:00:04.0:pcie02: PCIe errors handled by platform firmware.
aer 0000:00:05.0:pcie02: PCIe errors handled by platform firmware.
aer 0000:00:06.0:pcie02: PCIe errors handled by platform firmware.
aer 0000:00:07.0:pcie02: PCIe errors handled by platform firmware.
aer 0000:00:09.0:pcie02: PCIe errors handled by platform firmware.
I believe there's a BIOS issue on these Dells. What do you think?
Regards Luca
On Sep 4, 2011, at 1:25 PM, Luca Deri wrote:
Donald
thanks for the reply. I don't think this is a PF_RING issue (even
using the vanilla ixgbe driver we observe the same behavior) but rather
a Dell/Intel issue. From what I see in dmesg, it seems that DCA is
disabled and we have no way to enable it. I'm not sure whether this is due
to BIOS limitations. What I can tell you is that a low-end Core 2 Duo is
much faster than this multiprocessor machine, which is an indication
that there's something wrong with this setup.
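One quick sanity check that might help narrow it down (this is just a sketch of mine, nothing from the driver): the CPU advertises its own DCA capability in CPUID leaf 1, ECX bit 18. If that bit is set but the kernel still reports DCA disabled, the chipset/BIOS side is the likely culprit; if it is clear, no amount of BIOS poking will help. x86/gcc only:

#include <stdio.h>
#include <cpuid.h>

#define CPUID_ECX_DCA (1u << 18)   /* CPUID.01H:ECX.DCA */

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 1 reports the basic feature flags, including DCA support. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        fprintf(stderr, "CPUID leaf 1 not supported\n");
        return 1;
    }
    printf("CPU DCA capability: %s\n",
           (ecx & CPUID_ECX_DCA) ? "present" : "absent");
    return 0;
}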
Regards Luca
On Sep 3, 2011, at 2:33 AM, Skidmore, Donald C wrote:
-----Original Message-----
From: andrew_leh...@agilent.com [mailto:andrew_leh...@agilent.com]
Sent: Thursday, September 01, 2011 2:17 AM
To: e1000-devel@lists.sourceforge.net
Cc: d...@ntop.org
Subject: [E1000-devel] Problems with Dell R810.
Hi,
I recently purchased a Dell R810 for use with Luca Deri's PF_RING
networking driver for the 10 Gigabit PCI Express network adapter,
and a Silicom 10Gig card that uses the 82599EB chipset; the machine
is running Fedora Core 14.
Luca's driver is described here:
http://www.ntop.org/blog/pf_ring/introducing-the-10-gbit-pf_ring-dna-driver/
Only the machine doesn't seem to want to play ball. We have tried
a number of things, and so eventually Luca suggested this mailing list;
I do hope someone can help.
The machine spec is as follows.
2x Intel Xeon L7555 Processor (1.86GHz, 8C, 24M Cache, 5.86 GT/s QPI, 95W TDP, Turbo, HT)
DDR3-980MHz 128GB Memory for 2/4CPU (16x8GB Quad Rank LV RDIMMs) 1066MHz
Additional 2x Intel Xeon L7555 Processor (1.86GHz, 8C, 24M Cache, 5.86 GT/s QPI, 95W TDP, Turbo, HT), Upgrade to 4CPU
2x 600GB SAS 6Gbps 10k 2.5" HD
Silicom 82599EB 10 Gigabit Ethernet NIC
According to Luca's experiments on his test machine (not an R810;
actually quite a low-spec machine by comparison), we should be getting
the following results. Unfortunately, the R810's performance is very
poor; it struggles at less than 8% of the capacity of a 10 Gig link
on one core, whereas Luca's test application (byte and packet counts
only) on his machine can process 100% of a 10 Gig link on one core.
http://www.ntop.org/blog/pf_ring/how-to-sendreceive-26mpps-using-pf_ring-on-commodity-hardware/
Importantly, Luca also seems to be getting excellent CPU usage
figures, see the bottom of the page, indicating that both DCA and
IOATDMA are operating correctly. My problem is that even on light
network loads my CPU hits 100% and packets are dropped,
indicating, to me, that DCA/IOATDMA isn't working.
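One rough way to check whether the I/OAT copy engine is moving any data at all (a sketch of mine, assuming the standard dmaengine sysfs attributes under /sys/class/dma are present on this kernel) is to dump memcpy_count for each DMA channel; if every channel stays at zero under load, ioatdma isn't being used:

#include <stdio.h>
#include <string.h>
#include <dirent.h>

int main(void)
{
    DIR *d = opendir("/sys/class/dma");
    struct dirent *de;
    char path[512], buf[64];
    FILE *fp;

    if (!d) {
        perror("opendir /sys/class/dma");
        return 1;
    }
    /* Each ioatdma channel shows up as dma<N>chan<M>; print its copy count. */
    while ((de = readdir(d)) != NULL) {
        if (strncmp(de->d_name, "dma", 3) != 0)
            continue;
        snprintf(path, sizeof(path),
                 "/sys/class/dma/%s/memcpy_count", de->d_name);
        fp = fopen(path, "r");
        if (!fp)
            continue;
        if (fgets(buf, sizeof(buf), fp))
            printf("%s: memcpy_count %s", de->d_name, buf);
        fclose(fp);
    }
    closedir(d);
    return 0;
}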
I have switched on IOATDMA in the Dell's BIOS (it's off by default), and
discovered the following site, which talks about configuring a machine to
use DCA and IOATDMA etc. I even found a chap who reported similar
performance problems, but with a Dell R710, and how he fixed it. I tried
all this, but still no improvement!
http://www.mail-archive.com/ntop-m...@listgateway.unipi.it/msg01185.html
The R810 seems to use a 7500 chipset.
http://www.dell.com/downloads/global/products/pedge/pedge_r810_specsheet_en.pdf
So, I think this is the R810 chipset reference:
http://www-techdoc.intel.com/content/dam/doc/datasheet/7500-chipset-datasheet.pdf
(see page 453).
The program sets the bit (offset 0x8C, bit 0), but it doesn't seem to stay
set, so consecutive calls to "dca_probe" always say "DCA disabled,
enabling now."
I commented out some of the defines in the original code, as they are
already defined in the Linux kernel headers, and, of course, changed the
registers to point to the ones on page 453; I hope they are correct.
Still no luck; the CPU usage is way too high.
#define _XOPEN_SOURCE 500

#include <stdio.h>
#include <stdlib.h>
#include <pci/pci.h>
#include <sys/io.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define INTEL_BRIDGE_DCAEN_OFFSET 0x8c
#define INTEL_BRIDGE_DCAEN_BIT    0
/* #define PCI_HEADER_TYPE_BRIDGE 1 */
/* #define PCI_VENDOR_ID_INTEL 0x8086 */ /* lol @ intel */
/* #define PCI_HEADER_TYPE 0x0e */
#define MSR_P6_DCA_CAP 0x000001f8
#define NUM_CPUS 64

/* Read the DCA enable register on an Intel bridge and set the enable bit
 * if it is clear. */
void check_dca(struct pci_dev *dev)
{
    u32 dca = pci_read_long(dev, INTEL_BRIDGE_DCAEN_OFFSET);

    printf("DCA old value %d.\n", dca);
    if (!(dca & (1 << INTEL_BRIDGE_DCAEN_BIT))) {
        printf("DCA disabled, enabling now.\n");
        dca |= 1 << INTEL_BRIDGE_DCAEN_BIT;
        printf("DCA new value %d.\n", dca);
        pci_write_long(dev, INTEL_BRIDGE_DCAEN_OFFSET, dca);
    } else {
        printf("DCA already enabled!\n");
    }
}

/* Set the DCA capability bit in MSR 0x1f8 on every CPU via /dev/cpu/N/msr. */
void msr_dca_enable(void)
{
    char msr_file_name[64];
    int fd = 0, i = 0;
    u64 data;

    for (; i < NUM_CPUS; i++) {
        sprintf(msr_file_name, "/dev/cpu/%d/msr", i);
        fd = open(msr_file_name, O_RDWR);
        if (fd < 0) {
            perror("open failed!");
            exit(1);
        }
        if (pread(fd, &data, sizeof(data), MSR_P6_DCA_CAP) != sizeof(data)) {
            perror("reading msr failed!");
            exit(1);
        }
        printf("got msr value: %llx\n", (unsigned long long)data);
        if (!(data & 1)) {
            data |= 1;
            if (pwrite(fd, &data, sizeof(data), MSR_P6_DCA_CAP) != sizeof(data)) {
                perror("writing msr failed!");
                exit(1);
            }
        } else {
            printf("msr already enabled for CPU %d\n", i);
        }
        close(fd);
    }
}

int main(void)
{
    struct pci_access *pacc;
    struct pci_dev *dev;
    u8 type;

    pacc = pci_alloc();
    pci_init(pacc);
    pci_scan_bus(pacc);

    /* Walk all PCI devices and try to enable DCA on every Intel bridge. */
    for (dev = pacc->devices; dev; dev = dev->next) {
        pci_fill_info(dev, PCI_FILL_IDENT | PCI_FILL_BASES);
        if (dev->vendor_id == PCI_VENDOR_ID_INTEL) {
            type = pci_read_byte(dev, PCI_HEADER_TYPE);
            if (type == PCI_HEADER_TYPE_BRIDGE) {
                check_dca(dev);
            }
        }
    }

    msr_dca_enable();
    return 0;
}
As you can see, the ixgbe, dca and ioatdma modules are loaded.
# lsmod
Module Size Used by
ixgbe 200547 0
pf_ring 327754 4
tcp_lp 2111 0
fuse 61934 3
sunrpc 201569 1
ip6t_REJECT 4263 2
nf_conntrack_ipv6 18078 4
ip6table_filter 1687 1
ip6_tables 17497 1 ip6table_filter
ipv6 286505 184 ip6t_REJECT,nf_conntrack_ipv6
uinput 7368 0
ioatdma 51376 72
i7core_edac 16210 0
dca 5590 2 ixgbe,ioatdma
bnx2 65569 0
mdio 3934 0
ses 6319 0
dcdbas 8540 0
edac_core 41336 1 i7core_edac
iTCO_wdt 11256 0
iTCO_vendor_support 2610 1 iTCO_wdt
power_meter 9545 0
hed 2206 0
serio_raw 4640 0
microcode 18662 0
enclosure 7518 1 ses
megaraid_sas 37653 2
# uname -a
Linux test 2.6.35.14-95.fc14.x86_64 #1 SMP Tue Aug 16 21:01:58 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
Thanks,
Andrew
Hey Andrew,
Sorry you're having issues with the 82599 and ixgbe. I haven't done
much with the PF_RING networking driver, but maybe we can see what is
going on with the ixgbe driver. It would help to know a little more
information, like:
- Were there any interesting system log messages of note?
- How are your interrupts being divided among your queues (cat
/proc/interrupts)? I know you're testing with just one CPU; are you also
just using one queue, or affinizing one to that CPU? (See the rough
sketch after this list.)
- Could you provide the lspci -vvv output, to verify your NIC is
getting a PCIe x8 connection?
- What kind of CPU usage are you seeing if you don't use PF_RING, just
the base driver running at line rate with something like netperf/iperf?
- Have you attempted this without DCA? Like I said above, I don't
have much experience with PF_RING, so I may be missing some fundamental
advantage it is supposed to gain from operating with DCA in this mode.
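Here's the rough sketch I mentioned above (my own illustration, not anything from ixgbe or PF_RING): it pins the process to CPU 0 and prints the /proc/interrupts lines for a given interface, so you can see which cores the queue vectors are actually landing on. The interface name is just a placeholder.

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sched.h>

int main(void)
{
    const char *ifname = "eth4";     /* placeholder interface name */
    cpu_set_t set;
    char line[1024];
    FILE *fp;

    /* Pin ourselves to CPU 0, mirroring "testing with just one CPU". */
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    /* Show how the per-queue interrupts are spread across CPUs. */
    fp = fopen("/proc/interrupts", "r");
    if (!fp) {
        perror("fopen /proc/interrupts");
        return 1;
    }
    while (fgets(line, sizeof(line), fp)) {
        if (strstr(line, ifname))
            fputs(line, stdout);
    }
    fclose(fp);
    return 0;
}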
These are just off the top of my head; if I think of anything else
I'll let you know.
Thanks,
-Don Skidmore <donald.c.skidm...@intel.com>
---
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it. - Brian W. Kernighan