Hello, Alex, 

> >You might try posing the question to the netdev list 
> Thanks for the hint. I'll give it a try.

I got a response from a man named Eric Dumazet;
it starts out promising...


On Saturday, 21 March 2015 05:43:56 you wrote:
> On 03/20/2015 09:14 AM, Wolfgang Rosner wrote:

> I get all that but I suspect there is likely something that is still
> missing.  

The difference is the -P2 option in the second case, which opens two client 
threads in parallel. 
As you can see in the output lines, there are then two connections, each at 
about 2.9 Gbit/sec, adding up to 5.8 Gbit/sec.

OK, let me repeat the whole console copy:

======== console log =============

root@cruncher:/run/shm# iperf -c  192.168.130.227
------------------------------------------------------------
Client connecting to 192.168.130.227, TCP port 5001
TCP window size:  488 KByte (default)
------------------------------------------------------------
[  3] local 192.168.130.250 port 55078 connected with 192.168.130.227 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  3.93 GBytes  3.38 Gbits/sec


root@cruncher:/run/shm# iperf -c  192.168.130.227 -P2
------------------------------------------------------------
Client connecting to 192.168.130.227, TCP port 5001
TCP window size:  488 KByte (default)
------------------------------------------------------------
[  4] local 192.168.130.250 port 55083 connected with 192.168.130.227 port 5001
[  3] local 192.168.130.250 port 55082 connected with 192.168.130.227 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  3.41 GBytes  2.93 Gbits/sec
[  4]  0.0-10.0 sec  3.37 GBytes  2.90 Gbits/sec
[SUM]  0.0-10.0 sec  6.78 GBytes  5.82 Gbits/sec

======== console off =============


 Manual page iperf(1)
....
       -P, --parallel n
              number of parallel client threads to run

===============================



btw: I just managed to capture the top output:

for the first case (one thread)
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
24561 root      20   0 96700 1492 1340 S  22.2  0.0   0:00.80 iperf

for the second case (two threads)
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
25086 root      20   0  166m 1516 1368 S  41.2  0.0   0:01.31 iperf

So it's not a CPU bottleneck.




> I wasn't able to see the images, I'm not sure if you actually had them
> attached.  

I hadn't attached them; I didn't think they had much value out of context.
I just wanted to close the open questions as far as my own needs go.

Do you think the Linux community will gain from digging deeper?
Squeezing the last drop of blood out of this board?
OK, so let's go on, and at least share some findings I haven't reported yet.

I think it is better to grasp the full context here in the manual:
http://dlcdnet.asus.com/pub/ASUS/mb/SocketAM3+/SABERTOOTH_990FX_R2.0/E8042_SABERTOOTH_990FX_R2.pdf
E8042_SABERTOOTH_990FX_R2-1.pdf

The images I mentioned refer to
Page 36 (pdf numbering) bottom
Page 37 top

-----

I can't track down the IRQ sharing issue from the table at the bottom of page 37.
IRQ sharing was on my list of suspects.

But when I look into /proc/interrupts of my running box, I not only find a 
different IRQ for every PCIe card, but even one for each of the 4 GBit ports 
on each card.

But maybe I am missing something?
Things have got more complicated since I soldered my first 6502 CPU 35 years ago...
I have some dim memory of hard and soft interrupts...
If it takes some µsec to look up a soft IRQ pointer among those that share 
the same hard IRQ, might this hurt performance?
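
Whether any IRQ line is shared can also be checked by script instead of by 
eye. A minimal sketch, assuming the usual /proc/interrupts layout (one header 
row naming the CPUs, then one row per interrupt, with devices that share a 
line separated by commas); the helper name shared_irqs is made up here:

```python
def shared_irqs(proc_interrupts):
    """Return {irq: [devices]} for numeric IRQs claimed by more than one device."""
    lines = proc_interrupts.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue  # NMI, LOC, ... are per-CPU counters, not device IRQs
        fields = rest.split(None, ncpu)  # ncpu counters, then the remainder
        if len(fields) <= ncpu:
            continue  # no controller/device column on this row
        devices = [name.strip() for name in fields[ncpu].split(",")]
        if len(devices) > 1:
            # The first entry still carries the interrupt controller name;
            # keep only its last word, which is the device name.
            devices[0] = devices[0].split()[-1]
            shared[label] = devices
    return shared
```

On a live system it could be fed with open("/proc/interrupts").read(). An 
empty result would agree with the observation that every port has its own IRQ.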

> However I suspect one of two things in the board layout. 
> Either the slot is what we call a PCIe graphics (PEG) slot, 
> in which 
> case it only supports x16 or x1 and nothing in between, 

Hm, that may well be.
We are talking about slot PCIe2.0x16_4.
According to the tables in the manual mentioned above, it delivers either x8 or x1.
In some ads, they refer to an x16-x8-x6-x4 assignment.

But according to "Leo from Taiwan", 
"16/8/8/4 can be obtained via 3-way SLI."

.... which I think is an NVIDIA proprietary configuration and "not common" for 
NICs

> or you connected
> to a slot hanging off of the southbridge for the board.
>

Not exactly, but similar, as far as my half-educated guess goes.
Looks like Asus has invented the "east bridge"....

The thing is called
         [IDT] PES12N3A PCI Express Switch 

This was the bottleneck, which displayed x1 as I remember, 
while the NIC itself displayed x4.
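
To put x1 versus x4 into numbers: a rough sketch of the per-direction data 
rate of a PCIe link. The per-lane figures are the standard ones for PCIe 1.x 
and 2.0 after 8b/10b encoding. Which generation the link actually negotiated 
is an assumption, since it is not stated here, and packet overhead is ignored:

```python
# Usable data rate per lane and direction in Gbit/s, after 8b/10b encoding:
# PCIe 1.x: 2.5 GT/s * 8/10 = 2.0, PCIe 2.0: 5.0 GT/s * 8/10 = 4.0
DATA_GBITS_PER_LANE = {1: 2.0, 2: 4.0}

# Four gigabit ports running at line rate in one direction.
QUAD_GBE_GBITS = 4 * 1.0

def link_gbits(gen, lanes):
    """Upper bound for one direction of a link; TLP/DLLP overhead is ignored."""
    return DATA_GBITS_PER_LANE[gen] * lanes

# Assumed cases for illustration, not measured values:
for gen, lanes in [(1, 1), (2, 1), (1, 4)]:
    rate = link_gbits(gen, lanes)
    verdict = "enough" if rate >= QUAD_GBE_GBITS else "too little"
    print(f"gen{gen} x{lanes}: {rate} Gbit/s, {verdict} for 4 x GbE")
```

Under these assumptions a gen1 x1 link tops out at half of what four gigabit 
ports can deliver, while the same generation at x4 leaves headroom.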

But I think it is not identical to the "South bridge" - see below.

As I said, now I have the nvidia Quadro 2000 plugged there.

snippet from lspci -tv
+-0c.0-[0a]--+-00.0  NVIDIA Corporation GF106GL [Quadro 2000]
|            \-00.1  NVIDIA Corporation GF106 High Definition Audio Controller

Does this mean that all traffic to device 0a has to pass through device 0c?
I haven't found an explanation of this PCIe bus structure that I could grasp.
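
One way to look at the same question from the kernel's side: in sysfs, the 
real path of a PCI device lists every bridge above it as a parent directory. 
In the lspci -tv notation, "0c.0-[0a]" reads as device 0c.0 on bus 00 being a 
bridge whose secondary bus is 0a. A minimal sketch that extracts the chain 
from such a path; the example path is an assumption built from the snippet 
above, not copied from the running box:

```python
import re

# A PCI address as it appears in sysfs: domain:bus:device.function
PCI_ADDR = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")

def upstream_chain(sysfs_path):
    """Return the PCI addresses along a sysfs device path, root side first."""
    return [part for part in sysfs_path.split("/") if PCI_ADDR.fullmatch(part)]

# Assumed example, following the lspci -tv snippet:
example = "/sys/devices/pci0000:00/0000:00:0c.0/0000:0a:00.0"
print(upstream_chain(example))
```

On a live system the path would come from 
os.path.realpath("/sys/bus/pci/devices/0000:0a:00.0"). Every address before 
the last one is a bridge that the traffic to the device has to pass.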


0a:00.0 VGA compatible controller: NVIDIA Corporation GF106GL [Quadro 2000] (rev a1)
0a:00.1 Audio device: NVIDIA Corporation GF106 High Definition Audio Controller (rev a1)
0b:00.0 PCI bridge: Integrated Device Technology, Inc. [IDT] PES12N3A PCI Express Switch (rev 0e)
0c:02.0 PCI bridge: Integrated Device Technology, Inc. [IDT] PES12N3A PCI Express Switch (rev 0e)
0c:04.0 PCI bridge: Integrated Device Technology, Inc. [IDT] PES12N3A PCI Express Switch (rev 0e)

here I read
http://en.wikipedia.org/wiki/AMD_900_chipset_series
        "Southbridge: SB950"

so I consider this the south bridge. 
It ranges from device 00:11 to 00:16.

lspci -v | grep SB9
00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 SATA Controller
.....
00:16.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 


I see, the snippets are getting confusing here again.
I'll attach the full output of some detailed system information:

        lspci -tvvv > lspci_tv
        lspci -vv | grep -P "[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]|LnkSta:" > lspci_LnkSta
        lspci > lspci_short
        lspci -vvvv > lspci_vvvv
        hwinfo > hwinfo.out.2015-03-19
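
The LnkSta filtering above can be taken one step further by comparing each 
device's negotiated width (LnkSta) with its capability (LnkCap). A minimal 
sketch, assuming the usual lspci -vv text layout with "Width xN" on both 
lines; the function name downgraded_links is made up for this illustration:

```python
import re

# Device header lines start in column 0 with bus:device.function
DEVICE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])\s")
# LnkCap2/LnkSta2 lines do not match because of the digit before the colon.
LINK = re.compile(r"(LnkCap|LnkSta):.*?Width x(\d+)")

def downgraded_links(lspci_vv):
    """Return {device: (cap_width, sta_width)} where the link runs narrower."""
    widths = {}
    current = None
    for line in lspci_vv.splitlines():
        dev = DEVICE.match(line)
        if dev:
            current = dev.group(1)
            widths[current] = {}
            continue
        link = LINK.search(line)
        if link and current is not None:
            widths[current][link.group(1)] = int(link.group(2))
    return {
        dev: (w["LnkCap"], w["LnkSta"])
        for dev, w in widths.items()
        if "LnkCap" in w and "LnkSta" in w and w["LnkSta"] < w["LnkCap"]
    }
```

Fed with the contents of the lspci_vvvv file, a NIC that can do x4 but sits 
behind an x1 link would show up as (4, 1).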

For reference and for matching against the manual, here is the list of devices 
as I had them plugged in when those files were generated:

 pciex16_1
        01:00.0, 01:00.1
        NVIDIA Corporation GF100GL [Tesla M2070]

 pciex1_1
        empty

 pciex16_2
        0d:00.0, 0d:00.1, 0e:00.0, 0e:00.1
        HP NC364T PCI Express Quad Port Gigabit Server Adapter

 pciex16_3
        08:00.0, 08:00.1, 09:00.0, 09:00.1
        Intel Corporation PRO/1000 PT Quad Port Server Adapter

PCI
        0f:05.0  RTL-8100/8101L/8139 PCI Fast Ethernet Adapter

 pciex16_4
        0a:00.0, 0a:00.1
        NVIDIA Corporation GF106GL [Quadro 2000]

The other devices are embedded on the main board.

I'll also put the list into CC again, so Google and other curious people might 
find the info and work on it further, if they like.


> You might have a better idea from just searching for 990FX info since
> that is the chipset for your board it doesn't seem like there is any
> good documentation for the board layout itself.

You mean like this:
AMD 990FX/990X/970  Databook
http://support.amd.com/TechDocs/48691.pdf#search=990fx
        Pg 17 has a block diagram
        Pg 22 has a table of PCIe ports

but I cannot match either of them to the Sabertooth manual labelling 
or to the lspci numbering scheme.

Googling [IDT] PES12N3A PCI Express Switch
I find a datasheet
http://datasheet.octopart.com/89HPES12N3AZCBC-Integrated-Device-Technology-datasheet-11790063.pdf

12 PCI Express Lanes
One x4 Upstream Port and Two x4 Downstream Ports
Pg 3 Fig 3.... Hm looks like some kind of "mid bridge" to me....

Hm... but just piling up the parts does not yet yield a whole, I'm afraid.
I could imagine that this switch is supposed to provide some interconnection 
between multiple graphics cards, as SLI goes, without blocking the main board.

But reverse engineering this is definitely way beyond my capabilities.
So when I run out of PCIe lanes again on one of the so-called "high end" 
consumer grade main boards, I'll have to go for some good old used server 
hardware. Or split the work across multiple boxes.

But of course I can still provide some information and testing for this 
process, as long as I don't have to tear my whole system apart.

so long.
>
> - Alex


Wolfgang 

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit 
http://communities.intel.com/community/wired