Hello, Alex, 

>You might try posing the question to the netdev list 

Thanks for the hint. I'll give it a try.

> Also you may want to clarify things as the data is a bit confusing since
> it seems like you have two tests that are "iperf client -> server over 1
> teql Aggregate", with one yielding 5700Mb/s and the other ~3000Mb/s and
> I am not sure what the difference is supposed to be between those two

Sorry for the ambiguity.
It's the number of iperf processes running in parallel - using the same link - 
that makes the difference.

When I start more than one iperf client instance in parallel over the same 
teql link (i.e. the same IP pair), I get close to 100 % bandwidth utilisation, 
so there is no bottleneck in the underlying layers any more.
I tested it with 2 and with 10 iperf instances in parallel - the total 
throughput is always > 95 %.
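
For reference, such a parallel run looks roughly like this (10.0.0.1 just 
stands in for the teql address of the receiving box, and -t 60 is an arbitrary 
test length; iperf 2 also has -P n to open n parallel TCP streams from a 
single client process, which may or may not hit the same per-process limit):

    # receiving side
    iperf -s

    # sending side: several independent client processes ...
    iperf -c 10.0.0.1 -t 60 &
    iperf -c 10.0.0.1 -t 60 &

    # ... or one client process with two parallel streams
    iperf -c 10.0.0.1 -t 60 -P 2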

However, when I start only one iperf process, I get only half the throughput.
So it looks like there is a per-process or per-TCP-stream bottleneck on the 
sender side (which has Intel NICs + e1000e drivers).

I don't have this bottleneck when I start the iperf client on a venerable HP 
460c blade node (with much lower overall system performance and a Broadcom 
NIC using the tg3 driver) - neither from a blade node to the Sabertooth 
gateway (which is the same physical link), nor between two blades.

Just to clarify: the iperf(1) manual page says
"To perform an iperf test the user must establish both a server (to discard 
traffic) and a client (to generate traffic)."
So, in iperf terms, test traffic by default flows from client -> server. 
Optionally, I can also do bidirectional testing. What matters is not where the 
client/server iperf programs live, but which way the traffic flows.
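
With iperf 2.x (which is what I assume here; 10.0.0.1 is again just a 
placeholder address), the relevant invocations are roughly:

    iperf -s                  # "server": discards traffic
    iperf -c 10.0.0.1         # default test: traffic flows client -> server
    iperf -c 10.0.0.1 -r      # tradeoff: client -> server, then server -> client
    iperf -c 10.0.0.1 -d      # dual test: both directions at the same time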


In contrast, the figures below use a "box-centered" designation: 
client = HP blade 460c G1 with Broadcom NICs + tg3
server = Asus Sabertooth with Intel NICs + e1000e

> > all right, current performance tests look like this:
> >
> > iperf client -> server over 6 physical Links :
> >     6 x 990 MBit - OK
> > iperf client -> server over 1 teql Aggregate :
> >     5700 MBit > 90 % - OK
> >
> > iperf server -> client over 6 phys links
> >     6 x 990...1000 MBit - OK
> >
> > iperf client -> server over 1 teql Aggregate :
> >     ~ 3000 MBit ~~ 50 % - ##### NOT OK #####
> >
> > iperf client -> server over 10 parallel teql Aggregates :
> >     ~ 5800 MBit > 95 % - OK
> >
> > iperf client -> server over 2 parallel teql Aggregates :
> >     ~ 5900 MBit > 95 % - OK
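
(For completeness: the teql aggregate behind these numbers is the usual 
sch_teql setup, roughly as sketched below; interface names, the number of 
slave links and the address are placeholders, not my exact configuration.)

    modprobe sch_teql
    tc qdisc add dev eth1 root teql0
    tc qdisc add dev eth2 root teql0
    # ... one "tc qdisc add ... root teql0" per physical link in the bundle
    ip link set dev teql0 up
    ip addr add 10.0.0.2/24 dev teql0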





> > > Specifically the 2.5GT/s with a width of
> > > x1 can barely push 1Gb/s.  This slot needs to be at least a x4 if you
> > > want to push anything more than 1Gb/s.
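
(Side note: the negotiated PCIe speed and width of a NIC can be read from 
lspci -vv, run as root; the device address 03:00.0 below is just an example.)

    lspci -vv -s 03:00.0 | grep -E 'LnkCap|LnkSta'
    # LnkCap = what the device supports, LnkSta = what was actually negotiated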

Just out of curiosity, here is the answer from Asus support:

>> I think the pictures below show all the possible usages of the PCIE slots:
>> 16/8/8/4 can be obtained via 3-way SLI.

>> [two attached images showing PCIe slot configurations from the board manual]

The images refer to the pages in their manual that show how to place 
different video cards on the board.

It really looks like they can't imagine that people would put anything other 
than graphics cards into a PCIe slot. :-\

Odd, isn't it?


Wolfgang Rosner
