Re: [Lustre-discuss] Lustre read performance decay when OSSes are assigned in two different subnet

Hammitt, Charles Allen Thu, 15 Mar 2012 06:51:35 -0700

I’d take a look at how the switch ASICs line up with the connections in the 
system, to see whether one set is more heavily loaded than another.

If routing is not handled by the switch, I’d also check whether routing behaves 
differently for the different networks; perhaps there is a different path, or a 
similar ASIC or other network performance / congestion problem. A simple 
traceroute might help answer some questions, as would a conversation with your 
networking team to trace cables and look at interface stats.

Regards,
Charles

From: zhengfeng [mailto:[email protected]]
Sent: Thursday, March 15, 2012 9:38 AM
To: Hammitt, Charles Allen; [email protected]
Subject: Re: RE: [Lustre-discuss] Lustre read performance decay when OSSes are 
assigned in two different subnet

Thanks a lot, Charles.

I agree with you about this problem.

I ran more tests, with the following steps:
0) Put the 3 nodes in 3 different subnets.
1) Run "netperf" on each of the two OSSes and "netserver" on the client. This 
simulates the network side of the scenario in which the client reads data from 
the two OSSes, but with no disk I/O or other r/w involved.
2) The two OSS netperf results are about 200 M/s each, 400 M/s in total. So low!
3) With netperf running on only one OSS, the result is 950 M/s, which is OK.
4) The steps above show that the network is the bottleneck for read 
performance.
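
As a rough check of the numbers in the steps above, the following Python sketch 
compares the combined two-sender rate with the single-sender rate. The function 
name is made up for illustration, and the figures are the approximate ones 
quoted above, in whatever unit netperf reported.

```python
def aggregate_ratio(concurrent_rates, single_rate):
    """Combined rate of concurrent senders as a fraction of the single-sender rate."""
    return sum(concurrent_rates) / single_rate


# Approximate figures from steps 2) and 3); units as reported by netperf.
two_senders = [200, 200]
one_sender = 950

ratio = aggregate_ratio(two_senders, one_sender)
print(f"two senders together reach {ratio:.0%} of the single-sender rate")
```

If the receiving link were shared fairly and nothing else were in the way, one 
would expect the ratio to stay close to 1 rather than drop below one half.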

When 2 nodes send a TCP stream at the same time and only 1 node receives, the 
total throughput is roughly half of the normal value. Very odd.
What could cause that? Thanks a lot.

________________________________
Best regards
feng

From: Hammitt, Charles Allen<mailto:[email protected]>
Date: 2012-03-15 20:50
To: zf5984599<mailto:[email protected]>; 
[email protected]<mailto:[email protected]>
Subject: RE: [Lustre-discuss] Lustre read performance decay when OSSes are 
assigned in two different subnet

Networking overhead… VLAN routing, perhaps: either 1) an extra network device 
hop, with the latency that a router adds, or 2) an overburdened switch handling 
the routing itself, which still introduces network latency.
Latency is the storage and network I/O bandwidth killer.
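
One way to picture the effect of latency is a simple model in which the client 
has only one request in flight at a time, so every request pays a full round 
trip before the next one starts. This is a simplifying assumption for 
illustration, not a description of how Lustre actually pipelines its requests, 
and the link speed, request size and round-trip times below are hypothetical 
values.

```python
def effective_throughput(request_bytes, link_bytes_per_s, rtt_s):
    """Bytes per second when each request waits one round trip plus its transfer time."""
    transfer_s = request_bytes / link_bytes_per_s
    return request_bytes / (rtt_s + transfer_s)


LINK = 125_000_000       # hypothetical 1 Gbit/s link, in bytes per second
REQUEST = 1024 * 1024    # hypothetical 1 MiB request

for rtt_ms in (0.1, 1.0, 5.0):  # hypothetical round-trip times
    rate = effective_throughput(REQUEST, LINK, rtt_ms / 1000)
    print(f"rtt {rtt_ms} ms -> {rate / 1e6:.1f} MB/s")
```

In this model the achievable rate falls as the round-trip time grows, even 
though the link speed itself never changes.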

I’m willing to bet two things:
1) Changing your stripe size from 2 to 1 will give bandwidth similar to 
diagram 2 [54.3 MB/s], even if the layout is as in diagram 1 [separate nets].
2) If all your OSS/MDS and client nodes were in the same single VLAN 
network, you’d see better performance than diagram 2’s 54.3 MB/s.
So, drop classful subnets; go with CIDR / supernetting to get the IP space 
you need and drop the extra routing latency.
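
To illustrate the supernetting suggestion, here is a small sketch using 
Python's standard ipaddress module. The addresses are the ones from the 
diagrams in the original post; the /24 and /22 prefix lengths are assumptions, 
since the thread does not state the netmasks in use.

```python
import ipaddress

# Addresses from the original post; the /24 masks are assumed.
hosts = {
    "client": ipaddress.ip_interface("10.0.1.2/24"),
    "OSS1": ipaddress.ip_interface("10.0.2.2/24"),
    "OSS2": ipaddress.ip_interface("10.0.3.2/24"),
}

# With /24 masks each node sits in its own network, so traffic must be routed.
separate = {name: iface.network for name, iface in hosts.items()}

# Widening the prefix to /22 (assumed) puts all three nodes in one network.
supernet = hosts["client"].network.supernet(new_prefix=22)
all_inside = all(iface.ip in supernet for iface in hosts.values())

print(separate)
print(supernet, all_inside)
```

With a single wider network, the nodes could reach each other directly at 
layer 2 instead of going through a routing hop.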

Regards,
Charles

--
===========================================
Charles Hammitt
Storage Systems Specialist
ITS Research Computing @
The University of North Carolina-CH

===========================================

From: 
[email protected]<mailto:[email protected]>
 [mailto:[email protected]] On Behalf Of zhengfeng
Sent: Thursday, March 15, 2012 12:11 AM
To: [email protected]<mailto:[email protected]>
Subject: [Lustre-discuss] Lustre read performance decay when OSSes are assigned 
in two different subnet

Dear all,
We have run into a problem: Lustre read performance decays when the OSSes are 
assigned to two different subnets.
The setup is described in the following diagrams:
diagram 1, OSS in different subnets:
          Client (subnet 10.0.1.2)
             |
             |
             |
          Switch
      |            |
      |            |
      |            |
    OSS1         OSS2
 (10.0.2.2)   (10.0.3.2)
For diagram 1, we put the client, OSS1 and OSS2 in 3 different subnets. The 
switch in use is able to forward all packets.
We used the dd command to test r/w performance, writing/reading data to/from 
OSS1 and OSS2 at the same time.
Test result:
[root@client client]# time dd if=test2 of=/dev/null bs=1M count=2000
2000+0 records in
2000+0 records out
2097152000 bytes (2.1 GB) copied, 53.5922 seconds, 39.1 MB/s

real 0m53.796s
user 0m0.005s
sys 0m2.914s

diagram 2, OSS in same subnet:
          Client (subnet 10.0.1.2)
             |
             |
             |
          Switch
      |            |
      |            |
      |            |
    OSS1         OSS2
(10.0.2.2, 10.0.2.3, in the same subnet)

For diagram 2, we assigned OSS1 and OSS2 to the same subnet, then tested.
Test result:
[root@client219 client]# time dd of=/dev/null if=test1 bs=1M
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 193.07 seconds, 54.3 MB/s

Conclusion:
With the OSSes in different subnets, read performance is 39.1 MB/s, while with 
the OSSes in the same subnet it is 54.3 MB/s. The performance drops considerably.
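
The two rates and the size of the drop can be recomputed from the byte counts 
and elapsed times that dd printed above. This is only a sketch with a made-up 
helper name; note that the two runs read different files of different sizes, 
so the comparison is indicative rather than exact.

```python
def mb_per_s(total_bytes, seconds):
    """Throughput in MB/s, with 1 MB = 10**6 bytes as in the dd output."""
    return total_bytes / seconds / 1e6


# Byte counts and elapsed times copied from the two dd runs above.
different_subnets = mb_per_s(2097152000, 53.5922)
same_subnet = mb_per_s(10485760000, 193.07)

# Relative drop when the OSSes are in different subnets.
drop = 1 - different_subnets / same_subnet

print(f"{different_subnets:.1f} MB/s vs {same_subnet:.1f} MB/s, {drop:.0%} lower")
```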

Question:
Why does performance decay when different subnets are used with Lustre?
Has anyone met such a problem? Many thanks for your answers and advice.

________________________________
B.R.
Feng

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
