Sorry for replying late. Unfortunately I am not a full-time administrator, and I will be attending a conference next week, so please be patient with my replies.
On 4/7/2010 6:56 PM, Eugene Loh wrote:
> Oliver Geisler wrote:
>> Using netpipe and comparing tcp and mpi communication I get the
>> following results:
>>
>> TCP is much faster than MPI, approx. by factor 12
>>
> Faster? 12x? I don't understand the following:
>
>> e.g a packet size of 4096 bytes deliveres in
>> 97.11 usec with NPtcp and
>> 15338.98 usec with NPmpi
>>
> This implies NPtcp is 160x faster than NPmpi.

The NPmpi/NPtcp time ratio averages about 60 for small packet sizes (< 4 kB), peaks at 160 at 4 kB (so it was a bad value to pick out in the first place), then drops to about 40 for packet sizes around 16 kB, and falls below 20 for packets larger than 100 kB.

>> or
>> packet size 262kb
>> 0.05268801 sec NPtcp
>> 0.00254560 sec NPmpi
>>
> This implies NPtcp is 20x slower than NPmpi.

Sorry, my fault ... vice versa, it should read:

packet size 262kb
0.00254560 sec NPtcp
0.05268801 sec NPmpi

>> Further our benchmark started with "--mca btl tcp,self" runs with short
>> communication times, even using kernel 2.6.33.1
>>
>> Is there a way to see what type of communication is actually selected?
>>
>> Can anybody imagine why shared memory leads to these problems?
>>
> Okay, so it's a shared-memory performance problem since:
>
> 1) You get better performance when you exclude sm explicitly with "--mca
> btl tcp,self".
> 2) You get better performance when you exclude sm by distributing one
> process per node (an observation you made relatively early in this thread).
> 3) TCP is faster than MPI (which is presumably using sm).
>
> Can you run a pingpong test as a function of message length for two
> processes in a way that demonstrates the problem? For example, if
> you're comfortable with SKaMPI, just look at Pingpong_Send_Recv and
> let's see what performance looks like as a function of message length.
> Maybe this is a short-message-latency problem.
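As a quick sanity check, the slowdown factors above can be recomputed directly from the NetPIPE times quoted in this thread (nothing below is new measurement, just arithmetic on the numbers already cited):

```python
# Recompute the NetPIPE slowdown factors from the times quoted above.

# 4096-byte packet: 97.11 usec with NPtcp, 15338.98 usec with NPmpi
ratio_4k = 15338.98 / 97.11
print(f"4 kB:   NPmpi is {ratio_4k:.0f}x slower than NPtcp")   # ~158x

# 262 kB packet (corrected order): 0.00254560 s NPtcp, 0.05268801 s NPmpi
ratio_262k = 0.05268801 / 0.00254560
print(f"262 kB: NPmpi is {ratio_262k:.1f}x slower than NPtcp")  # ~20.7x
```

So the 4 kB point really is close to the 160x maximum, and the large-packet ratio settles near 20x, consistent with the curve described above.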
These are the results of the SKaMPI pt2pt test, first with shared memory allowed, second with shared memory excluded. It doesn't look to me as if the long-message times are related to short messages. Including hosts over Ethernet results in higher communication times, which are equal to those I get when I ping the host (a hundred-plus microseconds).

mpirun --mca btl self,sm,tcp -np 2 ./skampi -i ski/skampi_pt2pt.ski
# begin result "Pingpong_Send_Recv"
count=    1     4  12756.0  307.4  16  11555.3  11011.2
count=    2     8   9902.8  629.0  16   9615.4   8601.0
count=    3    12  12547.5  881.0  16  12233.1  11229.2
count=    4    16  12087.2  829.6  16  11610.6  10478.6
count=    6    24  13634.4  352.1  16  11247.8  12621.9
count=    8    32  13835.8  282.2  16  11091.7  12944.6
count=   11    44  13328.9  864.6  16  12095.6  11977.0
count=   16    64  13195.2  432.3  16  11460.4  10051.9
count=   23    92  13849.3  532.5  16  12476.9  12998.1
count=   32   128  14202.2  436.4  16  11923.8  12977.4
count=   45   180  14026.3  637.7  16  13042.5  12767.8
count=   64   256  13475.8  466.7  16  11720.4  12521.3
count=   91   364  14015.0  406.1  16  13300.4  12881.6
count=  128   512  13481.3  870.6  16  11187.7  12070.6
count=  181   724  10697.1   98.4  16  10697.1   9520.1
count=  256  1024  14120.8  602.1  16  13988.2  11349.9
count=  362  1448  15718.2  582.3  16  14468.2  12535.2
count=  512  2048  11214.9  749.1  16  11155.0   9928.5
count=  724  2896  15127.3  186.1  16  15127.3  10974.9
count= 1024  4096  34045.0  692.2  16  32963.6  31728.1
count= 1448  5792  29965.9  788.1  16  27997.8  27404.4
count= 2048  8192  30082.1  785.3  16  28023.9  29538.5
count= 2896 11584  32556.0  219.4  16  29312.2  32290.4
count= 4096 16384  24999.8  839.6  16  23422.0  23644.6
# end result "Pingpong_Send_Recv"
# duration = 10.15 sec

mpirun --mca btl tcp,self -np 2 ./skampi -i ski/skampi_pt2pt.ski
# begin result "Pingpong_Send_Recv"
count=    1     4   14.5  0.3  16  13.5  13.2
count=    2     8   13.5  0.2   8  12.9  12.4
count=    3    12   13.1  0.4  16  12.7  11.3
count=    4    16   13.9  0.4  16  12.7  13.0
count=    6    24   13.8  0.4  16  12.5  12.8
count=    8    32   13.8  0.4  16  12.7  13.0
count=   11    44   14.0  0.3  16  12.8  13.0
count=   16    64   13.5  0.5  16  12.3  12.4
count=   23    92   13.9  0.4  16  13.1  12.7
count=   32   128   14.8  0.1  16  13.1  14.5
count=   45   180   14.2  0.4   8  13.1  12.9
count=   64   256   15.1  0.2  16  13.3  14.8
count=   91   364   16.5  0.3  16  14.1  16.1
count=  128   512   12.8  0.2   8  11.5  12.5
count=  181   724   13.4  0.3  16  11.5  13.3
count=  256  1024   14.0  0.3  16  11.7  14.0
count=  362  1448   13.2  0.3  16  12.2  12.5
count=  512  2048   15.4  0.2  16  12.5  15.4
count=  724  2896   15.7  0.2  16  13.1  15.7
count= 1024  4096   17.0  0.1   8  13.5  17.0
count= 1448  5792   18.5  0.2  16  15.5  18.5
count= 2048  8192   20.4  0.2  16  17.1  20.4
count= 2896 11584   24.1  0.1  16  21.0  24.0
count= 4096 16384   32.0  0.0  16  27.2  32.0
# end result "Pingpong_Send_Recv"
# duration = 0.01 sec

mpirun --mca btl tcp,self -np 2 -host cluster-13,cluster-16 ./skampi -i ski/skampi_pt2pt.ski
# begin result "Pingpong_Send_Recv"
count=    1     4  133.1   0.4  16  133.1   84.8
count=    2     8  132.7   0.1  16  132.7   85.0
count=    3    12  133.2   0.3   8  133.2   85.2
count=    4    16  133.8   0.2   8  133.8   85.5
count=    6    24  134.0   0.0   8  134.0   85.5
count=    8    32  134.2   0.2  16  134.2   86.8
count=   11    44  134.0   0.0   8  134.0   86.2
count=   16    64  135.2   0.2   8  135.2   87.0
count=   23    92  136.3   0.1  16  136.3   88.5
count=   32   128  137.6   0.2  16  137.6   90.3
count=   45   180  139.0   0.0   8  139.0   91.2
count=   64   256  138.8   2.2   8  130.0  104.6
count=   91   364  143.9   0.1  16  143.9   96.2
count=  128   512  148.5   0.3   8  148.5  101.8
count=  181   724  157.3   0.2  16  157.3  111.0
count=  256  1024  169.8   0.2   8  169.8  123.8
count=  362  1448  163.4   0.3   8  161.0  163.4
count=  512  2048  207.2   0.2   8  207.2  163.5
count=  724  2896  235.5   1.7   8  235.5  190.0
count= 1024  4096  233.0   0.6   8  230.7  233.0
count= 1448  5792  314.2   3.3  16  314.2  264.9
count= 2048  8192  343.0   3.9   8  343.0  295.0
count= 2896 11584  540.0  11.2  16  539.9  456.8
count= 4096 16384  636.3  13.2  16  636.3  473.1
# end result "Pingpong_Send_Recv"
# duration = 0.07 sec

> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
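P.S. Comparing the mean times (third column) of the sm run against the local tcp,self run at a few message sizes shows the slowdown is roughly constant, around three orders of magnitude, across the whole range — which is why I don't think this is a short-message-latency effect. A quick computation from the tables above:

```python
# Mean pingpong times (usec) taken from the SKaMPI tables above:
# message size in bytes -> (time with sm enabled, time with tcp,self only)
sm_vs_tcp = {
    4:     (12756.0, 14.5),
    1024:  (14120.8, 14.0),
    16384: (24999.8, 32.0),
}

for size, (t_sm, t_tcp) in sm_vs_tcp.items():
    # slowdown factor of the shared-memory run relative to tcp,self
    print(f"{size:6d} B: sm is {t_sm / t_tcp:5.0f}x slower")
```

The factor stays in the high hundreds at every size, for small and large messages alike.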