Re: [OMPI devel] [EXTERNAL] Re: Latency perf: v1.6 vs. v1.7 vs. trunk

Barrett, Brian W Thu, 25 Oct 2012 13:00:49 -0400

Your first e-mail got eaten by our virus scanner (it doesn't like .bz2
files).  We could probably register the libnbc progress function only on
first use, but that would slightly slow down all non-blocking
collectives.  Probably worth it, but I'm not sure I'll have time to add
that code today.
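
A minimal sketch of the first-use idea, not the actual Open MPI or libnbc
code.  Every name here (progress_register, progress, nbc_start,
nbc_progress) is a hypothetical stand-in for illustration, and the sketch
ignores thread safety.  The extra cost on non-blocking collectives is
presumably the "already registered?" check that each call has to make:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a progress engine; NOT Open MPI's real API. */
typedef int (*progress_cb_t)(void);

#define MAX_CALLBACKS 8
static progress_cb_t callbacks[MAX_CALLBACKS];
static size_t num_callbacks = 0;

/* Add a callback to the progress loop; returns 0 on success, -1 if full. */
static int progress_register(progress_cb_t cb)
{
    if (num_callbacks == MAX_CALLBACKS) {
        return -1;
    }
    callbacks[num_callbacks++] = cb;
    return 0;
}

/* One pass of the progress loop: every registered callback is invoked,
 * so each extra callback adds latency to every pass. */
static int progress(void)
{
    int events = 0;
    for (size_t i = 0; i < num_callbacks; ++i) {
        events += callbacks[i]();
    }
    return events;
}

/* Hypothetical NBC state: pending operations and a "registered" flag. */
static int nbc_pending = 0;
static bool nbc_registered = false;

/* Progress callback: pretend one pending operation completes per pass. */
static int nbc_progress(void)
{
    assert(nbc_pending >= 0);
    if (nbc_pending == 0) {
        return 0;
    }
    --nbc_pending;
    return 1;
}

/* Entry point that every non-blocking collective would go through.
 * The flag check is the small per-call cost of registering lazily. */
static int nbc_start(void)
{
    if (!nbc_registered) {
        if (progress_register(nbc_progress) != 0) {
            return -1;
        }
        nbc_registered = true;
    }
    ++nbc_pending;
    return 0;
}
```

With this shape, a program that never calls nbc_start() never pays for
nbc_progress() in the progress loop; one that does pays a single branch
per collective after the first call.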


Brian

On 10/25/12 10:55 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote:

>Something that might not be clear from my initial writeup:
>
>1. I had to go change C code to disable libnbc.  Since non-blocking
>collectives are part of MPI-3:
>   a) we have no convenient configure argument to not build the libnbc
>coll component (there is a way, but it's laborious), and
>   b) even if we did, OMPI's coll selection will fail at run time because
>it didn't find modules for the non-blocking collective operations.
>
>2. Hence:
>   a) performance is bad, at least partially because of libnbc
>   b) there are also some other performance oddities in there
>   c) but there are some good performance improvements, too, that would be
>good to bring to v1.7 (and v1.6, if possible)
>
>
>On Oct 25, 2012, at 12:32 PM, Jeff Squyres wrote:
>
>> Attached are the following graphs:
>> 
>> 1. sm NetPipe latencies up to size 150 bytes (run on a Sandy Bridge, 2
>>procs same core)
>> 2. openib NetPipe latencies up to size 150 bytes (run on 2 old Xeons
>>[pre-Nehalem] with old Mellanox ConnectX IB HCAs)
>> 3. Same as #1, but all the way up to 8MB
>> 4. Same as #2, but all the way up to 8MB
>> 
>> I also attached a tarball of all my raw NetPipe numbers (since the
>>graphs are log-log).
>> 
>> There's definite weirdness here.  Here are some observations:
>> 
>> a) Trunk openib latency is noticeably better in the mid-range as
>>compared to v1.6 and v1.7.  This is good!  Is this change something that
>>can be brought to v1.6 / v1.7?
>> 
>> b) The addition of the libnbc progress function to the progress loop
>>has a non-zero impact on latency.  It's most noticeable in graphs #1 and
>>#2.  Can something be done to add the libnbc progress function to
>>the loop only when NBC operations are ongoing?  Right now, the libnbc
>>progress function is *always* added to the progress loop, even if you
>>never use any NBCs.
>> 
>> c) There's a noticeable increase in small message latency for the
>>openib BTL in v1.7 as compared to the trunk and v1.6 branches.  I don't
>>know if this is an openib thing, or the result of something else.
>> 
>> d) The trunk (without libnbc) has the best small message sm latency,
>>period -- even better than v1.6.  Yay!  Is this decrease in latency
>>(compared to v1.6) something that can be brought to v1.7?
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>>http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>><netpipe-sm-latencies-to-128.pdf>
>><netpipe-openib-latencies-to-128.pdf>
>><netpipe-sm-latencies.pdf>
>><netpipe-openib-latencies.pdf>
>><netpipe-latency-numbers.tar.bz2>
>
>
>-- 
>Jeff Squyres
>jsquy...@cisco.com
>For corporate legal information go to:
>http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>_______________________________________________
>devel mailing list
>de...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
