Hi Tim,
Thanks for bringing this up and redirecting it to the devel list. I
think looking at (and using) the MCA memcpy framework would be a good
thing to do, and maybe we can work on this together once I get out
from under some commitments. However, one of the challenges that
originally scared me away from the memcpy MCA is deciding whether we
really want all of OMPI's memcpy calls replaced, or just specific
ones. I was also concerned about figuring out which version of memcpy
to use. I believe things are currently set up so that you get one
version based on the system you compile on. For Sun there may be
several different SPARC platforms that each need different memcpy
code, but we would like to ship just one set of bits.
I'm not saying the above isn't doable under the memcpy MCA framework,
just that it scared me away at first glance.
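To make the question concrete, below is a rough sketch of the kind of
runtime selection I am picturing: one shipped binary that probes the
hardware once at startup and installs the best copy routine through a
function pointer. All the names here (opal_memcpy_select, the probe,
the tuned routine) are made up for discussion and are not the real
opal_memcpy framework API.

#include <stddef.h>
#include <string.h>

typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Portable fallback. */
static void *memcpy_generic(void *d, const void *s, size_t n)
{
    return memcpy(d, s, n);
}

/* Stand-in for a platform-tuned routine (block stores on newer SPARC,
 * non-temporal stores on x86, ...). Stubbed to the fallback here just
 * so the sketch compiles. */
static void *memcpy_tuned(void *d, const void *s, size_t n)
{
    return memcpy(d, s, n);
}

/* Hypothetical hardware probe; a real one would look at cpuid on x86
 * or the hardware capability bits on SPARC. */
static int cpu_has_fast_copy(void)
{
    return 0;
}

static memcpy_fn opal_memcpy_ptr = memcpy_generic;

/* Run once at startup: one set of shipped bits, with the copy routine
 * chosen per machine instead of per build. */
void opal_memcpy_select(void)
{
    if (cpu_has_fast_copy()) {
        opal_memcpy_ptr = memcpy_tuned;
    }
}

void *opal_memcpy(void *dst, const void *src, size_t n)
{
    return opal_memcpy_ptr(dst, src, n);
}

With this shape, supporting several SPARC variants in one set of bits
reduces to writing the probe and the tuned routines; the rest of the
code base just calls opal_memcpy().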
--td
Date: Fri, 15 Aug 2008 12:08:18 -0400
From: "Tim Mattox" <timat...@open-mpi.org>
Subject: Re: [OMPI users] SM btl slows down bandwidth?
To: "Open MPI Users" <us...@open-mpi.org>
Message-ID: <ea86ce220808150908t62818a21k32c49b9b6f07...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi Terry (and others),
I have previously explored this some on Linux/x86-64 and concluded
that Open MPI needs to supply its own memcpy routine to get good sm
performance, since the memcpy supplied by glibc is not even close to
optimal. We have an unused MCA framework already set up to supply an
opal_memcpy. AFAIK, George and Brian did the original work to set up
that framework. It has been on my to-do list for a while to start
implementing opal_memcpy components for the architectures I have
access to, and to modify OMPI to actually use opal_memcpy where it
makes sense. Terry, I presume what you suggest could be dealt with
similarly when we are running/building on SPARC. Any followup
discussion on this should probably happen on the developer mailing
list.

On Thu, Aug 14, 2008 at 12:19 PM, Terry Dontje <terry.don...@sun.com> wrote:
> Interestingly enough, on the SPARC platform the Solaris memcpys
> actually use non-temporal stores for copies >= 64KB. By default some
> of the MCA parameters to the sm BTL stop at 32KB. I have experimented
> with bumping the sm segment sizes above 64K and seen incredible
> speedups on our M9000 platforms. I am looking for a clean way to
> integrate into Open MPI a memcpy that lowers this boundary to 32KB or
> below.
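(A side note for anyone who wants to try the same experiment: the sm
limits are ordinary MCA parameters, so something along the lines below
should work. The exact parameter names vary between releases, so check
ompi_info first; btl_sm_eager_limit and the osu_bw benchmark are just
examples.)

# List the sm BTL's tunables and their current values:
ompi_info --param btl sm

# Raise the limit for a single run, e.g. under a bandwidth benchmark:
mpirun --mca btl_sm_eager_limit 65536 -np 2 ./osu_bw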
> I have not looked into whether the Solaris x86/x64 memcpys use
> non-temporal stores or not.
>
> --td
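To show concretely what the non-temporal trick buys, below is a
minimal illustrative copy routine for x86/SSE2 (a SPARC version would
use the block-store ASIs instead of intrinsics). The threshold, the
names, and the fallbacks are all made up for discussion; this is a
sketch, not proposed OMPI code.

#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <emmintrin.h>  /* SSE2: _mm_stream_si128 */

static size_t nt_threshold = 32 * 1024;  /* the boundary in question */

void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    /* Small or misaligned copies: fall back to the libc memcpy. */
    if (n < nt_threshold || (((uintptr_t)dst | (uintptr_t)src) & 0xf)) {
        return memcpy(dst, src, n);
    }

    const __m128i *s = (const __m128i *)src;
    __m128i *d = (__m128i *)dst;
    size_t chunks = n / 16;

    for (size_t i = 0; i < chunks; i++) {
        /* Non-temporal store: bypasses the cache, so a large copy
         * does not evict the receiver's working set. */
        _mm_stream_si128(d + i, _mm_load_si128(s + i));
    }
    _mm_sfence();  /* make the streamed stores visible to other cores */

    /* Copy any tail bytes the 16-byte loop missed. */
    if (chunks * 16 < n) {
        memcpy((char *)dst + chunks * 16,
               (const char *)src + chunks * 16, n - chunks * 16);
    }
    return dst;
}

The threshold check is exactly the tuning question above: Solaris
switches to non-temporal stores at 64KB, while for our message sizes
we would want the switch at 32KB or lower.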
>>
>> Date: Thu, 14 Aug 2008 09:28:59 -0400
>> From: Jeff Squyres <jsquy...@cisco.com>
>> Subject: Re: [OMPI users] SM btl slows down bandwidth?
>> To: rbbr...@sandia.gov, Open MPI Users <us...@open-mpi.org>
>> Message-ID: <562557eb-857c-4ca8-97ad-f294c7fed...@cisco.com>
>> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
>>
>> At this time, we are not using non-temporal stores for shared memory
>> operations.
>>
>>
>> On Aug 13, 2008, at 11:46 AM, Ron Brightwell wrote:
>>
>>
>>>> [...]
>>>>
>>>> MPICH2 manages to get about 5GB/s in shared memory performance on the
>>>> Xeon 5420 system.
>>>
>>> Does the sm btl use a memcpy with non-temporal stores like MPICH2?
>>> This can be a big win for bandwidth benchmarks that don't actually
>>> touch their receive buffers at all...
>>>
>>> -Ron
>>>
>>
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>
--
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
tmat...@gmail.com || timat...@open-mpi.org
I'm a bright... http://www.the-brights.net/