Re: [OMPI devel] memcpy MCA framework

Terry Dontje Sat, 16 Aug 2008 11:51:07 -0400

George Bosilca wrote:

The intent of the memcpy framework is to allow a selection between several memcpy at runtime. Of course, there will be a preselection at compile time, but all versions that can compile on a given architecture will be benchmarked at runtime and the best one will be selected. There is a file with several versions of memcpy for x86 (32 and 64) somewhere around (I should have one if interested), that can be used as a starting point.
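
[Editorial sketch, not part of the original message.] The runtime selection George describes could look roughly like the C below: time every candidate that was compiled in, then keep the fastest one that also copies correctly. All names here (`copy_fn`, `candidates`, `time_copy`, `select_copy`) are hypothetical and are not taken from the Open MPI sources; the byte-loop candidate merely stands in for a hand-tuned assembly or MMX variant, and `clock()` is a coarse timer chosen only for portability.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical candidate type: same signature as memcpy. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Candidate 1: the C library's memcpy. */
static void *copy_libc(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Candidate 2: a plain byte loop, standing in for a tuned variant. */
static void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

struct candidate { const char *name; copy_fn fn; };

/* Only the versions that compiled on this architecture would be listed. */
static const struct candidate candidates[] = {
    { "libc",  copy_libc  },
    { "bytes", copy_bytes },
};

/* Keeps the compiler from discarding the timed copies. */
static volatile unsigned char sink;

/* Time `reps` copies of `n` bytes; returns elapsed CPU seconds. */
static double time_copy(copy_fn fn, void *dst, const void *src,
                        size_t n, int reps)
{
    assert(n > 0 && reps > 0);
    clock_t start = clock();
    for (int i = 0; i < reps; i++)
        fn(dst, src, n);
    sink = ((unsigned char *)dst)[0];
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Benchmark every candidate; return the index of the fastest correct
 * one, or -1 if nothing could be measured. */
static int select_copy(size_t n, int reps)
{
    if (n == 0 || reps <= 0)
        return -1;

    unsigned char *src = malloc(n);
    unsigned char *dst = malloc(n);
    int best = -1;
    double best_time = 0.0;

    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1;
    }
    for (size_t i = 0; i < n; i++)
        src[i] = (unsigned char)(i * 31u);

    for (size_t c = 0; c < sizeof candidates / sizeof candidates[0]; c++) {
        memset(dst, 0, n);
        double t = time_copy(candidates[c].fn, dst, src, n, reps);
        if (memcmp(dst, src, n) != 0)
            continue;                      /* reject a broken candidate */
        if (best < 0 || t < best_time) {
            best = (int)c;
            best_time = t;
        }
    }
    free(src);
    free(dst);
    return best;
}
```

A caller would run something like `select_copy(64 * 1024, 1000)` once at startup and store `candidates[idx].fn` in a function pointer. A single message size is measured here for brevity; a real selection would likely need to benchmark several sizes, since the fastest routine can differ between small and large copies.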

Ok, I guess I need to look at this code. I wonder if there may be cases for Sun's machines in which this benchmark could end up picking the wrong memcpy?

The only thing we need is a volunteer to build the m4 magic. Figuring out what we can compile is kind of tricky, as some of the functions are in assembly, some others in C, and some others a mixture (the MMX headers).

Isn't the atomic code very similar? If I get to this point before anyone else I probably will volunteer.


--td

  george.

On Aug 16, 2008, at 3:19 PM, Terry Dontje wrote:
Hi Tim,
Thanks for bringing the below up and asking for a redirection to the devel list. I think looking at/using the MCA memcpy framework would be a good thing to do and maybe we can work on this together once I get out from under some commitments. However, some of the challenges that originally scared me away from looking at the memcpy MCA are whether we really want all the OMPI memcpy's to be replaced or just specific ones. Also, I was concerned about trying to figure out which version of memcpy I should be using. I believe currently things are done such that you get one version based on which system you compile on. For Sun there may be several different SPARC platforms that would need to use different memcpy code but we would like to just ship one set of bits.
Not saying the above is not doable under the memcpy MCA framework, just that it somewhat scared me away from thinking about it at first glance.
--td
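
[Editorial sketch, not part of the original message.] One way to replace only specific memcpy calls, rather than all of them, is a small per-call-site policy: copies at or above a size threshold go through a routine chosen at runtime, and everything else stays on the C library's memcpy. The C below is a minimal illustration under that assumption. `copy_policy`, `policy_copy`, and `counting_large_copy` are hypothetical names, not Open MPI APIs, and the counting routine only stands in for a platform-specific copy so that the dispatch is observable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical policy attached to one call site (for example, a shared
 * memory transport). Other call sites keep using memcpy directly. */
struct copy_policy {
    size_t  large_threshold;  /* sizes >= this use large_copy */
    copy_fn large_copy;       /* NULL means "always use memcpy" */
};

/* Dispatch on size: large copies use the selected routine, the rest
 * fall back to the C library. */
static void *policy_copy(const struct copy_policy *p,
                         void *dst, const void *src, size_t n)
{
    assert((dst != NULL && src != NULL) || n == 0);
    if (p != NULL && p->large_copy != NULL && n >= p->large_threshold)
        return p->large_copy(dst, src, n);
    return memcpy(dst, src, n);
}

/* Stand-in for a platform-specific routine; it counts its calls. */
static unsigned long large_calls;

static void *counting_large_copy(void *dst, const void *src, size_t n)
{
    large_calls++;
    return memcpy(dst, src, n);
}
```

With a policy of `{ 32 * 1024, counting_large_copy }`, a 1KB copy stays on memcpy and a 64KB copy goes through the large-copy routine. Because `large_copy` is an ordinary function pointer, a single binary could fill it in at startup according to the platform it finds itself running on, which is one possible answer to the "one set of bits" concern.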
Date: Fri, 15 Aug 2008 12:08:18 -0400
From: "Tim Mattox" <[email protected]>
Subject: Re: [OMPI users] SM btl slows down bandwidth?
To: "Open MPI Users" <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1

Hi Terry (and others),
I have previously explored this some on Linux/X86-64 and concluded that Open MPI needs to supply its own memcpy routine to get good sm performance, since the memcpy supplied by glibc is not even close to optimal. We have an unused MCA framework already set up to supply an opal_memcpy. AFAIK, George and Brian did the original work to set up that framework. It has been on my to-do list for a while to start implementing opal_memcpy components for the architectures I have access to, and to modify OMPI to actually use opal_memcpy where it makes sense. Terry, I presume what you suggest could be dealt with similarly when we are running/building on SPARC. Any followup discussion on this should probably happen on the developer mailing list.

On Thu, Aug 14, 2008 at 12:19 PM, Terry Dontje <[email protected]> wrote:
> Interestingly enough on the SPARC platform the Solaris memcpy's actually use
> non-temporal stores for copies >= 64KB.  By default some of the mca
> parameters to the sm BTL stop at 32KB. I've experimented with
> bumping the sm segment sizes to above 64K and seen incredible speedup on our
> M9000 platforms. I am looking for some nice way to integrate a memcpy that
> lowers this boundary to 32KB or lower into Open MPI.
> I have not looked into whether Solaris x86/x64 memcpy's use the non-temporal
> stores or not.
>
> --td
>>
>> Message: 1
>> Date: Thu, 14 Aug 2008 09:28:59 -0400
>> From: Jeff Squyres <[email protected]>
>> Subject: Re: [OMPI users] SM btl slows down bandwidth?
>> To: [email protected], Open MPI Users <[email protected]>
>> Message-ID: <[email protected]>
>> Content-Type: text/plain; charset=US-ASCII; format=flowed;delsp=yes
>>
>> At this time, we are not using non-temporal stores for shared memory
>>  operations.
>>
>>
>> On Aug 13, 2008, at 11:46 AM, Ron Brightwell wrote:
>>
>>
>>>>
>>>> >> [...]
>>>> >>
>>>> >> MPICH2 manages to get about 5GB/s in shared memory performance on the
>>>> >> Xeon 5420 system.
>>>>
>>>
>>> >
>>> > Does the sm btl use a memcpy with non-temporal stores like MPICH2?
>>> > This can be a big win for bandwidth benchmarks that don't actually
>>> > touch their receive buffers at all...
>>> >
>>> > -Ron
>>> >
>>> >
>>> > _______________________________________________
>>> > users mailing list
>>> > [email protected]
>>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>
>>
>> -- Jeff Squyres Cisco Systems
>
-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
[email protected] || [email protected]
I'm a bright... http://www.the-brights.net/
_______________________________________________
devel mailing list
[email protected]
http://www.open-mpi.org/mailman/listinfo.cgi/devel