Re: [OMPI devel] memcpy MCA framework

Jeff Squyres Sun, 17 Aug 2008 07:57:52 -0400

Let's talk about this in Dublin. I can probably help with the m4 magic, but I need to understand exactly what needs to be done first.


On Aug 16, 2008, at 11:51 AM, Terry Dontje wrote:

George Bosilca wrote:
The intent of the memcpy framework is to allow a selection between several memcpy implementations at runtime. Of course, there will be a preselection at compile time, but all versions that can compile on a given architecture will be benchmarked at runtime and the best one will be selected. There is a file with several versions of memcpy for x86 (32 and 64) somewhere around (I should have one if interested), that can be used as a starting point.
OK, I guess I need to look at this code. I wonder if there may be cases for Sun's machines in which this benchmark could end up picking the wrong memcpy?
The only thing we need is a volunteer to build the m4 magic. Figuring out what we can compile is kind of tricky, as some of the functions are in assembly, some others in C, and some others a mixture (the MMX headers).
Isn't the atomic code very similar? If I get to this point before anyone else I probably will volunteer.
--td
 george.
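
The runtime selection George describes (benchmark every version that compiled, keep the fastest) could be sketched roughly as below. This is only an illustration under stated assumptions, not the actual framework code: the names copy_fn_t, copy_bytewise, copy_libc, candidates and select_fastest are all hypothetical, and the two candidates stand in for whatever architecture-specific versions the m4 checks would let through.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Signature shared by every candidate copy routine (hypothetical name). */
typedef void *(*copy_fn_t)(void *dst, const void *src, size_t n);

/* Hypothetical candidate 1: a plain byte-by-byte loop. */
static void *copy_bytewise(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--) {
        *d++ = *s++;
    }
    return dst;
}

/* Hypothetical candidate 2: a thin wrapper around the libc memcpy. */
static void *copy_libc(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

struct candidate {
    const char *name;
    copy_fn_t fn;
};

/* In a real build, this table would hold only the versions that compiled. */
static const struct candidate candidates[] = {
    { "bytewise", copy_bytewise },
    { "libc", copy_libc },
};
#define NUM_CANDIDATES (sizeof(candidates) / sizeof(candidates[0]))

/* CPU time, in seconds, for `reps` copies of `n` bytes. */
static double time_copy(copy_fn_t fn, void *dst, const void *src,
                        size_t n, int reps)
{
    clock_t start = clock();
    for (int i = 0; i < reps; i++) {
        fn(dst, src, n);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Returns the index of the fastest candidate that copied correctly,
 * or -1 if the buffers could not be allocated or no candidate passed. */
static int select_fastest(size_t n, int reps)
{
    unsigned char *src = malloc(n);
    unsigned char *dst = malloc(n);
    int best = -1;
    double best_time = 0.0;

    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        src[i] = (unsigned char)i;
    }

    for (size_t c = 0; c < NUM_CANDIDATES; c++) {
        memset(dst, 0, n);
        double t = time_copy(candidates[c].fn, dst, src, n, reps);
        if (memcmp(dst, src, n) != 0) {
            continue; /* never select a version that copies incorrectly */
        }
        if (best < 0 || t < best_time) {
            best = (int)c;
            best_time = t;
        }
    }

    free(src);
    free(dst);
    return best;
}
```

A caller would run select_fastest() once at startup and keep candidates[idx].fn as the copy routine. Terry's worry still applies to a sketch like this: a single message size and a short timing loop may pick a version that is not the best for the sizes the sm BTL actually moves.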

On Aug 16, 2008, at 3:19 PM, Terry Dontje wrote:
Hi Tim,
Thanks for bringing the below up and asking for a redirection to the devel list. I think looking at/using the MCA memcpy framework would be a good thing to do and maybe we can work on this together once I get out from under some commitments. However, one of the challenges that originally scared me away from looking at the memcpy MCA is whether we really want all the OMPI memcpy's to be replaced or just specific ones. Also, I was concerned about trying to figure out which version of memcpy I should be using. I believe currently things are done such that you get one version based on which system you compile on. For Sun there may be several different SPARC platforms that would need to use different memcpy code but we would like to just ship one set of bits. Not saying the above is not doable under the memcpy MCA framework, just that it somewhat scared me away from thinking about it at first glance.
--td
Date: Fri, 15 Aug 2008 12:08:18 -0400
From: "Tim Mattox" <[email protected]>
Subject: Re: [OMPI users] SM btl slows down bandwidth?
To: "Open MPI Users" <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1

Hi Terry (and others),

I have previously explored this some on Linux/X86-64 and concluded that Open MPI needs to supply its own memcpy routine to get good sm performance, since the memcpy supplied by glibc is not even close to optimal. We have an unused MCA framework already set up to supply an opal_memcpy. AFAIK, George and Brian did the original work to set up that framework. It has been on my to-do list for a while to start implementing opal_memcpy components for the architectures I have access to, and to modify OMPI to actually use opal_memcpy where it makes sense. Terry, I presume what you suggest could be dealt with similarly when we are running/building on SPARC.

Any followup discussion on this should probably happen on the developer mailing list.

On Thu, Aug 14, 2008 at 12:19 PM, Terry Dontje <[email protected]> wrote:
> Interestingly enough on the SPARC platform the Solaris memcpy's actually use
> non-temporal stores for copies >= 64KB. By default some of the mca
> parameters to the sm BTL stop at 32KB. I've done experimentations of
> bumping the sm segment sizes to above 64K and seen incredible speedup on our
> M9000 platforms. I am looking for some nice way to integrate a memcpy that
> lowers this boundary to 32KB or lower into Open MPI.
> I have not looked into whether Solaris x86/x64 memcpy's use the non-temporal
> stores or not.
>
> --td
>>
>> Message: 1
>> Date: Thu, 14 Aug 2008 09:28:59 -0400
>> From: Jeff Squyres <[email protected]>
>> Subject: Re: [OMPI users] SM btl slows down bandwidth?
>> To: [email protected], Open MPI Users <[email protected]>
>> Message-ID: <[email protected]>
>> Content-Type: text/plain; charset=US-ASCII; format=flowed;delsp=yes
>>
>> At this time, we are not using non-temporal stores for shared memory
>> operations.
>>
>>
>> On Aug 13, 2008, at 11:46 AM, Ron Brightwell wrote:
>>
>>
>>>>
>>>> >> [...]
>>>> >>
>>>> >> MPICH2 manages to get about 5GB/s in shared memory performance on the
>>>> >> Xeon 5420 system.
>>>>
>>>
>>> >
>>> > Does the sm btl use a memcpy with non-temporal stores like MPICH2?
>>> > This can be a big win for bandwidth benchmarks that don't actually
>>> > touch their receive buffers at all...
>>> >
>>> > -Ron
>>> >
>>> >
>>> > _______________________________________________
>>> > users mailing list
>>> > [email protected]
>>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>
>>
>> -- Jeff Squyres Cisco Systems
>
--
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
[email protected] || [email protected]
I'm a bright... http://www.the-brights.net/
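
The size cutoff Terry mentions in the quoted text (non-temporal stores only at or above some boundary, which he would like lowered to 32KB) might look something like the sketch below on x86 with SSE2. Everything here is an assumption for illustration: the name copy_with_nt_threshold, the NT_THRESHOLD value, and the use of SSE2 streaming stores are not taken from Open MPI or Solaris, and a SPARC version would need its own instructions. When SSE2 is not available at compile time, the sketch simply falls back to the libc memcpy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical cutoff: copies at or above this size bypass the cache. */
#define NT_THRESHOLD ((size_t)32 * 1024)

static void *copy_with_nt_threshold(void *dst, const void *src, size_t n)
{
#if defined(__SSE2__)
    if (n >= NT_THRESHOLD) {
        unsigned char *d = dst;
        const unsigned char *s = src;

        /* Streaming stores need a 16-byte aligned destination, so copy
         * the (at most 15) leading bytes with a regular memcpy first. */
        size_t head = (16 - ((uintptr_t)d & 15)) & 15;
        memcpy(d, s, head);
        d += head;
        s += head;
        n -= head;

        /* Unaligned load from the source, non-temporal store to dst. */
        while (n >= 16) {
            __m128i v = _mm_loadu_si128((const __m128i *)s);
            _mm_stream_si128((__m128i *)d, v);
            d += 16;
            s += 16;
            n -= 16;
        }

        /* Make the streamed data globally visible before returning. */
        _mm_sfence();

        /* Copy whatever is left (fewer than 16 bytes). */
        memcpy(d, s, n);
        return dst;
    }
#endif
    /* Small copies, or no SSE2 at compile time: use the libc memcpy. */
    return memcpy(dst, src, n);
}
```

Where the boundary should sit is exactly the open question in this thread, so NT_THRESHOLD would be a natural candidate for an MCA parameter or for the runtime benchmark rather than a compile-time constant.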
_______________________________________________
devel mailing list
[email protected]
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems
