I can’t speak to the packing question, but I can say that we have indeed confirmed the lack of maintenance on OMPI for Debian/Ubuntu and are working to resolve the problem.
> On Feb 11, 2016, at 1:16 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>
> Michael,
>
> MPI_Pack_external must convert data to big endian, so it can be dumped into a file
> and read back correctly on both big- and little-endian architectures, and with any
> MPI flavor.
>
> If you use only one MPI library on one architecture, or if data is never read from
> or written to a file, then it is more efficient to use MPI_Pack.
>
> openmpi is optimized and the data is swapped only when needed.
> So if your cluster is little endian only, MPI_Send and MPI_Recv will never byte-swap
> data internally.
> If the two ends have different endianness, data is sent in big-endian format and
> byte-swapped on receipt only if needed.
> Generally speaking, a send/recv requires zero or one byte swap.
>
> fwiw, we previously had a claim that neither Debian nor Ubuntu has a maintainer for
> openmpi, which would explain why an obsolete version is shipped. I did some research
> and could not find any evidence that openmpi is no longer maintained.
>
> Cheers,
>
> Gilles
>
>
> On Thursday, February 11, 2016, Michael Rezny <michael.re...@monash.edu> wrote:
> Hi Gilles,
> thanks for thinking about this in more detail.
>
> I understand what you are saying, but your comments raise some questions in my mind:
>
> If one is on a homogeneous cluster, is it important that, in the little-endian case,
> the data be converted to external32 format (big endian), only to always be converted
> back to little endian at the receiving rank?
>
> This would seem to be inefficient, especially if the site has no need for external
> MPI access.
>
> So, does --enable-heterogeneous do more than put the MPI routines that use
> "external32" into straight pass-through?
>
> Back in the old days of PVM, all messages were converted into network order. This
> had severe performance impacts on little-endian clusters.
>
> So much so that a clever way of getting around this was an implementation of
> "receiver makes right", in which all data was sent in the native format of the
> sending rank. The receiving rank analysed the message to determine whether a
> conversion was necessary. In those days, with Cray format data, it could be more
> complicated than just byte swapping.
>
> So, in essence, how is a balance struck between supporting heterogeneous
> architectures and maximum performance for codes where message-passing performance
> is critical?
>
> As a follow-up, since I am now at home: this same problem also exists with the
> Ubuntu 15.10 OpenMPI packages, which surprisingly are still at 1.6.5, the same as
> 14.04.
>
> Again, downloading, building, and using the latest stable version of OpenMPI solved
> the problem.
>
> kindest regards
> Mike
>
>
> On 11/02/2016, at 7:31 PM, Gilles Gouaillardet wrote:
>
>> Michael,
>>
>> I think it is worse than that ...
>>
>> Without --enable-heterogeneous, it seems the data is not correctly packed
>> (e.g. it is not converted to big endian), at least on an x86_64 arch.
>> Unpack looks broken too, but pack followed by unpack does work.
>> That means that if you are reading data correctly written in external32 format,
>> it will not be correctly unpacked.
>>
>> With --enable-heterogeneous, it is only half broken
>> (I do not know yet whether pack or unpack is broken ...)
>> and pack followed by unpack does not work.
>>
>> I will double check that tomorrow
>>
>> Cheers,
>>
>> Gilles
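A quick way to tell which half is misbehaving is to look at the raw bytes that MPI_Pack_external writes, rather than round-tripping through unpack. The following is a minimal sketch, not code from the thread: for MPI_INT, external32 is defined as a 4-byte big-endian integer, so a correct pack of 1234 and 5678 should produce 00 00 04 d2 00 00 16 2e regardless of the host byte order.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int send_data[2] = {1234, 5678};   /* 0x000004d2 and 0x0000162e */
    char buffer[64];
    MPI_Aint position = 0;

    MPI_Init(&argc, &argv);

    /* Pack into the external32 representation (big endian by definition). */
    MPI_Pack_external("external32", send_data, 2, MPI_INT,
                      buffer, (MPI_Aint) sizeof(buffer), &position);

    /* Dump the packed bytes: if the pack side is correct, this prints
       00 00 04 d2 00 00 16 2e on both big- and little-endian hosts. */
    for (MPI_Aint i = 0; i < position; i++) {
        printf("%02x ", (unsigned char) buffer[i]);
    }
    printf("\n");

    MPI_Finalize();
    return 0;
}

If those bytes already come out in little-endian order, the pack side is the broken half; if they are big endian but a pack-then-unpack round trip still returns swapped values, the problem is on the unpack side.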
>>
>> On Thursday, February 11, 2016, Michael Rezny <michael.re...@monash.edu> wrote:
>> Hi Ralph,
>> you are indeed correct. However, many of our users have workstations like mine,
>> with OpenMPI provided by installing a package. So we don't know how it was
>> configured.
>>
>> Then we have failures, since, for instance, Ubuntu 14.04 by default appears to have
>> been built with heterogeneous support! The other (working) machine is a large HPC,
>> and it seems OpenMPI was built there without heterogeneous support.
>>
>> Currently we work around the problem for packing and unpacking by having a compiler
>> switch that selects between calls to pack/unpack_external and pack/unpack.
>>
>> It is only now that we have started to track down what the problem actually is.
>>
>> kindest regards
>> Mike
>>
>> On 11 February 2016 at 15:54, Ralph Castain <r...@open-mpi.org> wrote:
>> Out of curiosity: if both systems are Intel, then why are you enabling hetero?
>> You don’t need it in that scenario.
>>
>> Admittedly, we do need to fix the bug - just trying to understand why you are
>> configuring that way.
>>
>>
>>> On Feb 10, 2016, at 8:46 PM, Michael Rezny <michael.re...@monash.edu> wrote:
>>>
>>> Hi Gilles,
>>> I can confirm that with a fresh download and build from source of OpenMPI 1.10.2
>>> with --enable-heterogeneous, the unpacked ints have the wrong endianness.
>>>
>>> However, without --enable-heterogeneous, the unpacked ints are correct.
>>>
>>> So, this problem still exists in heterogeneous builds of OpenMPI version 1.10.2.
>>>
>>> kindest regards
>>> Mike
>>>
>>> On 11 February 2016 at 14:48, Gilles Gouaillardet
>>> <gilles.gouaillar...@gmail.com> wrote:
>>> Michael,
>>>
>>> Do your two systems have the same endianness?
>>>
>>> Do you know how openmpi was configured on both systems?
>>> (Is --enable-heterogeneous enabled or disabled on both systems?)
>>>
>>> fwiw, openmpi 1.6.5 is old now and no longer maintained.
>>> I strongly encourage you to use openmpi 1.10.2.
>>>
>>> Cheers,
>>>
>>> Gilles
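For reference, the compile-time switch Michael mentions above (falling back from pack/unpack_external to pack/unpack on homogeneous systems) could look roughly like this sketch. The USE_EXTERNAL32 macro, the wrapper names, and the use of MPI_COMM_WORLD are illustrative assumptions rather than code from the thread; MPI_Pack and MPI_Unpack keep the native representation, so on a homogeneous cluster they avoid the external32 conversion entirely.

#include <mpi.h>

/* Hypothetical wrappers: build with -DUSE_EXTERNAL32 when the portable
   external32 representation is required; leave it off on a homogeneous
   cluster to use the native-format MPI_Pack/MPI_Unpack instead. */
int pack_ints(int *data, int count, char *buf, MPI_Aint bufsize,
              MPI_Aint *position)
{
#ifdef USE_EXTERNAL32
    return MPI_Pack_external("external32", data, count, MPI_INT,
                             buf, bufsize, position);
#else
    int pos = (int) *position;
    int err = MPI_Pack(data, count, MPI_INT, buf, (int) bufsize,
                       &pos, MPI_COMM_WORLD);
    *position = pos;
    return err;
#endif
}

int unpack_ints(char *buf, MPI_Aint bufsize, MPI_Aint *position,
                int *data, int count)
{
#ifdef USE_EXTERNAL32
    return MPI_Unpack_external("external32", buf, bufsize, position,
                               data, count, MPI_INT);
#else
    int pos = (int) *position;
    int err = MPI_Unpack(buf, (int) bufsize, &pos, data, count,
                         MPI_INT, MPI_COMM_WORLD);
    *position = pos;
    return err;
#endif
}

Calling these from a small reproducer such as the one quoted below, with and without -DUSE_EXTERNAL32, isolates the external32 conversion from the rest of the packing logic.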
>>>
>>> On Thursday, February 11, 2016, Michael Rezny <michael.re...@monash.edu> wrote:
>>> Hi,
>>> I am running Ubuntu 14.04 LTS with OpenMPI 1.6.5 and gcc 4.8.4.
>>>
>>> On a single-rank program which just packs and unpacks two ints using
>>> MPI_Pack_external and MPI_Unpack_external, the unpacked ints are in the wrong
>>> endian order.
>>>
>>> However, on an HPC system (not Ubuntu), using OpenMPI 1.6.5 and gcc 4.8.4, the
>>> unpacked ints are correct.
>>>
>>> Is it possible to get some assistance to track down what is going on?
>>>
>>> Here is the output from the program:
>>>
>>> ~/tests/mpi/Pack test1
>>> send data 000004d2 0000162e
>>> MPI_Pack_external: 0
>>> buffer size: 8
>>> MPI_unpack_external: 0
>>> recv data d2040000 2e160000
>>>
>>> And here is the source code:
>>>
>>> #include <stdio.h>
>>> #include <mpi.h>
>>>
>>> int main(int argc, char *argv[]) {
>>>     int numRanks, myRank, error;
>>>
>>>     int send_data[2] = {1234, 5678};
>>>     int recv_data[2];
>>>
>>>     MPI_Aint buffer_size = 1000;
>>>     char buffer[buffer_size];
>>>
>>>     MPI_Init(&argc, &argv);
>>>     MPI_Comm_size(MPI_COMM_WORLD, &numRanks);
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
>>>
>>>     printf("send data %08x %08x \n", send_data[0], send_data[1]);
>>>
>>>     MPI_Aint position = 0;
>>>     error = MPI_Pack_external("external32", (void*) send_data, 2, MPI_INT,
>>>                               buffer, buffer_size, &position);
>>>     printf("MPI_Pack_external: %d\n", error);
>>>
>>>     printf("buffer size: %d\n", (int) position);
>>>
>>>     position = 0;
>>>     error = MPI_Unpack_external("external32", buffer, buffer_size, &position,
>>>                                 recv_data, 2, MPI_INT);
>>>     printf("MPI_unpack_external: %d\n", error);
>>>
>>>     printf("recv data %08x %08x \n", recv_data[0], recv_data[1]);
>>>
>>>     MPI_Finalize();
>>>
>>>     return 0;
>>> }
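For comparison, the expected output of the program above on a correct external32 implementation, worked out from the source (1234 = 0x000004d2 and 5678 = 0x0000162e, each packed as a 4-byte big-endian integer and converted back to the native representation on unpack):

send data 000004d2 0000162e
MPI_Pack_external: 0
buffer size: 8
MPI_unpack_external: 0
recv data 000004d2 0000162e

The run quoted above instead prints recv data d2040000 2e160000, i.e. the two ints come back byte-swapped, which is the bug being discussed in this thread.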