Hi Gilles,
thanks for the detailed explanation.

Have a nice weekend,
Mike
On 12/02/2016, at 11:23 PM, Gilles Gouaillardet wrote:

> Michael,
>
> Per the specifications, MPI_Pack_external and MPI_Unpack_external must pack/unpack to/from big endian, regardless of the endianness of the host. On a little endian system, byte swapping must occur because this is what you are explicitly requesting. These functions are really meant to be used to write a buffer to a file, so it can be read on another arch, and potentially with another MPI library (see the man page).
>
> Today, this is not the case, and these are two bugs:
> 1. With --enable-heterogeneous, MPI_Pack_external does not do any byte swapping on little endian arch, so your test fails.
> 2. Without --enable-heterogeneous, neither MPI_Pack_external nor MPI_Unpack_external does any byte swapping. Even if your test is working fine, keep in mind the buffer is not in big endian format, and should not be dumped into a file if you plan to read it later with a bug-free MPI_Unpack_external.
>
> Once the bugs are fixed, if you want to run on a heterogeneous cluster, you have to
> - configure with --enable-heterogeneous
> - use MPI_Pack_external and MPI_Unpack_external if you want to pack a message, send it to another host with type MPI_PACKED, and unpack it there
> - not use MPI_Pack/MPI_Unpack to send/recv messages between hosts with different endianness.
>
> If you are only transferring predefined and derived datatypes, you have nothing to do; Open MPI will automatically swap bytes on the receiver side if needed.
>
> If you want to run on a homogeneous system, you do not need --enable-heterogeneous, and you can use MPI_Pack/MPI_Unpack, which is more efficient than MPI_Pack_external/MPI_Unpack_external for send/recv messages.
>
> For the time being, you are not able to write portable data with MPI_Pack_external. The easiest way is to run on a homogeneous cluster and configure Open MPI without --enable-heterogeneous and without --enable-debug, so pack/unpack will work regardless of whether you use the external or the non-external subroutines. Generally speaking, I recommend you use derived datatypes instead of manually packing/unpacking data to/from buffers.
>
> Cheers,
>
> Gilles
>
> On Friday, February 12, 2016, Michael Rezny <michael.re...@monash.edu> wrote:
> Hi Gilles,
> I am misunderstanding something here. What you are now saying seems, to me, to be at odds with what you said previously.
>
> Assume the situation where both sender and receiver are little-endian, and consider only MPI_Pack_external and MPI_Unpack_external.
>
> Consider case 1, --enable-heterogeneous:
> In your previous email I understood that "receiver makes right" was being implemented. So the sender does not byte-swap, and the message is sent in (native) little-endian format. The receiver recognises that the received message is in little-endian format, and since this is also its native format, no byte swap is needed.
>
> Consider case 2, --disable-heterogeneous:
> It seems strange that, in this case, any byte swapping would ever need to occur. One is assuming a homogeneous system, and sender and receiver will always be using their native format, i.e. exactly the same as MPI_Pack and MPI_Unpack.
>
> kindest regards
> Mike
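Gilles' recommendation above, to prefer derived datatypes over manual packing, could look roughly like the following minimal sketch. This is an editor's illustration, not code from the thread: the struct, field names, and sample values are invented, it assumes a run with at least two ranks (e.g. mpirun -np 2), and, as Gilles notes, a heterogeneous cluster still needs a build with --enable-heterogeneous for the library to convert representations.

/* Minimal sketch: describe the C struct layout once as a derived datatype,
 * then let MPI_Send/MPI_Recv move it, so the library performs any byte
 * swapping on the receiver side only if the peers' endianness differs. */
#include <stddef.h>
#include <stdio.h>
#include <mpi.h>

struct payload {
    int    id;
    double value;
};

int main(int argc, char *argv[]) {
    int rank;
    MPI_Datatype payload_type;
    int          blocklens[2] = {1, 1};
    MPI_Aint     displs[2]    = {offsetof(struct payload, id),
                                 offsetof(struct payload, value)};
    MPI_Datatype types[2]     = {MPI_INT, MPI_DOUBLE};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Type_create_struct(2, blocklens, displs, types, &payload_type);
    MPI_Type_commit(&payload_type);

    if (rank == 0) {
        struct payload p = {1234, 56.78};   /* invented sample data */
        MPI_Send(&p, 1, payload_type, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        struct payload p;
        MPI_Recv(&p, 1, payload_type, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("received id=%d value=%g\n", p.id, p.value);
    }

    MPI_Type_free(&payload_type);
    MPI_Finalize();
    return 0;
}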
>
> On 12/02/2016, at 9:25 PM, Gilles Gouaillardet wrote:
>
>> Michael,
>>
>> Byte swapping only occurs if you invoke MPI_Pack_external and MPI_Unpack_external on little endian systems.
>>
>> MPI_Pack and MPI_Unpack use the same engine as MPI_Send and MPI_Recv, and this does not involve any byte swapping if both ends have the same endianness.
>>
>> Cheers,
>>
>> Gilles
>>
>> On Friday, February 12, 2016, Michael Rezny <michael.re...@monash.edu> wrote:
>> Hi,
>> oh, that is good news! The process is meant to implement "receiver makes right", which is good news for efficiency.
>>
>> But, in the second case, without --enable-heterogeneous, are you saying that on little-endian machines byte swapping is meant to always occur? That seems most odd. I would have thought that if one only wants to work on a homogeneous system, and configures OpenMPI for this mode, then there is no need to check at the receiving end whether byte-swapping is needed or not. It will be assumed that both sender and receiver are agreed on the format, whatever it is. On a homogeneous little-endian HPC cluster one would not want the extra overhead of two conversions for every packed message.
>>
>> Is it possible that the assert has been implemented incorrectly in this case?
>>
>> There is absolutely no urgency with regard to a fix. Thanks to your quick response, we now understand what is causing the problem and are in the process of implementing a test in ./configure to determine if the bug is present and, if so, add a compiler flag to switch to using MPI_Pack and MPI_Unpack.
>>
>> It would be good if you would be kind enough to let me know when a fix is available, and I will download, build, and test it on our application. Then this version can be installed as the default.
>>
>> Once again, many thanks for your prompt and most helpful responses.
>>
>> warmest regards
>> Mike
>>
>> On 12/02/2016, at 7:03 PM, Gilles Gouaillardet wrote:
>>
>>> Michael,
>>>
>>> I'd like to correct what I wrote earlier.
>>>
>>> In heterogeneous clusters, data is sent "as is" (e.g. no byte swapping) and it is byte swapped when received, and only if needed.
>>>
>>> With --enable-heterogeneous, MPI_Unpack_external is working, but MPI_Pack_external is broken (e.g. no byte swapping occurs on little endian arch), since internally we use a mechanism similar to the one used to send data. That is a bug and I will work on that.
>>>
>>> Without --enable-heterogeneous, neither MPI_Pack_external nor MPI_Unpack_external does any byte swapping, and they are both broken. fwiw, if you had configure'd with --enable-debug, you would have run into an assert error (e.g. a crash).
>>>
>>> I will work on a fix, but it might take some time before it is ready.
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 2/11/2016 6:16 PM, Gilles Gouaillardet wrote:
>>>> Michael,
>>>>
>>>> MPI_Pack_external must convert data to big endian, so it can be dumped into a file and be read correctly on big and little endian arch, and with any MPI flavor.
>>>>
>>>> If you use only one MPI library on one arch, or if data is never read/written from/to a file, then it is more efficient to use MPI_Pack.
>>>>
>>>> Open MPI is optimized and the data is swapped only when needed. So if your cluster is little endian only, MPI_Send and MPI_Recv will never byte swap data internally. If both ends have different endianness, data is sent in big endian format and byte swapped when received only if needed. Generally speaking, a send/recv requires zero or one byte swap.
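To make the swap-only-when-needed ("receiver makes right") idea concrete, here is a conceptual sketch. It is an editor's illustration only, not Open MPI's actual implementation; the enum, function names, and sample values are invented.

/* Conceptual sketch of "receiver makes right": data arrives in the sender's
 * native byte order together with a flag describing that order, and the
 * receiver swaps only if the two orders differ. */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

enum byte_order { ORDER_LITTLE, ORDER_BIG };

enum byte_order native_order(void) {
    const uint16_t probe = 1;
    return (*(const unsigned char *)&probe == 1) ? ORDER_LITTLE : ORDER_BIG;
}

uint32_t swap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Fix up an array of 32-bit words received from a peer whose byte order is
 * 'sender_order'.  On a homogeneous cluster this is a no-op. */
void receiver_makes_right(uint32_t *data, size_t n,
                          enum byte_order sender_order) {
    if (sender_order == native_order())
        return;                      /* zero byte swaps on the common path */
    for (size_t i = 0; i < n; i++)
        data[i] = swap32(data[i]);   /* at most one swap per element */
}

int main(void) {
    uint32_t data[2] = {0x000004d2u, 0x0000162eu};  /* values from the test */
    /* Pretend the sender was big endian; on a little endian host this swaps. */
    receiver_makes_right(data, 2, ORDER_BIG);
    printf("%08x %08x\n", data[0], data[1]);
    return 0;
}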
>>>>
>>>> fwiw, we previously had a claim that neither Debian nor Ubuntu has a maintainer for Open MPI, which would explain why an obsolete version is shipped. I did some research and could not find any evidence that Open MPI is no longer maintained.
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> On Thursday, February 11, 2016, Michael Rezny <michael.re...@monash.edu> wrote:
>>>> Hi Gilles,
>>>> thanks for thinking about this in more detail.
>>>>
>>>> I understand what you are saying, but your comments raise some questions in my mind:
>>>>
>>>> If one is on a homogeneous cluster, is it important that, in the little-endian case, the data be converted to external32 format (big-endian), only to always be converted back to little-endian at the receiving rank?
>>>>
>>>> This would seem to be inefficient, especially if the site has no need for external MPI access.
>>>>
>>>> So, does --enable-heterogeneous do more than put MPI routines using "external32" into straight pass-through?
>>>>
>>>> Back in the old days of PVM, all messages were converted into network order. This had severe performance impacts on little-endian clusters.
>>>>
>>>> So much so that a clever way of getting around this was an implementation of "receiver makes right", in which all data was sent in the native format of the sending rank. The receiving rank analysed the message to determine whether a conversion was necessary. In those days, with Cray format data, it could be more complicated than just byte swapping.
>>>>
>>>> So, in essence, how is a balance struck between supporting heterogeneous architectures and maximum performance for codes where message-passing performance is critical?
>>>>
>>>> As a follow-up, since I am now at home: this same problem also exists with the Ubuntu 15.10 OpenMPI packages, which surprisingly are still at 1.6.5, the same as 14.04.
>>>>
>>>> Again, downloading, building, and using the latest stable version of OpenMPI solved the problem.
>>>>
>>>> kindest regards
>>>> Mike
>>>>
>>>> On 11/02/2016, at 7:31 PM, Gilles Gouaillardet wrote:
>>>>
>>>>> Michael,
>>>>>
>>>>> I think it is worse than that ...
>>>>>
>>>>> Without --enable-heterogeneous, it seems the data is not correctly packed (e.g. it is not converted to big endian), at least on an x86_64 arch. Unpack looks broken too, but pack followed by unpack does work. That means if you are reading data correctly written in external32 format, it will not be correctly unpacked.
>>>>>
>>>>> With --enable-heterogeneous, it is only half broken (I do not know yet whether pack or unpack is broken ...) and pack followed by unpack does not work.
>>>>>
>>>>> I will double check that tomorrow.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>>
>>>>> On Thursday, February 11, 2016, Michael Rezny <michael.re...@monash.edu> wrote:
>>>>> Hi Ralph,
>>>>> you are indeed correct. However, many of our users have workstations, as I do, with OpenMPI provided by installing a package. So we don't know what has been configured.
>>>>>
>>>>> Then we have failures, since, for instance, Ubuntu 14.04 by default appears to have been built with heterogeneous support! The other (working) machine is a large HPC system, and it seems OpenMPI was built there without heterogeneous support.
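Since, as Michael notes, one often does not know how a packaged Open MPI was configured, the kind of ./configure probe he mentions earlier in the thread might look roughly like this. This is an editor's sketch, not the project's actual test; it assumes the probe is compiled with the MPI compiler wrapper (e.g. mpicc) and run on a single rank, returning 0 when MPI_Pack_external produces a genuine big-endian (external32) buffer and 1 when the bug described in this thread is present.

/* Probe: pack one int with MPI_Pack_external and check the buffer layout. */
#include <mpi.h>

int main(int argc, char *argv[]) {
    int value = 0x01020304;
    unsigned char buffer[16];
    MPI_Aint position = 0;

    MPI_Init(&argc, &argv);
    MPI_Pack_external("external32", &value, 1, MPI_INT,
                      buffer, (MPI_Aint) sizeof(buffer), &position);
    MPI_Finalize();

    /* external32 stores the most significant byte (0x01) first. */
    return (position == 4 && buffer[0] == 0x01 && buffer[3] == 0x04) ? 0 : 1;
}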
>>>>>
>>>>> Currently we work around the problem for packing and unpacking by having a compiler switch that selects between calls to MPI_Pack_external/MPI_Unpack_external and MPI_Pack/MPI_Unpack.
>>>>>
>>>>> It is only now that we have started to track down what the problem actually is.
>>>>>
>>>>> kindest regards
>>>>> Mike
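The compile-time switch Michael describes might look roughly like the following. This is an editor's sketch only: the macro name PACK_EXTERNAL_IS_BROKEN and the wrapper function pack_ints are invented, and the choice would typically be driven by the configure-time probe sketched later in the thread.

/* When PACK_EXTERNAL_IS_BROKEN is defined (e.g. by a configure-time probe),
 * fall back to plain MPI_Pack; otherwise use the portable external32 form. */
#include <mpi.h>

#ifdef PACK_EXTERNAL_IS_BROKEN

int pack_ints(int *src, int count, void *buf, MPI_Aint bufsize, MPI_Aint *pos) {
    /* MPI_Pack uses int for size and position, so convert from MPI_Aint. */
    int p = (int) *pos;
    int rc = MPI_Pack(src, count, MPI_INT, buf, (int) bufsize, &p,
                      MPI_COMM_WORLD);
    *pos = p;
    return rc;
}

#else

int pack_ints(int *src, int count, void *buf, MPI_Aint bufsize, MPI_Aint *pos) {
    return MPI_Pack_external("external32", src, count, MPI_INT,
                             buf, bufsize, pos);
}

#endif

The matching unpack wrapper would switch between MPI_Unpack and MPI_Unpack_external in the same way, so that both ends always agree on the buffer format.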
>>>>>
>>>>> On 11 February 2016 at 15:54, Ralph Castain <r...@open-mpi.org> wrote:
>>>>> Out of curiosity: if both systems are Intel, then why are you enabling hetero? You don't need it in that scenario.
>>>>>
>>>>> Admittedly, we do need to fix the bug - just trying to understand why you are configuring that way.
>>>>>
>>>>>> On Feb 10, 2016, at 8:46 PM, Michael Rezny <michael.re...@monash.edu> wrote:
>>>>>>
>>>>>> Hi Gilles,
>>>>>> I can confirm that with a fresh download and build from source of OpenMPI 1.10.2 with --enable-heterogeneous, the unpacked ints have the wrong endianness.
>>>>>>
>>>>>> However, without --enable-heterogeneous, the unpacked ints are correct.
>>>>>>
>>>>>> So, this problem still exists in heterogeneous builds with OpenMPI version 1.10.2.
>>>>>>
>>>>>> kindest regards
>>>>>> Mike
>>>>>>
>>>>>> On 11 February 2016 at 14:48, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>>>>>> Michael,
>>>>>>
>>>>>> Do your two systems have the same endianness?
>>>>>>
>>>>>> Do you know how Open MPI was configure'd on both systems? (Is --enable-heterogeneous enabled or disabled on both systems?)
>>>>>>
>>>>>> fwiw, Open MPI 1.6.5 is old now and no longer maintained. I strongly encourage you to use Open MPI 1.10.2.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Gilles
>>>>>>
>>>>>> On Thursday, February 11, 2016, Michael Rezny <michael.re...@monash.edu> wrote:
>>>>>> Hi,
>>>>>> I am running Ubuntu 14.04 LTS with OpenMPI 1.6.5 and gcc 4.8.4.
>>>>>>
>>>>>> On a single-rank program which just packs and unpacks two ints using MPI_Pack_external and MPI_Unpack_external, the unpacked ints are in the wrong endian order.
>>>>>>
>>>>>> However, on an HPC system (not Ubuntu), using OpenMPI 1.6.5 and gcc 4.8.4, the unpacked ints are correct.
>>>>>>
>>>>>> Is it possible to get some assistance to track down what is going on?
>>>>>>
>>>>>> Here is the output from the program:
>>>>>>
>>>>>> ~/tests/mpi/Pack test1
>>>>>> send data 000004d2 0000162e
>>>>>> MPI_Pack_external: 0
>>>>>> buffer size: 8
>>>>>> MPI_unpack_external: 0
>>>>>> recv data d2040000 2e160000
>>>>>>
>>>>>> And here is the source code:
>>>>>>
>>>>>> #include <stdio.h>
>>>>>> #include <mpi.h>
>>>>>>
>>>>>> int main(int argc, char *argv[]) {
>>>>>>     int numRanks, myRank, error;
>>>>>>
>>>>>>     int send_data[2] = {1234, 5678};
>>>>>>     int recv_data[2];
>>>>>>
>>>>>>     MPI_Aint buffer_size = 1000;
>>>>>>     char buffer[buffer_size];
>>>>>>
>>>>>>     MPI_Init(&argc, &argv);
>>>>>>     MPI_Comm_size(MPI_COMM_WORLD, &numRanks);
>>>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
>>>>>>
>>>>>>     printf("send data %08x %08x \n", send_data[0], send_data[1]);
>>>>>>
>>>>>>     MPI_Aint position = 0;
>>>>>>     error = MPI_Pack_external("external32", (void*) send_data, 2, MPI_INT,
>>>>>>                               buffer, buffer_size, &position);
>>>>>>     printf("MPI_Pack_external: %d\n", error);
>>>>>>
>>>>>>     printf("buffer size: %d\n", (int) position);
>>>>>>
>>>>>>     position = 0;
>>>>>>     error = MPI_Unpack_external("external32", buffer, buffer_size, &position,
>>>>>>                                 recv_data, 2, MPI_INT);
>>>>>>     printf("MPI_unpack_external: %d\n", error);
>>>>>>
>>>>>>     printf("recv data %08x %08x \n", recv_data[0], recv_data[1]);
>>>>>>
>>>>>>     MPI_Finalize();
>>>>>>
>>>>>>     return 0;
>>>>>> }
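For reference, with a correct external32 implementation the program above should print recv data 000004d2 0000162e (matching the send data), and the 8-byte packed buffer should hold the big-endian bytes 00 00 04 d2 00 00 16 2e. The byte-reversed recv values shown above are consistent with Gilles' diagnosis for heterogeneous builds (pack skips the byte swap while unpack still performs it). A hypothetical addition to the test, inserted right after the MPI_Pack_external call, makes this easy to see by printing the packed bytes:

/* Dump the packed buffer so the external32 byte order can be inspected. */
for (MPI_Aint i = 0; i < position; i++)
    printf("%02x ", (unsigned char) buffer[i]);
printf("\n");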