Hi Gilles,
thanks for the detailed explanation.

Have a nice weekend
Mike

On 12/02/2016, at 11:23 PM, Gilles Gouaillardet wrote:

> Michael,
> 
> Per the specifications, MPI_Pack_external and MPI_Unpack_external must 
> pack/unpack to/from big endian, regardless of the endianness of the host.
> On a little-endian system, byte swapping must occur because that is what you 
> are explicitly requesting.
> These functions are really meant to be used to write a buffer to a file, so 
> it can be read on another arch, and potentially with another MPI library 
> (see the man page).
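> 
> For instance, a minimal sketch of that file use case (assuming a bug-free 
> MPI_Pack_external; the output file name is only illustrative) could look 
> like this:
> 
> #include <stdio.h>
> #include <mpi.h>
> 
> int main(int argc, char *argv[]) {
>     int data[2] = {1234, 5678};
>     char buf[64];
>     MPI_Aint pos = 0;
> 
>     MPI_Init(&argc, &argv);
> 
>     /* pack into the portable big-endian external32 representation */
>     MPI_Pack_external("external32", data, 2, MPI_INT, buf, sizeof(buf), &pos);
> 
>     /* dump the packed bytes; any arch / MPI library can later read and
>        MPI_Unpack_external this file */
>     FILE *f = fopen("data.ext32", "wb");
>     if (f != NULL) {
>         fwrite(buf, 1, (size_t) pos, f);
>         fclose(f);
>     }
> 
>     MPI_Finalize();
>     return 0;
> }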
> 
> Today, this is not the case; there are two bugs.
> 1. With --enable-heterogeneous, MPI_Pack_external does not do any byte 
> swapping on little-endian archs, so your test fails.
> 2. Without --enable-heterogeneous, neither MPI_Pack_external nor 
> MPI_Unpack_external does any byte swapping. Even though your test works 
> fine, keep in mind that the buffer is not in big-endian format, and it 
> should not be dumped into a file if you plan to read it later with a 
> bug-free MPI_Unpack_external.
> 
> Once the bugs are fixed, if you want to run on a heterogeneous cluster, you 
> have to:
> - configure with --enable-heterogeneous
> - use MPI_Pack_external and MPI_Unpack_external if you want to pack a 
> message, send it to another host with type MPI_PACKED, and unpack it there 
> (see the sketch after this list)
> - not use MPI_Pack/MPI_Unpack to send/recv messages between hosts with 
> different endianness.
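> 
> A minimal two-rank sketch of that pack/send/unpack pattern (again assuming 
> the bugs are fixed) might be:
> 
> #include <stdio.h>
> #include <mpi.h>
> 
> int main(int argc, char *argv[]) {
>     int rank, vals[2] = {1234, 5678};
>     char buf[64];
>     MPI_Aint pos = 0;
> 
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> 
>     if (rank == 0) {
>         /* pack into external32 (big endian) and ship the raw bytes */
>         MPI_Pack_external("external32", vals, 2, MPI_INT,
>                           buf, sizeof(buf), &pos);
>         MPI_Send(buf, (int) pos, MPI_PACKED, 1, 0, MPI_COMM_WORLD);
>     } else if (rank == 1) {
>         MPI_Recv(buf, (int) sizeof(buf), MPI_PACKED, 0, 0,
>                  MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>         /* unpack back into the native representation of this host */
>         MPI_Unpack_external("external32", buf, sizeof(buf), &pos,
>                             vals, 2, MPI_INT);
>         printf("rank 1 received %d %d\n", vals[0], vals[1]);
>     }
> 
>     MPI_Finalize();
>     return 0;
> }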
> 
> If you are only transferring predefined and derived datatypes, you have 
> nothing to do: Open MPI will automatically swap bytes on the receiver side 
> if needed.
> 
> If you want to run on a homogeneous system, you do not need 
> --enable-heterogeneous, and you can use MPI_Pack/MPI_Unpack, which are more 
> efficient than MPI_Pack_external/MPI_Unpack_external for sending/receiving 
> messages.
> 
> 
> 
> For the time being, you are not able to write portable data with 
> MPI_Pack_external.
> The easiest way forward is to run on a homogeneous cluster and configure 
> Open MPI without --enable-heterogeneous and without --enable-debug, so 
> pack/unpack will work regardless of whether you use the external or the 
> non-external subroutines.
> Generally speaking, I recommend you use derived datatypes instead of 
> manually packing/unpacking data to/from buffers.
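> 
> As a hypothetical illustration of that last point, a derived datatype lets 
> you describe a struct once and then send it directly, leaving any 
> representation issues to the library (the struct and names here are made up 
> for the example):
> 
> #include <stddef.h>
> #include <stdio.h>
> #include <mpi.h>
> 
> typedef struct { int id; double value; } item_t;
> 
> int main(int argc, char *argv[]) {
>     int rank;
>     item_t item = { 42, 3.14 };
> 
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> 
>     /* describe the struct layout once instead of packing it by hand */
>     int          blocklens[2] = { 1, 1 };
>     MPI_Aint     displs[2]    = { offsetof(item_t, id),
>                                   offsetof(item_t, value) };
>     MPI_Datatype types[2]     = { MPI_INT, MPI_DOUBLE };
>     MPI_Datatype item_type;
>     MPI_Type_create_struct(2, blocklens, displs, types, &item_type);
>     MPI_Type_commit(&item_type);
> 
>     if (rank == 0) {
>         MPI_Send(&item, 1, item_type, 1, 0, MPI_COMM_WORLD);
>     } else if (rank == 1) {
>         MPI_Recv(&item, 1, item_type, 0, 0, MPI_COMM_WORLD,
>                  MPI_STATUS_IGNORE);
>         printf("rank 1 received %d %f\n", item.id, item.value);
>     }
> 
>     MPI_Type_free(&item_type);
>     MPI_Finalize();
>     return 0;
> }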
> 
> Cheers,
> 
> Gilles
> 
> On Friday, February 12, 2016, Michael Rezny <michael.re...@monash.edu> wrote:
> Hi Gilles,
> I must be misunderstanding something here. What you are now saying seems, to 
> me, to be at odds with what you said previously.
> 
> Assume the situation where both sender and receiver are little-endian, and 
> that we are discussing only MPI_Pack_external and MPI_Unpack_external.
> 
> Consider case 1, --enable-heterogeneous:
> From your previous email I understood that "receiver makes right" was being 
> implemented.
> So, the sender does not byte-swap, and the message is sent in (native) 
> little-endian format.
> The receiver recognises that the received message is in little-endian format 
> and, since this is also its native format, no byte swap is needed.
> 
> Consider case 2, --disable-heterogeneous:
> It seems strange that, in this case, any byte swapping would ever need to 
> occur.
> One is assuming a homogeneous system, and sender and receiver will always be 
> using their native format,
> i.e., exactly the same as MPI_Pack and MPI_Unpack.
> 
> kindest regards
> Mike
> 
> On 12/02/2016, at 9:25 PM, Gilles Gouaillardet wrote:
> 
>> Michael,
>> 
>> Byte swapping only occurs if you invoke MPI_Pack_external and 
>> MPI_Unpack_external on little-endian systems.
>> 
>> MPI_Pack and MPI_Unpack use the same engine as MPI_Send and MPI_Recv, and 
>> this does not involve any byte swapping if both ends have the same 
>> endianness.
>> 
>> Cheers,
>> 
>> Gilles
>> 
>> On Friday, February 12, 2016, Michael Rezny <michael.re...@monash.edu> wrote:
>> Hi,
>> oh, that is good news! The process is meant to be implementing "receiver 
>> makes right", which is good for efficiency.
>> 
>> But, in the second case, without --enable-heterogeneous, are you saying 
>> that on little-endian machines byte swapping is meant to always occur? That 
>> seems most odd. I would have thought that if one only wants to work on a 
>> homogeneous system, and configures OpenMPI for this mode, then there is no 
>> need to check at the receiving end whether byte-swapping is needed or not. 
>> It would be assumed that both sender and receiver agree on the format, 
>> whatever it is. On a homogeneous little-endian HPC cluster one would not 
>> want the extra overhead of two conversions for every packed message.
>> 
>> Is it possible that the assert has been implemented incorrectly in this case?
>> 
>> There is absolutely no urgency with regard to a fix. Thanks to your quick 
>> response, we now understand what is causing the problem, and we are in the 
>> process of implementing a test in ./configure to determine whether the bug 
>> is present and, if so, adding a compiler flag to switch to using MPI_Pack 
>> and MPI_Unpack, along the lines of the sketch below.
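>> 
>> A minimal sketch of that compile-time switch (PACK_EXTERNAL_IS_BROKEN is a 
>> hypothetical macro that our configure test would define when the bug is 
>> detected) might look like:
>> 
>> #include <stdio.h>
>> #include <mpi.h>
>> 
>> int main(int argc, char *argv[]) {
>>     int data[2] = {1234, 5678};
>>     char buf[64];
>> 
>>     MPI_Init(&argc, &argv);
>> 
>> #ifdef PACK_EXTERNAL_IS_BROKEN
>>     /* fall back to the native-format routines */
>>     int pos = 0;   /* MPI_Pack takes an int position and a communicator */
>>     MPI_Pack(data, 2, MPI_INT, buf, (int) sizeof(buf), &pos, MPI_COMM_WORLD);
>> #else
>>     MPI_Aint pos = 0;   /* MPI_Pack_external uses MPI_Aint */
>>     MPI_Pack_external("external32", data, 2, MPI_INT,
>>                       buf, sizeof(buf), &pos);
>> #endif
>>     printf("packed %ld bytes\n", (long) pos);
>> 
>>     MPI_Finalize();
>>     return 0;
>> }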
>> 
>> It would be good if you would be kind enough to let me know when a fix is 
>> available; I will then download, build,
>> and test it with our application, and that version can be installed as the 
>> default.
>> 
>> Once again, many thanks for your prompt and most helpful responses.
>> 
>> warmest regards
>> Mike
>> 
>> On 12/02/2016, at 7:03 PM, Gilles Gouaillardet wrote:
>> 
>>> Michael,
>>> 
>>> I'd like to correct what I wrote earlier.
>>> 
>>> In heterogeneous clusters, data is sent "as is" (i.e., no byte swapping), 
>>> and it is byte-swapped when received, and only if needed.
>>> 
>>> With --enable-heterogeneous, MPI_Unpack_external is working, but 
>>> MPI_Pack_external is broken
>>> (i.e., no byte swapping occurs on little-endian archs), since we 
>>> internally use the same mechanism that is used to send data. That is a bug 
>>> and I will work on it.
>>> 
>>> Without --enable-heterogeneous, neither MPI_Pack_external nor 
>>> MPI_Unpack_external does any byte swapping, and they
>>> are both broken. FWIW, if you had configured with --enable-debug, you 
>>> would have run into an assert error (i.e., a crash).
>>> 
>>> I will work on a fix, but it might take some time before it is ready.
>>> 
>>> Cheers,
>>> 
>>> Gilles
>>> On 2/11/2016 6:16 PM, Gilles Gouaillardet wrote:
>>>> Michael,
>>>> 
>>>> MPI_Pack_external must convert data to big endian, so it can be dumped 
>>>> into a file and read correctly on both big- and little-endian archs, and 
>>>> with any MPI flavor.
>>>> 
>>>> If you use only one MPI library on one arch, or if data is never 
>>>> read/written from/to a file, then it is more efficient to use MPI_Pack.
>>>> 
>>>> Open MPI is optimized, and the data is swapped only when needed.
>>>> So if your cluster is little-endian only, MPI_Send and MPI_Recv will 
>>>> never byte-swap data internally.
>>>> If both ends have different endianness, data is sent in big-endian format 
>>>> and byte-swapped when received, only if needed.
>>>> Generally speaking, a send/recv requires zero or one byte swap.
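>>>> 
>>>> (For illustration only: for 32-bit integer data, that receiver-side 
>>>> fix-up amounts to an in-place byte swap, applied only when the sender's 
>>>> endianness differs. A sketch using the gcc/clang builtin:)
>>>> 
>>>> #include <stddef.h>
>>>> #include <stdint.h>
>>>> 
>>>> /* swap each received 32-bit word in place */
>>>> static void byteswap32_inplace(uint32_t *buf, size_t n) {
>>>>     for (size_t i = 0; i < n; i++)
>>>>         buf[i] = __builtin_bswap32(buf[i]);
>>>> }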
>>>> 
>>>> FWIW, there was previously a claim that neither Debian nor Ubuntu has a 
>>>> maintainer for Open MPI, which would explain why an obsolete version is 
>>>> shipped. I did some research and could not find any evidence that Open 
>>>> MPI is no longer maintained there.
>>>> 
>>>> Cheers,
>>>> 
>>>> Gilles
>>>> 
>>>> 
>>>> 
>>>> On Thursday, February 11, 2016, Michael Rezny <michael.re...@monash.edu> 
>>>> wrote:
>>>> Hi Gilles,
>>>> thanks for thinking about this in more detail.
>>>> 
>>>> I understand what you are saying, but your comments raise some questions 
>>>> in my mind:
>>>> 
>>>> If one is in a homogeneous cluster, is it important that, in the 
>>>> little-endian case, the data be
>>>> converted to external32 format (big-endian), only to always be converted 
>>>> at the receiving rank
>>>> back to little-endian?
>>>> 
>>>> This would seem to be inefficient, especially if the site has no need for 
>>>> external MPI access.
>>>> 
>>>> So, does --enable-heterogeneous do more than put MPI routines using 
>>>> "external32" into straight pass-through?
>>>> 
>>>> Back in the old days of PVM, all messages were converted into network 
>>>> order. This had severe performance impacts
>>>> on little-endian clusters.
>>>> 
>>>> So much so that a clever way of getting around this was an implementation 
>>>> of "receiver makes right", in which
>>>> all data was sent in the native format of the sending rank. The receiving 
>>>> rank analysed the message to determine whether
>>>> a conversion was necessary. In those days, with Cray-format data, it 
>>>> could be more complicated than just byte swapping.
>>>> 
>>>> So, in essence, how is a balance struck between supporting heterogeneous 
>>>> architectures and maximum performance
>>>> for codes where message-passing performance is critical?
>>>> 
>>>> As a follow-up, since I am now at home: this same problem also exists 
>>>> with the Ubuntu 15.10 Open MPI packages,
>>>> which surprisingly are still at 1.6.5, the same as 14.04.
>>>> 
>>>> Again, downloading, building, and using the latest stable version of 
>>>> Open MPI solved the problem.
>>>> 
>>>> kindest regards
>>>> Mike
>>>> 
>>>> 
>>>> On 11/02/2016, at 7:31 PM, Gilles Gouaillardet wrote:
>>>> 
>>>>> Michael,
>>>>> 
>>>>> I think it is worse than that ...
>>>>> 
>>>>> Without --enable-heterogeneous, it seems the data is not correctly 
>>>>> packed
>>>>> (i.e., it is not converted to big endian), at least on an x86_64 arch.
>>>>> Unpack looks broken too, but pack followed by unpack does work.
>>>>> That means if you are reading data that was correctly written in 
>>>>> external32 format,
>>>>> it will not be correctly unpacked.
>>>>> 
>>>>> With --enable-heterogeneous, it is only half broken
>>>>> (I do not know yet whether it is pack or unpack that is broken ...),
>>>>> and pack followed by unpack does not work.
>>>>> 
>>>>> I will double-check that tomorrow.
>>>>> 
>>>>> Cheers,
>>>>> 
>>>>> Gilles
>>>>> 
>>>>> On Thursday, February 11, 2016, Michael Rezny <michael.re...@monash.edu> 
>>>>> wrote:
>>>>> Hi Ralph,
>>>>> you are indeed correct. However, many of our users, like me, have 
>>>>> workstations with OpenMPI provided by an installed package,
>>>>> so we don't know how it has been configured.
>>>>> 
>>>>> Then we get failures, since, for instance, Ubuntu 14.04 by default 
>>>>> appears to have been built
>>>>> with heterogeneous support! The other (working) machine is a large HPC 
>>>>> system, and it seems OpenMPI was built
>>>>> without heterogeneous support.
>>>>> 
>>>>> Currently we work around the problem for packing and unpacking by having 
>>>>> a compiler switch
>>>>> that selects between calls to pack/unpack_external and pack/unpack.
>>>>> 
>>>>> It is only now that we have started to track down what the problem 
>>>>> actually is.
>>>>> 
>>>>> kindest regards
>>>>> Mike
>>>>> 
>>>>> On 11 February 2016 at 15:54, Ralph Castain <r...@open-mpi.org> wrote:
>>>>> Out of curiosity: if both systems are Intel, then why are you enabling 
>>>>> hetero? You don't need it in that scenario.
>>>>> 
>>>>> Admittedly, we do need to fix the bug - just trying to understand why you 
>>>>> are configuring that way.
>>>>> 
>>>>> 
>>>>>> On Feb 10, 2016, at 8:46 PM, Michael Rezny <michael.re...@monash.edu> 
>>>>>> wrote:
>>>>>> 
>>>>>> Hi Gilles,
>>>>>> I can confirm that, with a fresh download and build from source of 
>>>>>> OpenMPI 1.10.2
>>>>>> with --enable-heterogeneous,
>>>>>> the unpacked ints have the wrong endianness.
>>>>>> 
>>>>>> However, without --enable-heterogeneous, the unpacked ints are correct.
>>>>>> 
>>>>>> So, this problem still exists in heterogeneous builds with OpenMPI 
>>>>>> version 1.10.2.
>>>>>> 
>>>>>> kindest regards
>>>>>> Mike
>>>>>> 
>>>>>> On 11 February 2016 at 14:48, Gilles Gouaillardet 
>>>>>> <gilles.gouaillar...@gmail.com> wrote:
>>>>>> Michael,
>>>>>> 
>>>>>> Do your two systems have the same endianness?
>>>>>> 
>>>>>> Do you know how Open MPI was configured on both systems?
>>>>>> (Is --enable-heterogeneous enabled or disabled on both systems?)
>>>>>> 
>>>>>> FWIW, Open MPI 1.6.5 is old now and no longer maintained.
>>>>>> I strongly encourage you to use Open MPI 1.10.2.
>>>>>> 
>>>>>> Cheers,
>>>>>> 
>>>>>> Gilles
>>>>>> 
>>>>>> On Thursday, February 11, 2016, Michael Rezny <michael.re...@monash.edu> 
>>>>>> wrote:
>>>>>> Hi,
>>>>>> I am running Ubuntu 14.04 LTS with OpenMPI 1.6.5 and gcc 4.8.4
>>>>>> 
>>>>>> In a single-rank program which just packs and unpacks two ints using 
>>>>>> MPI_Pack_external and MPI_Unpack_external,
>>>>>> the unpacked ints come out in the wrong endian order.
>>>>>> 
>>>>>> However, on an HPC system (not Ubuntu), using OpenMPI 1.6.5 and gcc 
>>>>>> 4.8.4, the unpacked ints are correct.
>>>>>> 
>>>>>> Is it possible to get some assistance to track down what is going on?
>>>>>> 
>>>>>> Here is the output from the program:
>>>>>> 
>>>>>>  ~/tests/mpi/Pack test1
>>>>>> send data 000004d2 0000162e 
>>>>>> MPI_Pack_external: 0
>>>>>> buffer size: 8
>>>>>> MPI_unpack_external: 0
>>>>>> recv data d2040000 2e160000 
>>>>>> 
>>>>>> And here is the source code:
>>>>>> 
>>>>>> #include <stdio.h>
>>>>>> #include <mpi.h>
>>>>>> 
>>>>>> int main(int argc, char *argv[]) {
>>>>>>   int numRanks, myRank, error;
>>>>>> 
>>>>>>   int send_data[2] = {1234, 5678};
>>>>>>   int recv_data[2];
>>>>>> 
>>>>>>   /* scratch buffer for the packed (external32) representation */
>>>>>>   MPI_Aint buffer_size = 1000;
>>>>>>   char buffer[buffer_size];
>>>>>> 
>>>>>>   MPI_Init(&argc, &argv);
>>>>>>   MPI_Comm_size(MPI_COMM_WORLD, &numRanks);
>>>>>>   MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
>>>>>> 
>>>>>>   printf("send data %08x %08x \n", send_data[0], send_data[1]);
>>>>>> 
>>>>>>   /* pack the two ints into big-endian external32 format */
>>>>>>   MPI_Aint position = 0;
>>>>>>   error = MPI_Pack_external("external32", (void*) send_data, 2, MPI_INT,
>>>>>>           buffer, buffer_size, &position);
>>>>>>   printf("MPI_Pack_external: %d\n", error);
>>>>>> 
>>>>>>   printf("buffer size: %d\n", (int) position);
>>>>>> 
>>>>>>   /* unpack them straight back into the native representation */
>>>>>>   position = 0;
>>>>>>   error = MPI_Unpack_external("external32", buffer, buffer_size, &position,
>>>>>>           recv_data, 2, MPI_INT);
>>>>>>   printf("MPI_unpack_external: %d\n", error);
>>>>>> 
>>>>>>   printf("recv data %08x %08x \n", recv_data[0], recv_data[1]);
>>>>>> 
>>>>>>   MPI_Finalize();
>>>>>> 
>>>>>>   return 0;
>>>>>> }