It is the default Open MPI that ships with Ubuntu 14.04.
> On 08 Dec 2014, at 17:17, Ralph Castain <r...@open-mpi.org> wrote:
>
> Pim: is this an OMPI you built, or one you were given somehow? If you built
> it, how did you configure it?
>
>> On Dec 8, 2014, at 8:12 AM, Brice Goglin <brice.gog...@inria.fr> wrote:
>>
>> It likely depends on how SLURM allocates the cpuset/cgroup inside the
>> nodes. The XML warning is related to these restrictions inside the node.
>> Anyway, my feeling is that there's an old OMPI or an old hwloc somewhere.
>>
>> How do we check after install whether OMPI uses the embedded or the
>> system-wide hwloc?
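>>
>> One rough way to check, assuming the standard Ubuntu library paths, might
>> be to look at whether the Open MPI binaries link against an external
>> libhwloc at all, e.g.:
>>
>> # if only the embedded copy is used, no libhwloc line should show up
>> ldd $(which ompi_info) | grep -i hwloc
>> ldd /usr/lib/libmpi.so 2>/dev/null | grep -i hwloc
>> # ompi_info may also report hwloc-related parameters, depending on version
>> ompi_info --all | grep -i hwloc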
>>
>> Brice
>>
>>
>>
>>
>> On 08/12/2014 17:07, Pim Schellart wrote:
>>> Dear Ralph,
>>>
>>> The nodes are called coma##, and as you can see in the logs, the nodes of
>>> the broken example are the same as the nodes of the working one, so that
>>> doesn’t seem to be the cause. Unless (very likely) I’m missing something.
>>> Anything else I can check?
>>>
>>> Regards,
>>>
>>> Pim
>>>
>>>> On 08 Dec 2014, at 17:03, Ralph Castain <r...@open-mpi.org> wrote:
>>>>
>>>> As Brice said, OMPI has its own embedded version of hwloc that we use, so
>>>> there is no Slurm interaction to be considered. The most likely cause is
>>>> that one or more of your nodes is picking up a different version of OMPI.
>>>> So things “work” if you happen to get nodes where all the versions match,
>>>> and “fail” when you get a combination that includes a different version.
>>>>
>>>> Is there some way you can narrow down your search to find the node(s) that
>>>> are picking up the different version?
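>>>>
>>>> For example (untested, and assuming the same binaries live under the same
>>>> paths on every node), something like this in the batch script should
>>>> print which host reports which Open MPI and hwloc version:
>>>>
>>>> # one line per rank: hostname plus mpirun and lstopo version strings
>>>> /usr/bin/mpiexec bash -c \
>>>>   'echo "$(hostname): $(mpirun --version 2>&1 | head -n1), $(lstopo --version)"'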
>>>>
>>>>
>>>>> On Dec 8, 2014, at 7:48 AM, Pim Schellart <p.schell...@gmail.com> wrote:
>>>>>
>>>>> Dear Brice,
>>>>>
>>>>> I am not sure why this is happening, since all code seems to be using the
>>>>> same hwloc library version (1.8), but it does :) An MPI program is started
>>>>> through SLURM on two nodes with four CPU cores in total (divided over the
>>>>> nodes) using the following script:
>>>>>
>>>>> #! /bin/bash
>>>>> #SBATCH -N 2 -n 4
>>>>> /usr/bin/mpiexec /usr/bin/lstopo --version
>>>>> /usr/bin/mpiexec /usr/bin/lstopo --of xml
>>>>> /usr/bin/mpiexec /path/to/my_mpi_code
>>>>>
>>>>> When this is submitted multiple times, it gives “out-of-order” warnings in
>>>>> about 9 out of 10 cases but works without warnings in the other 1 out of
>>>>> 10. I attached the output (with XML) for both the working and the broken
>>>>> case. Note that the XML is of course printed (differently) multiple times,
>>>>> once for each task/core. As always, any help would be appreciated.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Pim Schellart
>>>>>
>>>>> P.S. $ mpirun --version
>>>>> mpirun (Open MPI) 1.6.5
>>>>>
>>>>> <broken.log><working.log>
>>>>>
>>>>>> On 07 Dec 2014, at 13:50, Brice Goglin <brice.gog...@inria.fr> wrote:
>>>>>>
>>>>>> Hello
>>>>>> The GitHub issue you're referring to was closed 18 months ago. The
>>>>>> warning (it's not an error) is only supposed to appear if you're
>>>>>> importing into a recent hwloc an XML that was exported from an old
>>>>>> hwloc. I don't see how that could happen when using Open MPI, since the
>>>>>> hwloc versions on both sides are the same.
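>>>>>>
>>>>>> One way to test the import path by hand might be to export a node's
>>>>>> topology to XML and feed the file back to lstopo; if the warning really
>>>>>> comes from XML import, re-loading it should reproduce the message:
>>>>>>
>>>>>> lstopo /tmp/node.xml                        # export the local topology
>>>>>> lstopo --input /tmp/node.xml --of console   # re-import and print it
>>>>>>
>>>>>> Loading on one node an XML that was exported on another exercises the
>>>>>> same import code path.
>>>>>>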
>>>>>> Make sure you're not confusing it with another error described here:
>>>>>>
>>>>>> http://www.open-mpi.org/projects/hwloc/doc/v1.10.0/a00028.php#faq_os_error
>>>>>> Otherwise, please report the exact Open MPI and/or hwloc versions as well
>>>>>> as the XML lstopo output on the nodes that raise the warning (lstopo
>>>>>> foo.xml). Send these to the hwloc mailing lists, such as
>>>>>> hwloc-us...@open-mpi.org or hwloc-de...@open-mpi.org.
>>>>>> Thanks
>>>>>> Brice
>>>>>>
>>>>>>
>>>>>> On 07/12/2014 13:29, Pim Schellart wrote:
>>>>>>> Dear OpenMPI developers,
>>>>>>>
>>>>>>> This might be a bit off topic, but when using the SLURM scheduler (with
>>>>>>> cpuset support) on Ubuntu 14.04 (Open MPI 1.6), hwloc sometimes gives an
>>>>>>> “out-of-order topology discovery” error. According to issue #103 on
>>>>>>> GitHub (https://github.com/open-mpi/hwloc/issues/103) this error was
>>>>>>> discussed before, and it was possible to sort it out in
>>>>>>> “insert_object_by_parent”; is this still being considered? If not, which
>>>>>>> (top-level) hwloc API call should we look for in the SLURM sources to
>>>>>>> start debugging? Any help would be most welcome.
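>>>>>>>
>>>>>>> As a starting point, we could presumably just grep a checkout of the
>>>>>>> SLURM sources (path assumed here) for the usual top-level calls, e.g.:
>>>>>>>
>>>>>>> # topology discovery entry points; set_xml would be the XML import path
>>>>>>> grep -rnE 'hwloc_topology_(init|load|set_xml)' slurm/src/
>>>>>>>
>>>>>>> but a pointer to the right place to start would save some time.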
>>>>>>>
>>>>>>> Kind regards,
>>>>>>>
>>>>>>> Pim Schellart