On 12/10/2011 22:56, Jeff Squyres wrote:
> One of the OMPI devs found a problem when I upgraded the OMPI SVN trunk to 
> the hwloc 1.2.2ompi version last week that I think I am just now beginning to 
> understand.
>
> Brief reminder of our strategy:
>
> - on each compute node, OMPI launches a local "orted" helper daemon
> - this orted fork/exec's the local MPI processes
>
> To avoid the penalty of each MPI process invoking hwloc discovery 
> more-or-less simultaneously upon startup (which, as we've seen on this list 
> before, can be painful when core counts are large), we have the orted do the 
> hwloc discovery, serialize this into XML, and send it to each of its local 
> processes.  The local processes receive this XML and then load it into hwloc 
> and run from there.
>
> However, it looks like the resulting loaded-from-XML topology->is_thissystem 
> is set to 0, and therefore functions like hwloc_get_cpubind() actually get 
> wired up to dontget_thisproc_cpubind() (instead of the proper Linux backend, 
> for example).
>
> How do we avoid this?  We need working hwloc functions after loading up an 
> XML topology string.

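Either set this in the environment of the local processes:

export HWLOC_THISSYSTEM=1

or call hwloc_topology_set_flags(topology, HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM)
between hwloc_topology_init() and hwloc_topology_load().

Both tell hwloc to treat the XML-loaded topology as describing the
current machine, so the binding functions get wired to the real backend.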
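
For the environment-variable route, a minimal sketch of how a launcher could set HWLOC_THISSYSTEM=1 for its children only, using plain POSIX calls. The helper names spawn_with_thissystem() and wait_for_exit() are hypothetical, not OMPI or hwloc functions, and I am assuming the orted does a plain fork/exec as described above:

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child, set HWLOC_THISSYSTEM=1 in the
 * child's environment only, then exec argv[0] with argv.
 * Returns the child's pid, or -1 if fork() failed. */
static pid_t spawn_with_thissystem(char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: the variable must be set before the process that
         * loads the XML topology starts. */
        if (setenv("HWLOC_THISSYSTEM", "1", 1) != 0)
            _exit(126);
        execvp(argv[0], argv);
        _exit(127); /* only reached if execvp() failed */
    }
    return pid; /* parent: its own environment is left untouched */
}

/* Hypothetical helper: wait for the child and return its exit code,
 * or -1 if it could not be waited for or did not exit normally. */
static int wait_for_exit(pid_t pid)
{
    int status;
    if (pid < 0 || waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Setting the variable after fork() keeps it out of the launcher's own environment, so only the processes that load the XML are affected.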

Brice
