On 12/10/2011 22:56, Jeff Squyres wrote:
> One of the OMPI devs found a problem when I upgraded the OMPI SVN trunk to
> the hwloc 1.2.2ompi version last week that I think I am just now beginning to
> understand.
>
> Brief reminder of our strategy:
>
> - on each compute node, OMPI launches a local "orted" helper daemon
> - this orted fork/exec's the local MPI processes
>
> To avoid the penalty of each MPI process invoking hwloc discovery
> more-or-less simultaneously upon startup (which, as we've seen on this list
> before, can be painful when core counts are large), we have the orted do the
> hwloc discovery, serialize this into XML, and send it to each of its local
> processes. The local processes receive this XML and then load it into hwloc
> and run from there.
>
> However, it looks like the resulting loaded-from-XML topology->is_thissystem
> is set to 0, and therefore functions like hwloc_get_cpubind() actually get
> wired up to dontget_thisproc_cpubind() (instead of the proper Linux backend,
> for example).
>
> How do we avoid this? We need working hwloc functions after loading up an
> XML topology string.

Either set this in the environment of the MPI processes:

    export HWLOC_THISSYSTEM=1

or call hwloc_topology_set_flags() with HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM
between hwloc_topology_init() and hwloc_topology_load().

Brice