With HWLOC_COMPONENTS=no_os, MPICH now works fine, but all tests fail with Open-MPI (see below). I know how to resolve this, but am noting it for the benefit of others.

--------------------------------------------------------------------------
All nodes which are allocated for this job are already filled.
--------------------------------------------------------------------------

Jeff

On Thu, Sep 13, 2018 at 10:36 PM, Brice Goglin <brice.gog...@inria.fr> wrote:

> If lstopo fails there, run "hwloc-gather-topology foo" and send foo.tar.bz2
>
> As a workaround for ARMCI, you may try setting HWLOC_COMPONENTS=no_os,stop
> in the environment so that hwloc behaves as if the operating system had no
> topology support.
>
> Brice
>
>
> On 14/09/2018 at 06:11, Jeff Hammond wrote:
>
> All of the job failures have this warning so I am inclined to think they
> are related. I don't know what I should expect from lstopo inside of
> AWS, but I guess I'll try it.
>
> I was using the hwloc shipped with the mpich-3.3b1. Talk to the MPICH
> team if you want them to upgrade :-)
>
> Jeff
>
> On Thu, Sep 13, 2018 at 8:42 AM, Brice Goglin <brice.gog...@inria.fr>
> wrote:
>
>> This is actually just a warning. Usually it causes the topology to be
>> wrong (like a missing object), but it shouldn't prevent the program from
>> working. Are you sure your programs are failing because of hwloc? Do you
>> have a way to run lstopo on that node?
>>
>> By the way, you shouldn't use hwloc 2.0.0rc2, at least because it's old,
>> it has a broken ABI, and it's a RC :)
>>
>> Brice
>>
>>
>> On 13/09/2018 at 16:12, Jeff Hammond wrote:
>>
>> I am running ARMCI-MPI over MPICH in a Travis CI Linux instance and
>> topology is causing it to fail. I do not care about topology in a
>> virtualized environment. How do I fix this?
>>
>> ****************************************************************************
>> * hwloc 2.0.0rc2-git has encountered what looks like an error from the
>> operating system.
>> *
>> * Group0 (cpuset 0x00001111,0x11111111) intersects with L3 (cpuset
>> 0x00001000,0x02100002) without inclusion!
>> * Error occurred in topology.c line 1384
>> *
>> * The following FAQ entry in the hwloc documentation may help:
>> *   What should I do when hwloc reports "operating system" warnings?
>> * Otherwise please report this error message to the hwloc user's mailing
>> list
>> * along with the files generated by the hwloc-gather-topology script.
>> ****************************************************************************
>>
>> https://travis-ci.org/jeffhammond/armci-mpi/jobs/425342479 has all of
>> the details.
>>
>> Jeff

--
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/
_______________________________________________ hwloc-users mailing list hwloc-users@lists.open-mpi.org https://lists.open-mpi.org/mailman/listinfo/hwloc-users
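
For anyone applying the workaround discussed in this thread from a test script, the following is a minimal sketch, not something posted by the participants. It assumes the value Brice suggested (HWLOC_COMPONENTS=no_os,stop); the helper names hwloc_workaround_env and run_with_workaround are hypothetical, and the child command is a stand-in for whatever launcher line the CI job actually uses.

```python
import os
import subprocess
import sys


def hwloc_workaround_env(base_env=None, components="no_os,stop"):
    """Return a copy of the environment with HWLOC_COMPONENTS set.

    The default value is the one suggested in the thread; the original
    environment (or base_env) is left unmodified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HWLOC_COMPONENTS"] = components
    return env


def run_with_workaround(command):
    """Run command (a list of arguments) with the workaround applied."""
    return subprocess.run(command, env=hwloc_workaround_env(), check=False)


if __name__ == "__main__":
    # Stand-in for the real test launcher: the child process only prints
    # the variable it inherited, to show the setting reaches it.
    child = [sys.executable, "-c",
             "import os; print(os.environ['HWLOC_COMPONENTS'])"]
    run_with_workaround(child)
```

Setting the variable only in the child's environment keeps the workaround scoped to the runs that need it, which may matter when the same job also runs an MPI implementation that behaves differently with it, as reported above.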