Hi Ralph,

Thanks for your reply!
> One thing you might want to try: add this to your mpirun cmd line:
>
>    --display-allocation
>
> This will tell you how many slots we think we've been given on your
> cluster.

I tried that using 1.8.2rc4; this is what I get:

======================   ALLOCATED NODES   ======================
        node2: slots=48 max_slots=48 slots_inuse=0 state=UNKNOWN
=================================================================

I forgot to mention previously that mpirun uses all cores when running
on localhost; the 32-process cap is observed only when running on
another host (--hostfile hosts). I'm attaching a snapshot of the most
recent run. The job was invoked by:

  /usr/local/openmpi-1.8.2rc4/bin/mpirun -np 48 --hostfile hosts \
      --display-allocation ./test.py > test.std 2> test.ste

test.ste contains the hwloc error I mentioned in my previous post:

****************************************************************************
* hwloc has encountered what looks like an error from the operating system.
*
* object (L3 cpuset 0x000003f0) intersection without inclusion!
* Error occurred in topology.c line 760
*
* Please report this error message to the hwloc user's mailing list,
* along with the output from the hwloc-gather-topology.sh script.
****************************************************************************

Hope this helps,
Andrej

> On Aug 21, 2014, at 12:50 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
> > Starting early in the 1.7 series, we began to bind procs by default
> > to cores when -np <= 2, and to sockets if np > 2. Is it possible
> > this is what you are seeing?
> >
> > On Aug 21, 2014, at 12:45 PM, Andrej Prsa <aprs...@gmail.com> wrote:
> >
> >> Dear devels,
> >>
> >> I have been trying out the 1.8.2 release candidates recently and
> >> found a show-stopping problem on our cluster. Running any job with
> >> more than 32 processes always uses only 32 cores per node (our
> >> nodes have 48 cores). We see identical behavior with 1.8.2rc4,
> >> 1.8.2rc2, and 1.8.1. Identical programs run with version 1.6.5
> >> show no such issue; all 48 cores per node are used. While our
> >> system is running torque/maui, the problem is evident when running
> >> mpirun directly.
> >>
> >> I am attaching the hwloc topology in case it helps -- I am aware of
> >> buggy BIOS code that trips up hwloc, but I don't know whether that
> >> is related. I am happy to help debug if you can provide guidance.
> >>
> >> Thanks,
> >> Andrej
> >>
> >> <cluster.output><cluster.tar.bz2>
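
P.S. If it would help narrow this down, on the next run I can disable
the default binding and log the actual bindings. A sketch of what I
have in mind (assuming --bind-to none and --report-bindings behave in
1.8.2rc4 as documented):

  /usr/local/openmpi-1.8.2rc4/bin/mpirun -np 48 --hostfile hosts \
      --bind-to none --report-bindings --display-allocation \
      ./test.py > test.std 2> test.ste

If all 48 cores are used with binding disabled, that would point at the
default binding policy you mentioned rather than the allocation. I can
also re-run the hwloc-gather-topology.sh script on node2 if a fresher
topology dump would be useful.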