On May 30, 2012, at 5:05 AM, Mike Dubman wrote:

> Not good:

@#$%@#%@#!!  But I guess this is why we test.  :-(

> /labhome/alexm/workspace/openmpi-1.6.1a1hge06c2f2a0859/inst/bin/mpirun --host
> h-qa-017,h-qa-017,h-qa-017,h-qa-017,h-qa-018,h-qa-018,h-qa-018,h-qa-018 -np 8
> --bind-to-core -bynode -display-map
> /usr/mpi/gcc/mlnx-openmpi-1.6rc4/tests/osu_benchmarks-3.1.1/osu_alltoall
>
> ========================   JOB MAP   ========================
>
> Data for node: h-qa-017        Num procs: 4
>        Process OMPI jobid: [36855,1] Process rank: 0
>        Process OMPI jobid: [36855,1] Process rank: 2
>        Process OMPI jobid: [36855,1] Process rank: 4
>        Process OMPI jobid: [36855,1] Process rank: 6
>
> Data for node: h-qa-018        Num procs: 4
>        Process OMPI jobid: [36855,1] Process rank: 1
>        Process OMPI jobid: [36855,1] Process rank: 3
>        Process OMPI jobid: [36855,1] Process rank: 5
>        Process OMPI jobid: [36855,1] Process rank: 7
>
> =============================================================
> --------------------------------------------------------------------------
> An invalid physical processor ID was returned when attempting to bind
> an MPI process to a unique processor.
> [snip]
> $hwloc-ls --of console
> Machine (32GB)
>   NUMANode L#0 (P#0 16GB) + Socket L#0 + L3 L#0 (20MB) + L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
>     PU L#0 (P#0)
>     PU L#1 (P#2)
>   NUMANode L#1 (P#1 16GB) + Socket L#1 + L3 L#1 (20MB) + L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1
>     PU L#2 (P#1)
>     PU L#3 (P#3)

Is this hwloc output exactly the same on both nodes?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
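If it helps, here is a minimal sketch (my own, not from the thread; the file name and build line are only illustrative) of an hwloc program that prints each PU's logical L# and physical P# index. Running it on both h-qa-017 and h-qa-018 and diffing the output would show whether the P# numbering differs between the two nodes:

/* pu_dump.c - hypothetical helper, not part of Open MPI or this thread.
 * Assumes the hwloc development headers are installed.
 * Build with:  gcc pu_dump.c -o pu_dump -lhwloc
 */
#include <stdio.h>
#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topo;
    int i, n;

    /* Discover the topology of the local machine. */
    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    /* Walk every processing unit (hardware thread). */
    n = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU);
    for (i = 0; i < n; ++i) {
        hwloc_obj_t pu = hwloc_get_obj_by_type(topo, HWLOC_OBJ_PU, i);
        /* logical_index is hwloc's L# numbering, os_index is the OS P# number. */
        printf("PU L#%u -> P#%u\n", pu->logical_index, pu->os_index);
    }

    hwloc_topology_destroy(topo);
    return 0;
}

On the topology quoted above this would print "PU L#0 -> P#0", "PU L#1 -> P#2", "PU L#2 -> P#1", "PU L#3 -> P#3"; any mismatch between the two nodes' listings would be a good place to start looking for the binding failure.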