On May 30, 2012, at 5:05 AM, Mike Dubman wrote:

>  Not good:

@#$%@#%@#!!  But I guess this is why we test.  :-(

> /labhome/alexm/workspace/openmpi-1.6.1a1hge06c2f2a0859/inst/bin/mpirun --host h-qa-017,h-qa-017,h-qa-017,h-qa-017,h-qa-018,h-qa-018,h-qa-018,h-qa-018 -np 8 --bind-to-core -bynode -display-map /usr/mpi/gcc/mlnx-openmpi-1.6rc4/tests/osu_benchmarks-3.1.1/osu_alltoall
>  
>  ========================   JOB MAP   ========================
>  
>  Data for node: h-qa-017               Num procs: 4
>                 Process OMPI jobid: [36855,1] Process rank: 0
>                 Process OMPI jobid: [36855,1] Process rank: 2
>                 Process OMPI jobid: [36855,1] Process rank: 4
>                 Process OMPI jobid: [36855,1] Process rank: 6
>  
>  Data for node: h-qa-018               Num procs: 4
>                 Process OMPI jobid: [36855,1] Process rank: 1
>                 Process OMPI jobid: [36855,1] Process rank: 3
>                 Process OMPI jobid: [36855,1] Process rank: 5
>                 Process OMPI jobid: [36855,1] Process rank: 7
>  
>  =============================================================
> --------------------------------------------------------------------------
> An invalid physical processor ID was returned when attempting to bind
> an MPI process to a unique processor.
> [snip] 
> $ hwloc-ls --of console
> Machine (32GB)
>   NUMANode L#0 (P#0 16GB) + Socket L#0 + L3 L#0 (20MB) + L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
>     PU L#0 (P#0)
>     PU L#1 (P#2)
>   NUMANode L#1 (P#1 16GB) + Socket L#1 + L3 L#1 (20MB) + L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1
>     PU L#2 (P#1)
>     PU L#3 (P#3)

Is this hwloc output exactly the same on both nodes?
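If it helps, here's a quick sketch (just something I'm typing into this mail, not from our test suite -- untested) that you could build and run on both nodes and then diff the output.  It prints the logical-to-physical PU mapping that the binding code depends on, so any difference between the nodes should jump out:

    /* check_pus.c -- print each PU's hwloc logical index (L#) and OS/physical
     * index (P#).  Assumes hwloc headers/libs are installed; build with
     *   gcc check_pus.c -o check_pus -lhwloc
     */
    #include <stdio.h>
    #include <hwloc.h>

    int main(void)
    {
        hwloc_topology_t topo;
        unsigned i, n;

        hwloc_topology_init(&topo);
        hwloc_topology_load(topo);

        n = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU);
        for (i = 0; i < n; i++) {
            hwloc_obj_t pu = hwloc_get_obj_by_type(topo, HWLOC_OBJ_PU, i);
            /* logical_index is hwloc's L#, os_index is the P# the OS uses */
            printf("PU L#%u -> P#%u\n", pu->logical_index, pu->os_index);
        }

        hwloc_topology_destroy(topo);
        return 0;
    }

Diffing hwloc-ls (or lstopo XML) output from each node would tell us the same thing; the point is to see whether h-qa-017 and h-qa-018 report identical P# numbering.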

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

