On Jun 19, 2013, at 7:52 AM, Paul Kapinos <kapi...@rz.rwth-aachen.de> wrote:

> Hello All,
> 
> I.
> Using the new Open MPI 1.7.1, we see the following messages on the console:
> 
> > example mpiext init
> > example mpiext fini
> 
> ... on each call to MPI_INIT and MPI_FINALIZE, at least in Fortran programs.
> 
> It seems somebody forgot to disable some printf debug output? =)

This output actually comes from the example mpiext plugin, not from the Fortran
code in OMPI.  Since it's example code, it contains printf calls.  I'm a little
surprised to see that output, though; I wonder whether the example is somehow
being enabled when it shouldn't be.

How did you configure/compile Open MPI?

> II.
> In the 1.7.x series, the 'carto' framework has been deleted:
> http://www.open-mpi.org/community/lists/announce/2013/04/0053.php
> > - Removed maffinity, paffinity, and carto frameworks (and associated
> >   MCA params).
> 
> Is there some replacement for this? Or does Open MPI detect the NUMA
> structure of nodes automatically?

Yes, it's detected automatically: OMPI now uses hwloc internally to figure this
stuff out.

> Background: Currently we're using the 'carto' framework on our rather special
> 'Bull BCS' nodes. Each such node consists of 4 boards, each with its own IB
> card, which together form a shared-memory system. Clearly, communication
> should go over the nearest IB interface; for this we use 'carto' now.

It should do this automatically in the 1.7 series.

Hmm; I see there isn't any verbose output about which devices it picks, though.
:-(  Try this patch, run with --mca btl_base_verbose 100, and check whether the
appropriate devices are being mapped to the appropriate processes:

Index: mca/btl/openib/btl_openib_component.c
===================================================================
--- mca/btl/openib/btl_openib_component.c       (revision 28652)
+++ mca/btl/openib/btl_openib_component.c       (working copy)
@@ -2712,6 +2712,8 @@
                 mca_btl_openib_component.ib_num_btls <
                 mca_btl_openib_component.ib_max_btls); i++) {
         if (distance != dev_sorted[i].distance) {
+            BTL_VERBOSE(("openib: skipping device %s; it's too far away", 
+                         ibv_get_device_name(dev_sorted[i].ib_dev)));
             break;
         }



-- 
Jeff Squyres
jsquy...@cisco.com

