I unfortunately don't have many cycles to think about this before Oct 1, but 
I'm still a little concerned about the portability aspects of having hwloc be a 
first-class citizen of OMPI - if we support a platform that hwloc doesn't, that 
seems like it will still cause problems...

Brian

On Sep 22, 2010, at 7:08 PM, Jeff Squyres wrote:

> WHAT: Make hwloc a 1st class item in OMPI
> 
> WHY: At least 2 pieces of new functionality want/need to use the hwloc data
> 
> WHERE: Put it in ompi/hwloc
> 
> WHEN: Some time in the 1.5 series
> 
> TIMEOUT: Tues teleconf, Oct 5 (about 2 weeks from now)
> 
> --------------------------------------------------------------------------------
> 
> A long time ago, I floated the proposal of putting hwloc at the top level in 
> opal so that parts of OPAL/ORTE/OMPI could use the data directly.  I didn't 
> have any concrete suggestions at the time about what exactly would use the 
> hwloc data -- just a feeling that "someone" would want to.
> 
> There are now two solid examples of functionality that want to use hwloc data 
> directly:
> 
> 1. Sandia + ORNL are working on a proposal for MPI_COMM_SOCKET, 
> MPI_COMM_NUMA_NODE, MPI_COMM_CORE, ...etc. (those names may not be the right 
> ones, but you get the idea).  That is, pre-defined communicators that contain 
> all the MPI procs on the same socket as you, the same NUMA node as you, the 
> same core as you, ...etc.  (See the first sketch after this list.)
> 
> 2. INRIA presented a paper at EuroMPI last week that takes process distance 
> to NICs into account when computing the long-message splitting ratio for the 
> PML.  E.g., if we have 2 openib NICs with the same bandwidth, don't just 
> assume that we'll split long messages 50-50 across both of them.  Instead, 
> use NUMA distances to weight the ratio (see the second sketch below).  The 
> paper is here: 
> http://hal.archives-ouvertes.fr/inria-00486178/en/
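> 
> To make (1) concrete, here is a rough, hypothetical sketch of how such a 
> communicator could be built today on top of hwloc plus MPI_Comm_split.  The 
> helper name, the choice of the socket level, the use of the hwloc bitmap 
> API, and the single-node assumption are all illustrative -- this is not the 
> proposal's actual API: 
> 
>   #include <mpi.h>
>   #include <hwloc.h>
> 
>   /* Build a communicator of all procs bound to the same socket as me.
>    * Assumes each proc is bound within a single socket and that all procs
>    * in "in" are on the same node; otherwise we fall back to rank coloring. */
>   static int comm_by_socket(MPI_Comm in, MPI_Comm *out)
>   {
>       hwloc_topology_t topo;
>       hwloc_bitmap_t set = hwloc_bitmap_alloc();
>       hwloc_obj_t obj;
>       int rank, color;
> 
>       MPI_Comm_rank(in, &rank);
>       hwloc_topology_init(&topo);
>       hwloc_topology_load(topo);
> 
>       /* Where is this process currently bound? */
>       hwloc_get_cpubind(topo, set, HWLOC_CPUBIND_PROCESS);
> 
>       /* Smallest object covering the binding, then walk up to the socket */
>       obj = hwloc_get_obj_covering_cpuset(topo, set);
>       while (obj != NULL && obj->type != HWLOC_OBJ_SOCKET)
>           obj = obj->parent;
> 
>       /* Procs on the same socket pick the same color (fall back to the
>        * rank if no socket ancestor was found, e.g. an unbound proc) */
>       color = (obj != NULL) ? (int) obj->logical_index : rank;
>       MPI_Comm_split(in, color, rank, out);
> 
>       hwloc_bitmap_free(set);
>       hwloc_topology_destroy(topo);
>       return MPI_SUCCESS;
>   }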
> 
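> And a back-of-the-envelope sketch of the kind of weighting (2) describes. 
> The formula below (bandwidth divided by distance, normalized) is purely 
> illustrative -- the paper defines the real model: 
> 
>   #include <stdio.h>
> 
>   /* Toy example: per-NIC shares of a long message computed from
>    * (bandwidth / distance) weights instead of a blind 50-50 split. */
>   static void split_ratio(const double bw[], const double dist[],
>                           double share[], int nics)
>   {
>       double total = 0.0;
>       for (int i = 0; i < nics; ++i)
>           total += bw[i] / dist[i];
>       for (int i = 0; i < nics; ++i)
>           share[i] = (bw[i] / dist[i]) / total;
>   }
> 
>   int main(void)
>   {
>       /* 2 openib NICs with equal bandwidth, but NIC 1 is "farther" away
>        * (NUMA-wise) from the sending process */
>       double bw[2]   = { 1.0, 1.0 };
>       double dist[2] = { 1.0, 2.0 };
>       double share[2];
> 
>       split_ratio(bw, dist, share, 2);
>       printf("NIC0: %.0f%%  NIC1: %.0f%%\n", share[0] * 100, share[1] * 100);
>       /* prints: NIC0: 67%  NIC1: 33% */
>       return 0;
>   }
> 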
> A previous objection was that we are increasing our dependencies by making 
> hwloc a 1st-class entity in OPAL -- we're hosed if hwloc ever goes out of 
> business.  Fair enough.  But that being said, hwloc has a bit of a community 
> growing around it: vendors are submitting patches for their hardware, distros 
> are picking it up, etc.  I certainly can't predict the future, but hwloc 
> looks to be in good shape for now.  There is a little risk in depending on 
> hwloc, but I think it's small enough to be ok.
> 
> Cisco does need to be able to compile OPAL/ORTE without hwloc, however (for 
> embedded environments where hwloc simply takes up space and adds no value).  
> I previously proposed wrapping a subset of the hwloc API with opal_*() 
> functions.  After thinking about that a bit, that seems like a lot of work 
> for little benefit -- how does one decide *which* subset of hwloc should be 
> wrapped?
> 
> Instead, it might be worthwhile to simply put hwloc up in ompi/hwloc (instead 
> of opal/hwloc).  Indeed, the 2 places that want to use hwloc are up in the 
> MPI layer -- I'm guessing that most functionality that wants hwloc will be up 
> in MPI.  And if we do the build system right, we can have paffinity/hwloc and 
> libmpi's hwloc both link against the same libhwloc_embedded so that:
> 
> a) there's only one copy of hwloc in the process, and 
> b) paffinity/hwloc can still be compiled out with the usual mechanisms to 
> avoid having hwloc in OPAL/ORTE for embedded environments
> 
> (there's a little hand-waving there, but I think we can figure out the 
> details)
> 
> We *may* want to refactor paffinity and maffinity someday, but that's not 
> necessarily what I'm proposing here.
> 
> Comments?
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 

-- 
  Brian W. Barrett
  Dept. 1423: Scalable System Software
  Sandia National Laboratories


