Re: [OMPI devel] RFC: move hwloc code base to opal/hwloc

Jeff Squyres Wed, 2 Jun 2010 10:03:34 -0400

To follow up on this RFC...

We discussed this RFC on the weekly call and no one seemed to hate it.  But 
there was a desire to:


a) be able to compile out hwloc for environments that don't want/need it (e.g., 
embedded environments)
b) have some degree of isolation in case hwloc ever dies
c) have some commonality of hwloc support (e.g., a central copy of the topology 
as an OPAL global variable, etc.)

The agreed-on compromise was to have a small set of OPAL wrappers that hide the 
real hwloc API.  I.e., the OPAL/ORTE/OMPI code bases would use the OPAL 
wrappers, not hwloc itself.  This allows OMPI to cleanly compile out hwloc 
(e.g., the wrappers return OPAL_ERR_NOT_AVAILABLE when hwloc is compiled out), 
both for platforms that do not want hwloc support and for platforms that hwloc 
does not support.
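
To make the idea concrete, here is a rough sketch of what one such wrapper 
could look like.  It is only an illustration, not a proposal for the actual 
API: the opal_hwloc_* function names, the OPAL_HAVE_HWLOC configure macro, and 
the numeric error-code values are all assumptions made up for this example.  
Only the hwloc calls themselves (hwloc_topology_init, hwloc_topology_load, 
hwloc_topology_destroy, hwloc_get_nbobjs_by_type) are real hwloc functions.

```c
/* Sketch only: the opal_hwloc_* names, OPAL_HAVE_HWLOC, and the error-code
 * values are hypothetical and exist purely for illustration. */
#include <assert.h>
#include <stddef.h>

#define OPAL_SUCCESS            0
#define OPAL_ERR_NOT_AVAILABLE (-1)  /* placeholder value, not OPAL's real one */

/* Assumed to be set by configure; defaults to "hwloc compiled out". */
#ifndef OPAL_HAVE_HWLOC
#define OPAL_HAVE_HWLOC 0
#endif

#if OPAL_HAVE_HWLOC

#include <hwloc.h>

/* Central copy of the topology, shared by all callers of the wrappers. */
static hwloc_topology_t opal_hwloc_topology;
static int opal_hwloc_loaded = 0;

int opal_hwloc_init(void)
{
    if (opal_hwloc_loaded) {
        return OPAL_SUCCESS;
    }
    if (0 != hwloc_topology_init(&opal_hwloc_topology)) {
        return OPAL_ERR_NOT_AVAILABLE;
    }
    if (0 != hwloc_topology_load(opal_hwloc_topology)) {
        hwloc_topology_destroy(opal_hwloc_topology);
        return OPAL_ERR_NOT_AVAILABLE;
    }
    opal_hwloc_loaded = 1;
    return OPAL_SUCCESS;
}

int opal_hwloc_get_num_cores(int *num_cores)
{
    int n;

    assert(NULL != num_cores);
    if (!opal_hwloc_loaded) {
        return OPAL_ERR_NOT_AVAILABLE;
    }
    /* hwloc returns 0 if there are no such objects and -1 if they span
     * several levels; treat both as "not available" in this sketch. */
    n = hwloc_get_nbobjs_by_type(opal_hwloc_topology, HWLOC_OBJ_CORE);
    if (n <= 0) {
        return OPAL_ERR_NOT_AVAILABLE;
    }
    *num_cores = n;
    return OPAL_SUCCESS;
}

#else /* hwloc compiled out: same signatures, stub bodies */

int opal_hwloc_init(void)
{
    return OPAL_ERR_NOT_AVAILABLE;
}

int opal_hwloc_get_num_cores(int *num_cores)
{
    assert(NULL != num_cores);
    *num_cores = 0;
    return OPAL_ERR_NOT_AVAILABLE;
}

#endif /* OPAL_HAVE_HWLOC */
```

With something along these lines, callers never include hwloc.h themselves; 
they check the return code and fall back to a default when they get 
OPAL_ERR_NOT_AVAILABLE.  The same source builds either way.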

The ball is in my court to come up with a decent OPAL subset of the hwloc API 
that makes sense.  On the one hand, the hwloc API is huge because it has many, 
many accessors for all different kinds of access patterns.  But OTOH, we 
probably don't need all those accessors, even if having a smaller set of 
accessors may mean slightly less convenient/efficient access to the hwloc data. 
 

I'll try to strike a balance and come back to the community with a proposal.




On May 13, 2010, at 8:35 PM, Jeff Squyres wrote:

> WHAT: hwloc is currently embedded in opal/mca/paffinity/hwloc/hwloc -- move 
> it to be a first class citizen in opal/hwloc.
> 
> WHY: Let other portions of the OPAL, ORTE, and OMPI code bases use hwloc 
> services (remember that hwloc provides detailed topology information, not 
> just processor binding).
> 
> WHERE: Move opal/mca/paffinity/hwloc/hwloc to opal/hwloc, and adjust 
> associated configury
> 
> WHEN: For v1.5.1
> 
> TIMEOUT: Tuesday call, May 25
> 
> -----------------------------------------------------------------------------
> 
> MORE DETAILS:
> 
> The hwloc code base is *much* more powerful and useful than PLPA -- it 
> provides a wealth of information that PLPA did not.  Specifically: hwloc 
> provides data structures detailing the internal topology of a server.  You 
> can see cache line sizes, NUMA layouts, sockets, cores, hardware threads, 
> ...etc.
> 
> This information should be available to the entire OMPI code base -- not just 
> locked up in a paffinity component.  Putting hwloc up in opal/hwloc makes it 
> available everywhere.  Developers can just call hwloc_<foo>, and OMPI's build 
> system will automatically do all the right symbol-shifting if the embedded 
> hwloc is used in OMPI (and not symbol-shift if an external hwloc is used, 
> obviously).  It's magically delicious!
> 
> One immediate use that I'd like to see is to have the openib BTL use hwloc's 
> ibv functionality to find "nearby" HCAs (right now, you can only do this with 
> rankfiles).
> 
> I can foresee other components using cache line size information to help tune 
> performance (e.g., sm btl and sm coll...?).
> 
> To be clear: there will still be an hwloc paffinity component.  It just won't 
> embed its own copy of hwloc anymore.  It'll use the hwloc services provided 
> by the OMPI build system, just like the rest of the OPAL / ORTE / OMPI code 
> bases.
> 
> There will also be an option to compile hwloc out altogether -- some stubs 
> will be left that return ERR_NOT_SUPPORTED, or somesuch (details TBD).  The 
> reason for this is that there are some systems where processor affinity and 
> NUMA information aren't relevant (e.g., embedded systems).  Memory footprint 
> is key in such systems; hwloc would simply take up valuable RAM.
> 
> -- 
> Jeff Squyres
> [email protected]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 


-- 
Jeff Squyres
[email protected]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
