FYI.

I think I have fixes ready, but I am bummed that we didn't fix the whole 
paffinity mess properly in 1.6.  :-(

Begin forwarded message:

> From: Open MPI <b...@open-mpi.org>
> Subject: [Open MPI] #3108: Affinity still busted in v1.6
> Date: May 24, 2012 2:59:42 PM EDT
> Cc: <b...@osl.iu.edu>
> 
> #3108: Affinity still busted in v1.6
> ---------------------+----------------------------
> Reporter:  jsquyres  |      Owner:  rhc
>     Type:  defect    |     Status:  new
> Priority:  major     |  Milestone:  Open MPI 1.6.1
>  Version:  trunk     |   Keywords:
> ---------------------+----------------------------
> I found a system yesterday where affinity is still horribly broken in
> v1.6: bind-to-core and bind-to-socket both did completely incorrect
> things.  Among other things, the system in question has effectively
> random physical socket/core numbering -- it's not uniform across the
> cores in any given socket.
> 
> I have a new bitbucket repository where I think I've fixed the problems,
> and will be reviewing the code with Ralph soon:
> 
>     https://bitbucket.org/jsquyres/ompi-affinity-again-v1.6
> 
> There were actually three bugs (that I've found so far); there's a
> separate commit in that repository for each -- see the commit messages
> for details.
> 
> Once this firms up a bit, I'll make a tarball and ask others in the
> community to test it (e.g., Oracle and IBM, who have traditionally been
> good at finding wacky paffinity bugs).
> 
> Note that this ''only'' affects OMPI v1.6 -- the trunk has a wholly
> revamped affinity system and the entire paffinity framework is gone
> (yay!).
> 
> -- 
> Ticket URL: <https://svn.open-mpi.org/trac/ompi/ticket/3108>
> Open MPI <http://www.open-mpi.org/>
> 
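For anyone who wants a quick sanity check on their own system in the
meantime, here's a minimal sketch.  It assumes a 1.5/1.6-era mpirun on
your PATH and a Linux /proc filesystem; "./a.out" is a placeholder for
any MPI program.  --report-bindings prints where each process actually
landed, so you can compare that against what you asked for:

```shell
# Ask for core/socket binding and have mpirun report the actual
# binding of each launched process (flags exist in OMPI 1.5/1.6).
mpirun -np 4 --bind-to-core --report-bindings ./a.out
mpirun -np 4 --bind-to-socket --report-bindings ./a.out

# Separately, inspect the kernel's physical socket/core numbering;
# on a system like the one in the ticket, the "core id" values are
# not uniform across the cores of a given "physical id" (socket).
grep -E 'processor|physical id|core id' /proc/cpuinfo
```

If the reported bindings don't match the requested policy, that's the
class of bug the ticket is about.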


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

