FYI. I think I have fixes ready, but I am bummed that we didn't fix the whole paffinity mess properly in 1.6. :-(
Begin forwarded message:

> From: Open MPI <b...@open-mpi.org>
> Subject: [Open MPI] #3108: Affinity still busted in v1.6
> Date: May 24, 2012 2:59:42 PM EDT
> Cc: <b...@osl.iu.edu>
>
> #3108: Affinity still busted in v1.6
> ---------------------+----------------------------
>  Reporter: jsquyres  |      Owner: rhc
>      Type: defect    |     Status: new
>  Priority: major     |  Milestone: Open MPI 1.6.1
>   Version: trunk     |   Keywords:
> ---------------------+----------------------------
> I found a system yesterday where affinity is still horribly broken in
> v1.6: bind-to-core and bind-to-socket both did completely incorrect
> things. Among other things, the system in question has effectively
> random physical socket/core numbering -- it is not uniform across the
> cores in any given socket.
>
> I have a new bitbucket where I think I've fixed the problems, and will
> be reviewing the code with Ralph soon:
>
>   https://bitbucket.org/jsquyres/ompi-affinity-again-v1.6
>
> There were actually three bugs (that I've found so far); there's a
> separate commit on that bitbucket for each. See the commit messages on
> each of them.
>
> Once this firms up a bit, I'll make a tarball and ask others in the
> community to test it (e.g., Oracle and IBM, which have traditionally
> been good at finding wacky paffinity bugs).
>
> Note that this ''only'' affects OMPI v1.6 -- the trunk has a wholly
> revamped affinity system and the entire paffinity framework is gone
> (yay!).
>
> --
> Ticket URL: <https://svn.open-mpi.org/trac/ompi/ticket/3108>
> Open MPI <http://www.open-mpi.org/>

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
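The "random physical socket/core numbering" mentioned in the ticket is the classic trap behind this class of bug: binding code that assumes physical core IDs run sequentially within each socket will pin processes to the wrong cores on machines where they do not. Here is a minimal sketch of the difference -- the topology table, core counts, and function names are hypothetical illustrations, not taken from the OMPI v1.6 code:

```python
# Hypothetical 2-socket, 4-cores-per-socket machine where physical core
# IDs are interleaved across sockets rather than contiguous per socket
# (similar in spirit to the machine described in the ticket).
# Maps (socket, core-within-socket) -> physical core ID.
TOPOLOGY = {
    (0, 0): 0, (0, 1): 2, (0, 2): 4, (0, 3): 6,
    (1, 0): 1, (1, 1): 3, (1, 2): 5, (1, 3): 7,
}

CORES_PER_SOCKET = 4


def naive_bind(rank):
    """Buggy assumption: physical core ID == socket * 4 + core index."""
    return rank


def topology_bind(rank):
    """Correct approach: translate the logical (socket, core) pair to a
    physical core ID via the discovered topology table."""
    socket, core = divmod(rank, CORES_PER_SOCKET)
    return TOPOLOGY[(socket, core)]


if __name__ == "__main__":
    for rank in range(8):
        print(f"rank {rank}: naive -> core {naive_bind(rank)}, "
              f"topology-aware -> core {topology_bind(rank)}")
```

With this topology, the naive scheme would bind ranks 1-3 onto cores that actually live on the other socket, which is exactly the kind of "completely incorrect" bind-to-core/bind-to-socket behavior the ticket describes; real binding code must consult the discovered topology (as hwloc does) rather than compute IDs arithmetically.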