No problem Lenny, I am looking at this now.
--td
Lenny Verkhovsky wrote:
I would really like to help, but I am not sure how much time I will
have in the very near future (we are expecting the delivery of a baby girl).
On 8/6/08, Open MPI <b...@open-mpi.org> wrote:
#1435: Crash on PPC (with SMT off) when using mpi_paffinity alone
-------------------+--------------------------------------------------------
Reporter:  jnysal  |       Owner:  rhc
    Type:  defect  |      Status:  new
Priority:  major   |   Milestone:  Open MPI 1.3
 Version:          |  Resolution:
Keywords:          |
-------------------+--------------------------------------------------------
Changes (by rhc):
* owner: jnysal => rhc
* status: assigned => new
Comment:
Several of us have had a telecon on this subject, and have a proposed
solution:
The real root of the problem here is that we never clearly delineated
between physical and logical processors in OMPI. Instead, there was an
implicit assumption that the two were one-and-the-same. Thus, if a user
specified a slot_list, we just directly dumped that into the paffinity
subsystem.
Unfortunately, when we use paffinity_alone and automatically map the
ranks to processors, we again just passed the info to the paffinity
subsystem, without clearly indicating whether this was a physical
processor or a logical processor.
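
To make the mismatch concrete, here is a small sketch in Python (it is
not OMPI code). It assumes a hypothetical PPC node with SMT off on which
the OS numbers the online processors 0, 2, 4, 6; that numbering and the
helper name logical_to_physical are illustrative assumptions, not
something stated in the ticket.

```python
# Hypothetical node: SMT is off, so only every other OS processor id is online.
# This numbering is an assumption for illustration, not taken from the ticket.
ONLINE_PHYSICAL_IDS = [0, 2, 4, 6]


def logical_to_physical(logical_id, online=ONLINE_PHYSICAL_IDS):
    """Map the Nth usable processor (logical id) to its OS (physical) id."""
    if not 0 <= logical_id < len(online):
        raise ValueError(f"no logical processor {logical_id}")
    return online[logical_id]


# Local ranks 0..3 are mapped to "processors" 0..3 by the automatic binding.
for rank in range(4):
    naive = rank                        # id passed straight through
    mapped = logical_to_physical(rank)  # id after logical-to-physical mapping
    status = "online" if naive in ONLINE_PHYSICAL_IDS else "OFFLINE"
    print(f"rank {rank}: naive id {naive} ({status}), mapped id {mapped}")
```

In this sketch, passing the rank number straight through asks for
processors 1 and 3, which do not exist as online processors on the
assumed node, while the mapped ids all land on online processors.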
Our feeling is that we need to cleanly handle both physical and logical
processor specifications. Accordingly, we propose to do the following:
1. modify the opal_paffinity_base_get API to add a boolean flag
   indicating we want logical (true) or physical (false) processor IDs
   in the returned cpumask
2. modify the opal_paffinity_base_set API to add a boolean flag
   indicating we provided logical (true) or physical (false) processor
   IDs in the cpumask
3. modify the opal_paffinity linux and solaris components to do the
   necessary mapping to handle the two cases so that we bind or return
   data according to the new flag
4. modify ompi_mpi_init so that mpi_paffinity_alone indicates the
   automatic binding is to be done on the basis of logical processor IDs
5. modify the syntax of the slot_list mca param so that it defaults to
   logical processor IDs, but allows the user to prepend the
   specification with a "P" or "p" to indicate these are physical
   processor IDs. This will also be applied to the parsing of the
   rank_file mapping file.
6. modify the places that utilize that param to handle the new syntax,
   including the opal_paffinity_base_slot_list_set and its companion
   functions
7. update the documentation to reflect the changed syntax
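
The proposed slot_list rule (logical by default, a leading "P" or "p"
for physical) can be sketched as follows. This is a Python illustration
only, not the OMPI implementation: the function names parse_slot_list
and resolve_physical are hypothetical, the grammar is simplified to
comma-separated ids and "a-b" ranges, and the list of online physical
ids is an assumed input.

```python
def parse_slot_list(spec):
    """Return (is_logical, ids) for a simplified slot_list string.

    Only comma-separated ids and 'a-b' ranges are handled here; this is
    a simplification, not the full slot_list grammar.
    """
    spec = spec.strip()
    is_logical = True                 # proposed default: logical ids
    if spec[:1] in ("P", "p"):        # leading P/p marks physical ids
        is_logical = False
        spec = spec[1:]
    ids = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            ids.extend(range(lo, hi + 1))
        else:
            ids.append(int(part))
    return is_logical, ids


def resolve_physical(ids, is_logical, online_physical_ids):
    """Translate ids to physical ids, mirroring the proposed boolean flag."""
    if is_logical:
        # Logical id N means the Nth online processor.
        return [online_physical_ids[i] for i in ids]
    # Physical ids are used as given.
    return list(ids)
```

For example, on an assumed node whose online physical ids are
[0, 2, 4, 6], the specifications "0,1" and "P0,2" would both resolve to
physical processors 0 and 2 in this sketch.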
Terry has volunteered to modify the paffinity components. Ralph will do
the ORTE-level stuff and mpi_init, and likely the slot_list stuff too
(unless Lenny has time and is willing to help there?). This will be done
on a new Hg branch that Ralph will create - will post the access info
here later today.
Any comments? Please post soon so we don't go too far down the path
before we hear them!
--
Ticket URL:
<https://svn.open-mpi.org/trac/ompi/ticket/1435#comment:18>
Open MPI <http://www.open-mpi.org/>
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel