Re: [OMPI devel] PML selection logic

Jeff Squyres Tue, 24 Jun 2008 08:29:01 -0400

Also sounds good to me.

Note that the most difficult part of the forward-looking plan is that we usually can't tell the difference between "something failed to initialize" and "you don't have support for feature X".

I like the general philosophy of: running out of the box always works just fine, but if you/the sysadmin is smart, you can get performance improvements.



On Jun 23, 2008, at 4:18 PM, Shipman, Galen M. wrote:

I concur
- galen

On Jun 23, 2008, at 3:44 PM, Brian W. Barrett wrote:
That sounds like a reasonable plan to me.

Brian

On Mon, 23 Jun 2008, Ralph H Castain wrote:
Okay, so let's explore an alternative that preserves the support you are seeking for the "ignorant user", but doesn't penalize everyone else. What we could do is simply set things up so that:

1. if -mca pml xyz is provided, then no modex data is added
2. if it is not provided, then only rank=0 inserts the data. All other procs simply check their own selection against the one given by rank=0
Now, if a knowledgeable user or sys admin specifies what to use for their system, we won't penalize their startup time. A user who doesn't know what to do gets to run, albeit less scalably on startup.

Looking forward from there, we can look to a day where failing to initialize something that exists on the system could be detected in some other fashion, letting the local proc abort since it would know that other procs that detected similar capabilities may well have selected that PML. For now, though, this would solve the problem.
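
A minimal sketch of the two rules above, written as a standalone Python simulation rather than Open MPI code. The function names and the list-of-selections representation are hypothetical and exist only to make the rule concrete.

```python
# Illustrative simulation only; this is not Open MPI source code.
# All names here are hypothetical.

def modex_entries(selections, user_specified_pml):
    """Return the PML entries that would be placed in the modex.

    selections[i] is the PML name chosen by rank i.
    Rule 1: if the user named the PML, nothing is added.
    Rule 2: otherwise only rank 0 publishes its selection.
    """
    if user_specified_pml:
        return {}
    return {0: selections[0]}


def rank_agrees(rank, selections, entries):
    """Each rank compares its own selection with rank 0's entry, if any."""
    if not entries:
        return True  # nothing was published, so there is nothing to check
    return selections[rank] == entries[0]


selections = ["cm", "cm", "ob1", "cm"]
entries = modex_entries(selections, user_specified_pml=False)
disagreeing = [r for r in range(len(selections))
               if not rank_agrees(r, selections, entries)]
print(entries, disagreeing)
```

In this model the modex carries at most one PML entry regardless of job size, and carries none when the user has named the PML.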

Make sense?
Ralph

On 6/23/08 1:31 PM, "Brian W. Barrett" <[email protected]> wrote:
The problem is that we default to OB1, but that's not the right choice for some platforms (like Pathscale / PSM), where there's a huge performance hit for using OB1. So we run into a situation where a user installs Open MPI, starts running, gets horrible performance, bad mouths Open MPI, and now we're in that game again. Yeah, the sys admin should know what to do, but it doesn't always work that way.

Brian


On Mon, 23 Jun 2008, Ralph H Castain wrote:
My fault - I should be more precise in my language. ;-/

#1 is not adequate, IMHO, as it forces us to -always- do a modex. It seems to me that a simpler solution to what you describe is for the user to specify -mca pml ob1, or -mca pml cm. If the latter, then you could deal with the failed-to-initialize problem cleanly by having the proc directly abort.

Again, sometimes I think we attempt to automate too many things. This seems like a pretty clear case where you know what you want - the sys admin, if nobody else, can certainly set that mca param in the default param file!

Otherwise, it seems to me that you are relying on the modex to detect that your proc failed to init the correct subsystem. I hate to force a modex just for that - if so, then perhaps this could again be a settable option to avoid requiring non-scalable behavior for those of us who want scalability?

On 6/23/08 1:21 PM, "Brian W. Barrett" <[email protected]> wrote:
The selection code was added because frequently high speed interconnects fail to initialize properly due to random stuff happening (yes, that's a horrible statement, but true). We ran into a situation with some really flaky machines where most of the processes would choose CM, but a couple would fail to initialize the MTL and therefore choose OB1. This led to a hang situation, which is the worst of the worst.

I think #1 is adequate, although it doesn't handle spawn particularly well. And spawn is generally used in environments where such network mismatches are most likely to occur.

Brian


On Mon, 23 Jun 2008, Ralph H Castain wrote:
Since my goal is to eliminate the modex completely for managed installations, could you give me a brief understanding of this eventual PML selection logic? It would help to hear an example of how and why different procs could get different answers - and why we would want to allow them to do so.

Thanks
Ralph
On 6/23/08 11:59 AM, "Aurélien Bouteiller" <[email protected]> wrote:
The first approach sounds fair enough to me. We should avoid 2 and 3 as the PML selection mechanism used to be more complex before we reduced it to accommodate a major design bug in the BTL selection process. When using the complete PML selection, BTL would be initialized several times, leading to a variety of bugs. Eventually the PML selection should return to its old self, when the BTL bug gets fixed.

Aurelien

On 23 June 2008 at 12:36, Ralph H Castain wrote:
Yo all

I've been doing further research into the modex and came across something I don't fully understand. It seems we have each process insert into the modex the name of the PML module that it selected. Once the modex has exchanged that info, it then loops across all procs in the job to check their selection, and aborts if any proc picked a different PML module.
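
As a rough illustration of the check described above, here is a small Python simulation. It is not Open MPI code; the function name and the dict standing in for the modex are hypothetical.

```python
# Illustrative simulation only; this is not Open MPI source code.
# The function and variable names are hypothetical.

def peers_with_different_pml(my_rank, modex):
    """Return the ranks whose published PML name differs from my_rank's.

    modex maps rank -> PML module name, with one entry per process.
    """
    mine = modex[my_rank]
    return [rank for rank in sorted(modex) if modex[rank] != mine]


# Four processes; rank 2 ended up on ob1 while the others chose cm.
modex = {0: "cm", 1: "cm", 2: "ob1", 3: "cm"}

for rank in sorted(modex):
    mismatched = peers_with_different_pml(rank, modex)
    if mismatched:
        print(f"rank {rank}: abort, ranks {mismatched} selected a different PML")
```

The point of the sketch is the cost: every process contributes an entry and every process walks the whole table, so the data and the work both grow with job size.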
All well and good...assuming that procs actually -can- choose different PML modules and hence create an "abort" scenario. However, if I look inside the PMLs at their selection logic, I find that a proc can ONLY pick a module other than ob1 if:

1. the user specifies the module to use via -mca pml xyz or by using a module specific mca param to adjust its priority. In this case, since the mca param is propagated, ALL procs have no choice but to pick that same module, so that can't cause us to abort (we will have already returned an error and aborted if the specified module can't run).

2. the pml/cm module detects that an MTL module was selected, and that it is other than "psm". In this case, the CM module will be selected because its default priority is higher than that of OB1.
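
The two cases can be modeled in a few lines of Python. This is a simplified sketch, not the actual selection code: the priority numbers are placeholders (only "cm ranks above ob1" comes from the message), the MTL name "example_mtl" is made up, and the per-module priority parameter from case 1 is not modeled.

```python
# Illustrative model only; this is not Open MPI source code.
# Priority numbers are placeholders: only "cm above ob1" comes from the text.
DEFAULT_PRIORITY = {"ob1": 20, "cm": 30}


def select_pml(mtl_selected=None, forced=None):
    """Pick a PML name for one process.

    mtl_selected: name of the MTL that initialized, or None if none did.
    forced: PML named by the user (the "-mca pml xyz" case), or None.
    """
    if forced is not None:
        if forced == "cm" and mtl_selected is None:
            # A user-specified module that cannot run is an error, not a fallback.
            raise RuntimeError("requested PML 'cm' but no MTL is available")
        return forced

    candidates = {"ob1": DEFAULT_PRIORITY["ob1"]}
    # Case 2: cm becomes eligible when an MTL other than psm was selected.
    if mtl_selected is not None and mtl_selected != "psm":
        candidates["cm"] = DEFAULT_PRIORITY["cm"]
    return max(candidates, key=candidates.get)


print(select_pml())                            # no MTL available
print(select_pml(mtl_selected="example_mtl"))  # hypothetical non-psm MTL
print(select_pml(mtl_selected="psm"))          # psm MTL
```

Under this model, two processes with no user-specified PML disagree only if their MTL outcome differs.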
In looking deeper into the MTL selection logic, it appears to me that you either have the required capability or you don't. I can see that in some environments (e.g., rsh across unmanaged collections of machines), it might be possible for someone to launch across a set of machines where some do and some don't have the required support. However, in all other cases, this will be homogeneous across the system.

Given this analysis (and someone more familiar with the PML should feel free to confirm or correct it), it seems to me that this could be streamlined via one or more means:
1. at the most, we could have rank=0 add the PML module name to the modex, and other procs simply check it against their own and return an error if they differ. This accomplishes the identical functionality to what we have today, but with much less info in the modex.

2. we could eliminate this info from the modex altogether by requiring the user to specify the PML module if they want something other than the default OB1. In this case, there can be no confusion over what each proc is to use. The CM module will attempt to init the MTL - if it cannot do so, then the job will return the correct error and tell the user that CM/MTL support is unavailable.

3. we could again eliminate the info by not inserting it into the modex if (a) the default PML module is selected, or (b) the user specified the PML module to be used. In the first case, each proc can simply check to see if they picked the default - if not, then we can insert the info to indicate the difference. Thus, in the "standard" case, no info will be inserted. In the second case, we will already get an error if the specified PML module could not be used. Hence, the modex check provides no additional info or value.
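
To compare the three options with today's behavior, the following Python sketch counts how many PML entries each scheme would put into the modex. It is an illustrative model only; the function name, the option labels, and the list-of-selections input are hypothetical.

```python
# Illustrative model only; this is not Open MPI source code.
# Names and option labels are hypothetical.

def modex_entry_count(option, selections, user_specified, default="ob1"):
    """Count PML entries placed in the modex under each scheme.

    option: "current", 1, 2, or 3 (matching the numbered list above).
    selections: PML name chosen by each rank, indexed by rank.
    user_specified: True if the user named the PML explicitly.
    """
    if option == "current":
        return len(selections)          # every proc publishes its choice
    if option == 1:
        return 1                        # only rank 0 publishes
    if option == 2:
        return 0                        # non-default PML must be requested
    if option == 3:
        if user_specified:
            return 0                    # failure already reported at selection
        return sum(1 for pml in selections if pml != default)
    raise ValueError(f"unknown option: {option!r}")


all_default = ["ob1"] * 4
for option in ("current", 1, 2, 3):
    print(option, modex_entry_count(option, all_default, user_specified=False))
```

In the "standard" case where every proc picks the default, only the current scheme and option 1 add anything to the modex.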
I understand the motivation to support automation. However, in this case, the automation actually doesn't seem to buy us very much, and it isn't coming "free". So perhaps some change in how this is done would be in order?

Ralph



_______________________________________________
devel mailing list
[email protected]
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems
