I concur

- galen

On Jun 23, 2008, at 3:44 PM, Brian W. Barrett wrote:

That sounds like a reasonable plan to me.

Brian

On Mon, 23 Jun 2008, Ralph H Castain wrote:

Okay, so let's explore an alternative that preserves the support you are seeking for the "ignorant user", but doesn't penalize everyone else. What we could do is simply set things up so that:

1. if -mca pml xyz is provided, then no modex data is added

2. if it is not provided, then only rank=0 inserts the data. All other procs simply check their own selection against the one given by rank=0

Now, if a knowledgeable user or sys admin specifies what to use for their system, we won't penalize their startup time. A user who doesn't know what to do gets to run, albeit less scalably on startup.

Looking forward from there, we can look to a day where failing to initialize something that exists on the system could be detected in some other fashion, letting the local proc abort since it would know that other procs that detected similar capabilities may well have selected that PML. For now, though, this would solve the problem.

Make sense?
Ralph

On 6/23/08 1:31 PM, "Brian W. Barrett" <brbar...@open-mpi.org> wrote:

The problem is that we default to OB1, but that's not the right choice for some platforms (like Pathscale / PSM), where there's a huge performance hit for using OB1. So we run into a situation where a user installs Open MPI, starts running, gets horrible performance, bad-mouths Open MPI, and now we're in that game again. Yeah, the sys admin should know what to do, but it doesn't always work that way.

Brian

On Mon, 23 Jun 2008, Ralph H Castain wrote:

My fault - I should be more precise in my language. ;-/

#1 is not adequate, IMHO, as it forces us to -always- do a modex. It seems to me that a simpler solution to what you describe is for the user to specify -mca pml ob1, or -mca pml cm. If the latter, then you could deal with the failed-to-initialize problem cleanly by having the proc directly abort.

Again, sometimes I think we attempt to automate too many things.
This seems like a pretty clear case where you know what you want - the sys admin, if nobody else, can certainly set that mca param in the default param file!

Otherwise, it seems to me that you are relying on the modex to detect that your proc failed to init the correct subsystem. I hate to force a modex just for that - if so, then perhaps this could again be a settable option to avoid requiring non-scalable behavior for those of us who want scalability?

On 6/23/08 1:21 PM, "Brian W. Barrett" <brbar...@open-mpi.org> wrote:

The selection code was added because frequently high speed interconnects fail to initialize properly due to random stuff happening (yes, that's a horrible statement, but true). We ran into a situation with some really flaky machines where most of the processes would choose CM, but a couple would fail to initialize the MTL and therefore choose OB1. This led to a hang situation, which is the worst of the worst.

I think #1 is adequate, although it doesn't handle spawn particularly well. And spawn is generally used in environments where such network mismatches are most likely to occur.

Brian

On Mon, 23 Jun 2008, Ralph H Castain wrote:

Since my goal is to eliminate the modex completely for managed installations, could you give me a brief understanding of this eventual PML selection logic? It would help to hear an example of how and why different procs could get different answers - and why we would want to allow them to do so.

Thanks
Ralph

On 6/23/08 11:59 AM, "Aurélien Bouteiller" <boute...@eecs.utk.edu> wrote:

The first approach sounds fair enough to me. We should avoid 2 and 3, as the PML selection mechanism used to be more complex before we reduced it to accommodate a major design bug in the BTL selection process. When using the complete PML selection, BTL would be initialized several times, leading to a variety of bugs. Eventually the PML selection should return to its old self, when the BTL bug gets fixed.

Aurelien

On 23 June 2008, at 12:36, Ralph H Castain wrote:

Yo all

I've been doing further research into the modex and came across something I don't fully understand. It seems we have each process insert into the modex the name of the PML module that it selected. Once the modex has exchanged that info, it then loops across all procs in the job to check their selection, and aborts if any proc picked a different PML module.

All well and good...assuming that procs actually -can- choose different PML modules and hence create an "abort" scenario. However, if I look inside the PMLs at their selection logic, I find that a proc can ONLY pick a module other than ob1 if:

1. the user specifies the module to use via -mca pml xyz or by using a module specific mca param to adjust its priority. In this case, since the mca param is propagated, ALL procs have no choice but to pick that same module, so that can't cause us to abort (we will have already returned an error and aborted if the specified module can't run).

2. the pml/cm module detects that an MTL module was selected, and that it is other than "psm". In this case, the CM module will be selected because its default priority is higher than that of OB1.

In looking deeper into the MTL selection logic, it appears to me that you either have the required capability or you don't. I can see that in some environments (e.g., rsh across unmanaged collections of machines), it might be possible for someone to launch across a set of machines where some do and some don't have the required support. However, in all other cases, this will be homogeneous across the system.

Given this analysis (and someone more familiar with the PML should feel free to confirm or correct it), it seems to me that this could be streamlined via one or more means:

1. at the most, we could have rank=0 add the PML module name to the modex, and other procs simply check it against their own and return an error if they differ.
This accomplishes the identical functionality to what we have today, but with much less info in the modex.

2. we could eliminate this info from the modex altogether by requiring the user to specify the PML module if they want something other than the default OB1. In this case, there can be no confusion over what each proc is to use. The CM module will attempt to init the MTL - if it cannot do so, then the job will return the correct error and tell the user that CM/MTL support is unavailable.

3. we could again eliminate the info by not inserting it into the modex if (a) the default PML module is selected, or (b) the user specified the PML module to be used. In the first case, each proc can simply check to see if they picked the default - if not, then we can insert the info to indicate the difference. Thus, in the "standard" case, no info will be inserted. In the second case, we will already get an error if the specified PML module could not be used. Hence, the modex check provides no additional info or value.

I understand the motivation to support automation. However, in this case, the automation actually doesn't seem to buy us very much, and it isn't coming "free". So perhaps some change in how this is done would be in order?

Ralph

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
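
For readers following the thread, here is a minimal sketch of the scheme the replies converge on: no modex data when the user forces a PML, otherwise only rank 0 publishes its selection and every other proc compares locally. It is a plain Python simulation, not Open MPI code. The function names (modex_contribution, check_selection, simulate) and the dict standing in for the modex are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple


def modex_contribution(rank: int, selected_pml: str,
                       user_forced_pml: Optional[str]) -> Optional[str]:
    """Return the PML name this proc would publish, or None to publish nothing."""
    if user_forced_pml is not None:
        # "-mca pml xyz" was given: every proc must use it, so nothing is published.
        return None
    if rank == 0:
        # No user choice: only rank 0 publishes what it selected.
        return selected_pml
    return None


def check_selection(selected_pml: str, modex: Dict[int, str]) -> bool:
    """True if the local selection agrees with rank 0's entry (or none exists)."""
    published = modex.get(0)
    return published is None or published == selected_pml


def simulate(selections: List[str],
             user_forced_pml: Optional[str] = None) -> Tuple[Dict[int, str], List[int]]:
    """Build the (hypothetical) modex and list the ranks that would report an error."""
    modex: Dict[int, str] = {}
    for rank, pml in enumerate(selections):
        entry = modex_contribution(rank, pml, user_forced_pml)
        if entry is not None:
            modex[rank] = entry
    mismatched = [rank for rank, pml in enumerate(selections)
                  if not check_selection(pml, modex)]
    return modex, mismatched


if __name__ == "__main__":
    # Rank 2 failed to initialize its MTL and fell back to ob1.
    print(simulate(["cm", "cm", "ob1", "cm"]))           # ({0: 'cm'}, [2])
    # User forced the PML: the modex stays empty and nothing is compared.
    print(simulate(["cm", "cm"], user_forced_pml="cm"))  # ({}, [])
```

In this sketch the modex holds at most one entry regardless of job size, which is the startup saving the proposal is after. The flaky-machine case Brian describes still surfaces as a mismatch on the affected rank instead of a hang.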