Understood - but George is correct in that a failure to find the hostname in the db will create an infinite loop. Any thoughts on a reliable way to break it?
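One possibility would be a simple re-entrancy guard around the lazy lookup, plus a placeholder hostname whenever the key is missing, so that error/debug output can never re-trigger the fetch. A minimal sketch of the idea - get_peer_hostname() and db_fetch_hostname() below are stand-in names for illustration, not the actual OPAL/OMPI calls:

#include <stdbool.h>
#include <stddef.h>

/* stand-in for the real db lookup; returns NULL when the key is missing */
extern const char *db_fetch_hostname(int peer_rank);

const char *get_peer_hostname(int peer_rank)
{
    static __thread bool fetching = false;   /* re-entrancy guard */
    const char *name;

    if (fetching) {
        /* already inside a fetch (e.g. a debug message fired during the
         * first lookup) - don't recurse, just report a placeholder */
        return "<unknown>";
    }

    fetching = true;
    name = db_fetch_hostname(peer_rank);
    fetching = false;

    /* a missing key must not trigger another lookup attempt */
    return (NULL != name) ? name : "<unknown>";
}

The guard only protects the lookup path itself; whether a placeholder is acceptable for error reporting is a separate question.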
On Aug 19, 2013, at 2:52 PM, Nathan Hjelm <hje...@lanl.gov> wrote:

> It would require a db read from every rank, which is what we are trying
> to avoid. This scales quadratically at best on Cray systems.
>
> -Nathan
>
> On Mon, Aug 19, 2013 at 02:48:18PM -0700, Ralph Castain wrote:
>> Yeah, I have some concerns about it too...been trying to test it out some
>> more. Would be good to see just how much that one change makes - maybe
>> restoring just the hostname wouldn't have that big an impact.
>>
>> I'm leery of trying to ensure we strip all the opal_output loops if we don't
>> find the hostname.
>>
>> On Aug 19, 2013, at 2:41 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>
>>> As a result of this patch, the first decode of a peer host name might happen
>>> in the middle of a debug message (on the first call to
>>> ompi_proc_get_hostname). Such behavior might generate deadlocks depending on
>>> the level of output verbosity, and has significant potential to reintroduce
>>> the recursive behavior the new state machine was supposed to remove.
>>>
>>> George.
>>>
>>> On Aug 17, 2013, at 02:49, svn-commit-mai...@open-mpi.org wrote:
>>>
>>>> Author: rhc (Ralph Castain)
>>>> Date: 2013-08-16 20:49:18 EDT (Fri, 16 Aug 2013)
>>>> New Revision: 29040
>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/29040
>>>>
>>>> Log:
>>>> When we direct launch an application, we rely on PMI for wireup support.
>>>> In doing so, we lose the de facto data compression we get from the ORTE
>>>> modex, since we no longer get all the wireup info from every proc in a
>>>> single blob. Instead, we have to iterate over all the procs, calling
>>>> PMI_KVS_get for every value we require.
>>>>
>>>> This creates really bad scaling behavior. Users have found a nearly 20%
>>>> launch time differential between mpirun and PMI, with PMI being the slower
>>>> method. Some of the problem is attributable to poor exchange algorithms in
>>>> RMs like Slurm and Alps, but we make things worse by calling "get" so
>>>> many times.
>>>>
>>>> Nathan (with a tad of advice from me) has attempted to alleviate this
>>>> problem by reducing the number of "get" calls. This required the following
>>>> changes:
>>>>
>>>> * Upon first request for data, have the OPAL db pmi component fetch and
>>>> decode *all* the info from a given remote proc. It turned out we weren't
>>>> caching the info, so we would continually request it and only decode the
>>>> piece we needed for the immediate request. We now decode all the info and
>>>> push it into the db hash component for local storage - all subsequent
>>>> retrievals are then fulfilled locally.
>>>>
>>>> * Reduced the amount of data by eliminating the exchange of the OMPI_ARCH
>>>> value if heterogeneity is not enabled. This was used solely as a check so
>>>> we would error out if the system wasn't actually homogeneous, which was
>>>> fine when we thought there was no cost in doing the check. Unfortunately,
>>>> at large scale and with direct launch, there is a non-zero cost to making
>>>> this test. We are open to finding a compromise (perhaps turning the test
>>>> off if requested?) if people feel strongly about performing it.
>>>>
>>>> * Reduced the amount of RTE data being automatically fetched, and fetched
>>>> the rest only upon request. In particular, we no longer immediately fetch
>>>> the hostname (which is only used for error reporting), but instead get it
>>>> when needed. Likewise for the RML URI, as that info is only required in
>>>> some (not all) environments.
>>>> In addition, we no longer fetch the locality unless required, relying
>>>> instead on the PMI clique info to tell us who is on our local node (if
>>>> additional info is required, the fetch is performed when a modex_recv is
>>>> issued).
>>>>
>>>> Again, all of this only impacts direct launch - all the info is provided
>>>> when launched via mpirun, as there is no added cost to getting it.
>>>>
>>>> Barring objections, we may move this (plus any required other pieces) to
>>>> the 1.7 branch once it soaks for an appropriate time.
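For reference, the caching change described in the first bullet of the r29040 log (fetch and decode everything from a peer on the first request, then serve later lookups locally) reduces the PMI traffic to a single blob read per peer. A rough sketch of that pattern, using stand-in pmi_get_blob()/decode_next() helpers and fixed-size tables rather than the real OPAL db pmi / db hash interfaces:

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PEERS 1024
#define MAX_KEYS  64

typedef struct { char key[64]; char val[256]; } kv_t;
typedef struct { bool cached; int nkv; kv_t kv[MAX_KEYS]; } peer_cache_t;

/* stand-ins for the real PMI read and blob decode */
extern int pmi_get_blob(int peer, char *buf, size_t len);
extern int decode_next(const char *buf, int idx, kv_t *out);

static peer_cache_t cache[MAX_PEERS];    /* one slot per peer rank */

const char *cached_get(int peer, const char *key)
{
    peer_cache_t *pc = &cache[peer];

    if (!pc->cached) {
        char buf[8192];
        if (0 != pmi_get_blob(peer, buf, sizeof(buf))) {
            return NULL;                 /* one PMI read per peer, total */
        }
        /* decode *all* key/value pairs now so we never go back to PMI */
        while (pc->nkv < MAX_KEYS &&
               0 == decode_next(buf, pc->nkv, &pc->kv[pc->nkv])) {
            pc->nkv++;
        }
        pc->cached = true;
    }

    for (int i = 0; i < pc->nkv; i++) {  /* served from the local cache */
        if (0 == strcmp(pc->kv[i].key, key)) {
            return pc->kv[i].val;
        }
    }
    return NULL;                         /* key genuinely absent */
}

With the old behavior a PMI "get" happened for every value we needed from every peer; with the cache it drops to roughly one read per peer, which is where the reduction in "get" calls comes from.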