Excellent! Thanks Josh - both for the original work/commit and for the quick fix!
Ralph

On 5/6/08 3:58 PM, "Josh Hursey" <jjhur...@open-mpi.org> wrote:

> Sorry about that. Looking back at the filem logic it seems that I
> returned success even if select failed (and just used the 'none'
> passthrough component). I committed a patch in r18389 that fixes this
> problem.
>
> This commit now has a warning that prints on the filem verbose stream
> so if a user hits something like this in the wild unexpectedly then
> we can help them debug it a bit.
>
> Cheers,
> Josh
>
>
> On May 6, 2008, at 2:56 PM, Ralph H Castain wrote:
>
>> Hmmm....well, I hit a problem (of course!). I have mca-no-build on
>> the filem framework on my Mac. If I just mpirun -n 3 ./hello, I get
>> the following error:
>>
>> --------------------------------------------------------------------------
>> It looks like orte_init failed for some reason; your parallel
>> process is likely to abort. There are many reasons that a parallel
>> process can fail during orte_init; some of which are due to
>> configuration or environment problems. This failure appears to be
>> an internal failure; here's some additional information (which may
>> only be relevant to an Open MPI developer):
>>
>>   orte_filem_base_select failed
>>   --> Returned value Error (-1) instead of ORTE_SUCCESS
>>
>> --------------------------------------------------------------------------
>>
>> After looking at the source code for filem_select, I can run just
>> fine if I specify -mca filem none on the cmd line. Otherwise, it
>> looks like your select logic insists that at least one component
>> must be built and selectable?
>>
>> Is that generally true, or is your filem framework the exception? I
>> think this would not be a good general requirement - frankly, I
>> don't think it is good for any framework to have such a requirement.
>>
>> Ralph
>>
>>
>>
>> On 5/6/08 12:09 PM, "Josh Hursey" <jjhur...@open-mpi.org> wrote:
>>
>>> This has been committed in r18381
>>>
>>> Please let me know if you have any problems with this commit.
>>>
>>> Cheers,
>>> Josh
>>>
>>> On May 5, 2008, at 10:41 AM, Josh Hursey wrote:
>>>
>>>> Awesome.
>>>>
>>>> The branch is updated to the latest trunk head. I encourage folks to
>>>> check out this repository and make sure that it builds on their
>>>> system. A normal build of the branch should be enough to find out if
>>>> there are any cut-n-paste problems (though I tried to be careful,
>>>> mistakes do happen).
>>>>
>>>> I haven't heard any problems so this is looking like it will come in
>>>> tomorrow after the teleconf. I'll ask again there to see if there
>>>> are any voices of concern.
>>>>
>>>> Cheers,
>>>> Josh
>>>>
>>>> On May 5, 2008, at 9:58 AM, Jeff Squyres wrote:
>>>>
>>>>> This all sounds good to me!
>>>>>
>>>>> On Apr 29, 2008, at 6:35 PM, Josh Hursey wrote:
>>>>>
>>>>>> What: Add mca_base_select() and adjust frameworks & components to
>>>>>> use it.
>>>>>> Why: Consolidation of code for general goodness.
>>>>>> Where: https://svn.open-mpi.org/svn/ompi/tmp-public/jjh-mca-play
>>>>>> When: Code ready now. Documentation ready soon.
>>>>>> Timeout: May 6, 2008 (After teleconf) [1 week]
>>>>>>
>>>>>> Discussion:
>>>>>> -----------
>>>>>> For a number of years a few developers have been talking about
>>>>>> creating an MCA base component selection function. For various
>>>>>> reasons this was never implemented. Recently I decided to give it
>>>>>> a try.
>>>>>>
>>>>>> A base select function will allow Open MPI to provide completely
>>>>>> consistent selection behavior for many of its frameworks (18 of 31
>>>>>> to be exact at the moment). The primary goal of this work is to
>>>>>> improve code maintainability through code reuse. Other benefits
>>>>>> also result, such as a slightly smaller memory footprint.
>>>>>>
>>>>>> The mca_base_select() function represented the most commonly used
>>>>>> logic for component selection: Select the one component with the
>>>>>> highest priority and close all of the not selected components.
>>>>>> This function can be found at the path below in the branch:
>>>>>> opal/mca/base/mca_base_components_select.c
>>>>>>
>>>>>> To support this I had to formalize a query() function in the
>>>>>> mca_base_component_t of the form:
>>>>>> int mca_base_query_component_fn(mca_base_module_t **module, int *priority);
>>>>>>
>>>>>> This function is specified after the open and close component
>>>>>> functions in this structure so as to allow compatibility with
>>>>>> frameworks that do not use the base selection logic. Frameworks
>>>>>> that do *not* use this function are *not* affected by this commit.
>>>>>> However, every component in the frameworks that use the
>>>>>> mca_base_select function must adjust their component query
>>>>>> function to fit that specified above.
>>>>>>
>>>>>> 18 frameworks in Open MPI have been changed. I have updated all of
>>>>>> the components in the 18 frameworks available in the trunk on my
>>>>>> branch. The affected frameworks are:
>>>>>> - OPAL Carto
>>>>>> - OPAL crs
>>>>>> - OPAL maffinity
>>>>>> - OPAL memchecker
>>>>>> - OPAL paffinity
>>>>>> - ORTE errmgr
>>>>>> - ORTE ess
>>>>>> - ORTE Filem
>>>>>> - ORTE grpcomm
>>>>>> - ORTE odls
>>>>>> - ORTE pml
>>>>>> - ORTE ras
>>>>>> - ORTE rmaps
>>>>>> - ORTE routed
>>>>>> - ORTE snapc
>>>>>> - OMPI crcp
>>>>>> - OMPI dpm
>>>>>> - OMPI pubsub
>>>>>>
>>>>>> There was a question of the memory footprint change as a result of
>>>>>> this commit. I used 'pmap' to determine process memory footprint
>>>>>> of a hello world MPI program. Static and Shared build numbers are
>>>>>> below along with variations on launching locally and to a single
>>>>>> node allocated by SLURM.
>>>>>> All of this was on Indiana University's Odin machine. We compare
>>>>>> against the trunk (r18276) representing the last SVN sync point
>>>>>> of the branch.
>>>>>>
>>>>>> Process(shared)| Trunk    | Branch  | Diff (Improvement)
>>>>>> ---------------+----------+---------+-------
>>>>>> mpirun (orted) |   39976K |  36828K | 3148K
>>>>>> hello (0)      |  229288K | 229268K |   20K
>>>>>> hello (1)      |  229288K | 229268K |   20K
>>>>>> ---------------+----------+---------+-------
>>>>>> mpirun         |   40032K |  37924K | 2108K
>>>>>> orted          |   34720K |  34660K |   60K
>>>>>> hello (0)      |  228404K | 228384K |   20K
>>>>>> hello (1)      |  228404K | 228384K |   20K
>>>>>>
>>>>>> Process(static)| Trunk    | Branch  | Diff (Improvement)
>>>>>> ---------------+----------+---------+-------
>>>>>> mpirun (orted) |   21384K |  21372K |   12K
>>>>>> hello (0)      |  194000K | 193980K |   20K
>>>>>> hello (1)      |  194000K | 193980K |   20K
>>>>>> ---------------+----------+---------+-------
>>>>>> mpirun         |   21384K |  21372K |   12K
>>>>>> orted          |   21208K |  21196K |   12K
>>>>>> hello (0)      |  193116K | 193096K |   20K
>>>>>> hello (1)      |  193116K | 193096K |   20K
>>>>>>
>>>>>> As you can see there are some small memory footprint improvements
>>>>>> on my branch that result from this work. The size of the Open MPI
>>>>>> project shrinks a bit as well. This commit cuts between 2,000 and
>>>>>> 3,500 lines of code (depending on how you count), so roughly a 1%
>>>>>> code shrink.
>>>>>>
>>>>>> The branch is stable in all of the testing I have done, but there
>>>>>> are some platforms on which I cannot test. So please give this
>>>>>> branch a try and let me know if you find any problems.
>>>>>>
>>>>>> Cheers,
>>>>>> Josh
>>>>>>
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> de...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>
>>>>>
>>>>> --
>>>>> Jeff Squyres
>>>>> Cisco Systems
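
For readers following the thread, the selection rule described in the RFC (query every component, keep the one with the highest priority, close all of the others) can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the code in opal/mca/base/mca_base_components_select.c: every sketch_* and SKETCH_* name, the demo components, and the choice to return an error when nothing is selectable are hypothetical. Only the shape of the query function (a module out-parameter plus a priority out-parameter) is taken from the RFC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the Open MPI types. */
#define SKETCH_SUCCESS 0
#define SKETCH_ERROR   (-1)

typedef struct sketch_module { const char *name; } sketch_module_t;

/* Same shape as the query function quoted in the RFC. */
typedef int (*sketch_query_fn)(sketch_module_t **module, int *priority);
typedef int (*sketch_close_fn)(void);

typedef struct sketch_component {
    const char     *name;
    sketch_close_fn close;  /* may be NULL */
    sketch_query_fn query;  /* NULL means "cannot be selected" */
} sketch_component_t;

/* Pick the highest-priority component that offers a module, then close
 * every component that was not picked. Ties keep the earlier component.
 * Returns SKETCH_ERROR if no component is selectable (an assumption of
 * this sketch; the caller decides whether that is fatal). */
int sketch_select(sketch_component_t *components, size_t n,
                  sketch_module_t **best_module,
                  sketch_component_t **best_component)
{
    int best_priority = 0;

    assert(NULL != best_module && NULL != best_component);
    *best_module = NULL;
    *best_component = NULL;

    for (size_t i = 0; i < n; ++i) {
        sketch_module_t *module = NULL;
        int priority = 0;

        if (NULL == components[i].query) {
            continue;
        }
        if (SKETCH_SUCCESS != components[i].query(&module, &priority) ||
            NULL == module) {
            continue;  /* component declined to be selected */
        }
        if (NULL == *best_component || priority > best_priority) {
            best_priority   = priority;
            *best_module    = module;
            *best_component = &components[i];
        }
    }

    /* Close all of the components that were not selected. */
    for (size_t i = 0; i < n; ++i) {
        if (&components[i] != *best_component &&
            NULL != components[i].close) {
            components[i].close();
        }
    }

    return (NULL == *best_component) ? SKETCH_ERROR : SKETCH_SUCCESS;
}

/* Hypothetical demo components used to exercise the sketch. */
static sketch_module_t low_module  = { "low" };
static sketch_module_t high_module = { "high" };
static int closed_count = 0;

static int demo_close(void) { ++closed_count; return SKETCH_SUCCESS; }

static int low_query(sketch_module_t **module, int *priority)
{ *module = &low_module; *priority = 10; return SKETCH_SUCCESS; }

static int high_query(sketch_module_t **module, int *priority)
{ *module = &high_module; *priority = 50; return SKETCH_SUCCESS; }

static int unavailable_query(sketch_module_t **module, int *priority)
{ *module = NULL; *priority = 0; return SKETCH_ERROR; }
```

A framework that calls something like this is free to treat the "nothing selectable" result as non-fatal. The thread describes filem doing that by falling back to its 'none' passthrough component, which is one way to address Ralph's concern that no framework should require a built, selectable component.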