Not quite, Josh - I fixed it in our branch. Will send you a revised patch
that does the job off-list for your review.

Thanks
Ralph



On 5/9/08 9:35 AM, "Josh Hursey" <jjhur...@open-mpi.org> wrote:

> Ok I think I understand the problem a bit better now. I attached a
> patch that should fix this, but I want you to check it out before I
> commit just to make sure.
> 
> If you specify '-mca filter xml' on the command line then only the
> 'xml' component should be opened by mca_base_open. The problem was
> that the selection logic used -1 as the lowest acceptable priority,
> which conflicted with the set priority of the 'xml' component. This
> patch sets this value to INT32_MIN which should be well below any
> negative priority that a component would set for itself.
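A minimal sketch of the cutoff problem described above, assuming the selection loop starts from a floor value and only accepts a priority strictly greater than it. The helper name `pick_best` and its parameters are hypothetical and are not taken from the actual mca_base_select() code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the real mca_base_select() code.
 * Returns the index of the highest priority that is strictly greater
 * than `floor`, or -1 when no candidate beats the floor. */
static int pick_best(const int32_t *prio, size_t n, int32_t floor)
{
    assert(prio != NULL || n == 0);

    int best_index = -1;
    int32_t best_prio = floor;   /* starting value doubles as the cutoff */

    for (size_t i = 0; i < n; i++) {
        if (prio[i] > best_prio) {
            best_prio = prio[i];
            best_index = (int)i;
        }
    }
    return best_index;
}
```

Under this assumption, a lone component reporting priority -1 is rejected when the floor is -1 and accepted when the floor is INT32_MIN.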
> 
> Let me know if this works for you and I'll commit it.
> 
> Cheers,
> Josh
> 
> 
> 
> On May 9, 2008, at 11:14 AM, Ralph Castain wrote:
> 
>> Sure - take a look at the hg repository Jeff and I are working on:
>> 
>> http://www.open-mpi.org/hg/hgwebdir.cgi/rhc/channel
>> 
>> The opal/mca/filter framework illustrates the problem. I have one
>> component in there right now, with a default module defined in the
>> base. That component must only be selected if the user calls it. With
>> the current select logic, I can't do this - if the priority is >= 0,
>> then it is always automatically selected. With priority < 0, it is
>> never selectable, even if specified.
>> 
>> Thanks
>> Ralph
>> 
>> 
>> 
>> On 5/9/08 8:52 AM, "Josh Hursey" <jjhur...@open-mpi.org> wrote:
>> 
>>> Ralph,
>>> 
>>> Can you give me an example of a component that I can look at? It will
>>> allow me to test the fix before committing, and to better understand
>>> the problem.
>>> 
>>> -- Josh
>>> 
>>> On May 9, 2008, at 10:41 AM, Ralph Castain wrote:
>>> 
>>>> I just hit a problem with this logic - should be a minor change.
>>>> 
>>>> We have several frameworks where we have components that are only
>>>> allowed to be selected if the user specifically requests them by
>>>> stating -mca foo bar. Because it is possible for there to be no
>>>> other components that want to be selected, and because it is
>>>> permissible for no components to be selected for that framework, we
>>>> set bar's priority to be -1.
>>>> 
>>>> The new select logic will not allow a negative priority to be
>>>> selected, even if the user specifically requested that component.
>>>> 
>>>> If we set the priority to be 0, then the system will allow the
>>>> component to be automatically selected. This is not allowed as it
>>>> can lead to bad behavior.
>>>> 
>>>> So what we need the select system to do is say "if someone specified
>>>> a specific component, don't worry about the returned priority - just
>>>> use it"
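The requested rule can be sketched as follows. This is only an illustration under stated assumptions: `struct candidate` and `choose` are hypothetical names, not Open MPI types or functions, and the automatic path simply refuses negative priorities as described in the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a queried component; not an Open MPI type. */
struct candidate {
    const char *name;
    int priority;      /* value the component reported for itself */
};

/* Returns the index of the chosen candidate, or -1 if none is chosen.
 * `requested` is the name given by the user, or NULL if none was given. */
static int choose(const struct candidate *c, size_t n, const char *requested)
{
    assert(c != NULL || n == 0);

    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (requested != NULL) {
            /* Explicit request: take the named component, ignore priority. */
            if (strcmp(c[i].name, requested) == 0) {
                return (int)i;
            }
            continue;
        }
        /* Automatic selection: negative priorities are never picked. */
        if (c[i].priority >= 0 &&
            (best < 0 || c[i].priority > c[best].priority)) {
            best = (int)i;
        }
    }
    return best;
}
```

With a single component "bar" at priority -1, `choose` returns -1 when nothing is requested and 0 when "bar" is requested by name.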
>>>> 
>>>> Josh: could you please modify this?
>>>> 
>>>> Thanks!
>>>> Ralph
>>>> 
>>>> 
>>>> 
>>>> On 5/8/08 7:04 PM, "Pak Lui" <pak....@sun.com> wrote:
>>>> 
>>>>> Thanks very much Josh! Will try it out soon.
>>>>> 
>>>>> Josh Hursey wrote:
>>>>>> Sorry about that. I didn't test that type of option. It should be
>>>>>> working in r18418. Let me know if you see any more issues.
>>>>>> 
>>>>>> -- Josh
>>>>>> 
>>>>>> On May 8, 2008, at 6:04 PM, Pak Lui wrote:
>>>>>> 
>>>>>>> I think I have a problem but I am not sure. I used to be able to
>>>>>>> use the circumflex (^) to switch between the gridengine launcher
>>>>>>> and the ssh launchers by doing something like this, e.g. -mca plm
>>>>>>> ^gridengine, to exclude some of the plm components (and also in
>>>>>>> ras). It doesn't seem like the 'negate' is in mca_base_component
>>>>>>> anymore. I guess I just have to spell out which component I want
>>>>>>> explicitly...
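A rough sketch of the '^' negation described above, assuming the value is a comma-separated list of component names and that a leading circumflex turns it into an exclude list. The helpers `in_list` and `component_allowed` are hypothetical and do not reproduce the code in mca_base.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `name` appears in the comma-separated `list`. */
static bool in_list(const char *list, const char *name)
{
    size_t len = strlen(name);
    const char *p = list;

    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t tok = (end != NULL) ? (size_t)(end - p) : strlen(p);
        if (tok == len && strncmp(p, name, len) == 0) {
            return true;
        }
        if (end == NULL) {
            break;
        }
        p = end + 1;   /* step past the comma to the next name */
    }
    return false;
}

/* True if component `name` may be used given the -mca value `spec`. */
static bool component_allowed(const char *spec, const char *name)
{
    assert(name != NULL);

    if (spec == NULL || spec[0] == '\0') {
        return true;                        /* no restriction given */
    }
    if (spec[0] == '^') {
        return !in_list(spec + 1, name);    /* exclude list */
    }
    return in_list(spec, name);             /* include list */
}
```

Here `component_allowed("^gridengine", "gridengine")` is false, while any other component name passes the same spec.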
>>>>>>> 
>>>>>>> Josh Hursey wrote:
>>>>>>>> This has been committed in r18381
>>>>>>>> 
>>>>>>>> Please let me know if you have any problems with this commit.
>>>>>>>> 
>>>>>>>> Cheers,
>>>>>>>> Josh
>>>>>>>> 
>>>>>>>> On May 5, 2008, at 10:41 AM, Josh Hursey wrote:
>>>>>>>> 
>>>>>>>>> Awesome.
>>>>>>>>> 
>>>>>>>>> The branch is updated to the latest trunk head. I encourage
>>>>>>>>> folks to check out this repository and make sure that it builds
>>>>>>>>> on their system. A normal build of the branch should be enough
>>>>>>>>> to find out if there are any cut-n-paste problems (though I
>>>>>>>>> tried to be careful, mistakes do happen).
>>>>>>>>> 
>>>>>>>>> I haven't heard of any problems so this is looking like it will
>>>>>>>>> come in tomorrow after the teleconf. I'll ask again there to see
>>>>>>>>> if there are any voices of concern.
>>>>>>>>> 
>>>>>>>>> Cheers,
>>>>>>>>> Josh
>>>>>>>>> 
>>>>>>>>> On May 5, 2008, at 9:58 AM, Jeff Squyres wrote:
>>>>>>>>> 
>>>>>>>>>> This all sounds good to me!
>>>>>>>>>> 
>>>>>>>>>> On Apr 29, 2008, at 6:35 PM, Josh Hursey wrote:
>>>>>>>>>> 
>>>>>>>>>>> What:  Add mca_base_select() and adjust frameworks &
>>>>>>>>>>>        components to use it.
>>>>>>>>>>> Why:   Consolidation of code for general goodness.
>>>>>>>>>>> Where: https://svn.open-mpi.org/svn/ompi/tmp-public/jjh-mca-play
>>>>>>>>>>> When:  Code ready now. Documentation ready soon.
>>>>>>>>>>> Timeout: May 6, 2008 (After teleconf) [1 week]
>>>>>>>>>>> 
>>>>>>>>>>> Discussion:
>>>>>>>>>>> -----------
>>>>>>>>>>> For a number of years a few developers have been talking about
>>>>>>>>>>> creating an MCA base component selection function. For various
>>>>>>>>>>> reasons this was never implemented. Recently I decided to give
>>>>>>>>>>> it a try.
>>>>>>>>>>> 
>>>>>>>>>>> A base select function will allow Open MPI to provide
>>>>>>>>>>> completely consistent selection behavior for many of its
>>>>>>>>>>> frameworks (18 of 31 to be exact at the moment). The primary
>>>>>>>>>>> goal of this work is to improve code maintainability through
>>>>>>>>>>> code reuse. Other benefits also result, such as a slightly
>>>>>>>>>>> smaller memory footprint.
>>>>>>>>>>> 
>>>>>>>>>>> The mca_base_select() function represents the most commonly
>>>>>>>>>>> used logic for component selection: select the one component
>>>>>>>>>>> with the highest priority and close all of the components
>>>>>>>>>>> that were not selected. This function can be found at the
>>>>>>>>>>> path below in the branch:
>>>>>>>>>>> opal/mca/base/mca_base_components_select.c
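The "highest priority wins, close the rest" logic described above can be sketched like this. Every type and name here (`module_t`, `component_t`, `select_one`, the two toy components) is a hypothetical stand-in chosen for illustration; the real structures and the real function in the branch are richer and may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real Open MPI structures are richer. */
typedef struct module { int unused; } module_t;

typedef struct component {
    const char *name;
    int (*query)(module_t **module, int *priority); /* 0 on success */
    void (*close)(void);
} component_t;

/* Queries every component, keeps the one with the highest priority,
 * and closes all others. Returns the winner's index, or -1 if none. */
static int select_one(component_t *comps, size_t n, module_t **best_module)
{
    assert(best_module != NULL);

    int best = -1;
    int best_prio = 0;
    *best_module = NULL;

    for (size_t i = 0; i < n; i++) {
        module_t *m = NULL;
        int prio = 0;
        if (comps[i].query == NULL || comps[i].query(&m, &prio) != 0) {
            continue;   /* component declined to be considered */
        }
        if (best < 0 || prio > best_prio) {
            best = (int)i;
            best_prio = prio;
            *best_module = m;
        }
    }

    /* Close everything that was not selected. */
    for (size_t i = 0; i < n; i++) {
        if ((int)i != best && comps[i].close != NULL) {
            comps[i].close();
        }
    }
    return best;
}

/* Two toy components used only to exercise the sketch. */
static int closed_count = 0;
static void count_close(void) { closed_count++; }
static module_t low_module, high_module;
static int low_query(module_t **m, int *p)  { *m = &low_module;  *p = 10; return 0; }
static int high_query(module_t **m, int *p) { *m = &high_module; *p = 50; return 0; }
```

Given the two toy components, `select_one` picks the one reporting priority 50 and calls `close` once, for the other.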
>>>>>>>>>>> 
>>>>>>>>>>> To support this I had to formalize a query() function in the
>>>>>>>>>>> mca_base_component_t of the form:
>>>>>>>>>>> int mca_base_query_component_fn(mca_base_module_t **module,
>>>>>>>>>>>                                 int *priority);
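A small sketch of what a component-side query function of that form might look like. Only the signature comes from the text above; the `mca_base_module_t` definition is a stand-in so the example compiles on its own, and `query_fn_t`, `example_available`, the priority value 20, and the return codes (0 for success, -1 for declining) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the real type so the sketch compiles on its own. */
typedef struct mca_base_module_t { int placeholder; } mca_base_module_t;

/* Hypothetical pointer type matching the form quoted above. */
typedef int (*query_fn_t)(mca_base_module_t **module, int *priority);

static mca_base_module_t example_module;
static int example_available = 1;   /* hypothetical "can I run here?" flag */

/* Reports a module and a priority, or declines when unavailable. */
static int example_component_query(mca_base_module_t **module, int *priority)
{
    assert(module != NULL && priority != NULL);

    if (!example_available) {
        *module = NULL;
        *priority = -1;
        return -1;       /* assumed failure code: do not consider me */
    }

    *module = &example_module;
    *priority = 20;      /* arbitrary illustrative priority */
    return 0;            /* assumed success code */
}
```

A framework using a common selection routine would call each component's function through such a pointer and compare the reported priorities.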
>>>>>>>>>>> 
>>>>>>>>>>> This function is specified after the open and close component
>>>>>>>>>>> functions in this structure so as to allow compatibility with
>>>>>>>>>>> frameworks that do not use the base selection logic. Frameworks
>>>>>>>>>>> that do *not* use this function are *not* affected by this
>>>>>>>>>>> commit. However, every component in the frameworks that use the
>>>>>>>>>>> mca_base_select function must adjust its component query
>>>>>>>>>>> function to fit the form specified above.
>>>>>>>>>>> 
>>>>>>>>>>> 18 frameworks in Open MPI have been changed. I have updated all
>>>>>>>>>>> of the components in the 18 frameworks available in the trunk
>>>>>>>>>>> on my branch. The affected frameworks are:
>>>>>>>>>>> - OPAL carto
>>>>>>>>>>> - OPAL crs
>>>>>>>>>>> - OPAL maffinity
>>>>>>>>>>> - OPAL memchecker
>>>>>>>>>>> - OPAL paffinity
>>>>>>>>>>> - ORTE errmgr
>>>>>>>>>>> - ORTE ess
>>>>>>>>>>> - ORTE filem
>>>>>>>>>>> - ORTE grpcomm
>>>>>>>>>>> - ORTE odls
>>>>>>>>>>> - ORTE plm
>>>>>>>>>>> - ORTE ras
>>>>>>>>>>> - ORTE rmaps
>>>>>>>>>>> - ORTE routed
>>>>>>>>>>> - ORTE snapc
>>>>>>>>>>> - OMPI crcp
>>>>>>>>>>> - OMPI dpm
>>>>>>>>>>> - OMPI pubsub
>>>>>>>>>>> 
>>>>>>>>>>> There was a question of the memory footprint change as a result
>>>>>>>>>>> of this commit. I used 'pmap' to determine the process memory
>>>>>>>>>>> footprint of a hello world MPI program. Static and shared build
>>>>>>>>>>> numbers are below, along with variations on launching locally
>>>>>>>>>>> and to a single node allocated by SLURM. All of this was on
>>>>>>>>>>> Indiana University's Odin machine. We compare against the trunk
>>>>>>>>>>> (r18276), representing the last SVN sync point of the branch.
>>>>>>>>>>> 
>>>>>>>>>>> Process(shared)| Trunk    | Branch  | Diff (Improvement)
>>>>>>>>>>> ---------------+----------+---------+-------
>>>>>>>>>>> mpirun (orted) |   39976K |  36828K | 3148K
>>>>>>>>>>> hello (0)      |  229288K | 229268K |   20K
>>>>>>>>>>> hello (1)      |  229288K | 229268K |   20K
>>>>>>>>>>> ---------------+----------+---------+-------
>>>>>>>>>>> mpirun         |   40032K |  37924K | 2108K
>>>>>>>>>>> orted          |   34720K |  34660K |   60K
>>>>>>>>>>> hello (0)      |  228404K | 228384K |   20K
>>>>>>>>>>> hello (1)      |  228404K | 228384K |   20K
>>>>>>>>>>> 
>>>>>>>>>>> Process(static)| Trunk    | Branch  | Diff (Improvement)
>>>>>>>>>>> ---------------+----------+---------+-------
>>>>>>>>>>> mpirun (orted) |   21384K |  21372K |  12K
>>>>>>>>>>> hello (0)      |  194000K | 193980K |  20K
>>>>>>>>>>> hello (1)      |  194000K | 193980K |  20K
>>>>>>>>>>> ---------------+----------+---------+-------
>>>>>>>>>>> mpirun         |   21384K |  21372K |  12K
>>>>>>>>>>> orted          |   21208K |  21196K |  12K
>>>>>>>>>>> hello (0)      |  193116K | 193096K |  20K
>>>>>>>>>>> hello (1)      |  193116K | 193096K |  20K
>>>>>>>>>>> 
>>>>>>>>>>> As you can see there are some small memory footprint
>>>>>>>>>>> improvements on my branch that result from this work. The size
>>>>>>>>>>> of the Open MPI project shrinks a bit as well. This commit cuts
>>>>>>>>>>> between 2,000 and 3,500 lines of code (depending on how you
>>>>>>>>>>> count), so about a 1% code shrink.
>>>>>>>>>>> 
>>>>>>>>>>> The branch is stable in all of the testing I have done, but
>>>>>>>>>>> there are some platforms on which I cannot test. So please give
>>>>>>>>>>> this branch a try and let me know if you find any problems.
>>>>>>>>>>> 
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Josh
>>>>>>>>>>> 
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> devel mailing list
>>>>>>>>>>> de...@open-mpi.org
>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>>>> 
>>>>>>>>>> -- 
>>>>>>>>>> Jeff Squyres
>>>>>>>>>> Cisco Systems
>>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> -- 
>>>>>>> 
>>>>>>> - Pak Lui
>>>>>>> pak....@sun.com
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
> 

