On 12/4/08 9:05 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote:

> After reflecting on this a bit, there are two more things I should have
> mentioned:
> 
> 1. I think that moving the BTL's out into their own layer (or
> whatever) should be a separate effort from re-introducing the RSL (or
> something like it).  To me, they're two different things and should be
> addressed separately.
> 
>>> >> Yes, I agree, and this is the plan.  However, I think that some of these
>>> >> issues will arise in the 4th proposed stage.  Specifically, indexing the
>>> >> btl's will become an issue at the pml level.  Right now we do what is
>>> >> easiest and just use an indexing already provided by orte, but this really
>>> >> should be driven by the layer using them, MPI in this case.
> 
> 2. With both the BTLs and some incarnation of RSL coming into the code
> base, we need to decide exactly what our policies will be on who can
> drive interface changes and what our responsibilities will be to
> external code bases that use the BTL and/or RSL interfaces.
> 
> FYI: Re-introducing some form of the RSL is already on either the
> December or January ORTE meeting agenda (I don't remember which
> offhand).
> 
>>> >> Yes.  I know that several people have been thinking about this.  Is there
>>> >> a date for the January meeting yet?  It turns out that I will be out of the
>>> >> country for 3 weeks, from mid-Jan until just before the Feb. MPI Forum
>>> >> meeting.
> 
> Rich
> 
> 
> 
> On Dec 4, 2008, at 8:13 AM, Jeff Squyres wrote:
> 
>> > I think you got it right.  And I think we're pretty good in terms of
>> > BTL usage of ORTE and OPAL (to include the new "notifier" service
>> > that Ralph put in recently -- what the FTB will likely eventually
>> > use, I think...?); those interfaces and abstraction barriers are
>> > technologically enforced.  If you break the abstractions, the linker
>> > will swiftly and unmercifully punish you.  (This was exactly [one
>> > of] the rationales that we used for splitting the code base into
>> > OPAL, ORTE, and OMPI several years ago.)
>> >
>> > Greg has already noted on the wiki a few constants used in the BTL's
>> > that have an OMPI_ prefix that aren't really OMPI values (e.g.,
>> > OMPI_ENABLE_HETEROGENEOUS_SUPPORT).  These come from configure
>> > (i.e., opal/include/opal_config.h) and were not renamed back when we
>> > split the code base into OPAL, ORTE, and OMPI.  I don't think we had
>> > a strong reason for not renaming them -- most could probably be
>> > renamed to OPAL_* -- we just didn't do it then.  Perhaps they can be
>> > changed during the BTL extraction process (I noted this on the wiki).
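A minimal sketch of the rename described above, assuming configure would emit the new OPAL_-prefixed name into opal_config.h and a transitional OMPI_ alias would be kept while callers are converted. The alias, the placeholder value 1, and btl_heterogeneous_enabled() are hypothetical illustrations, not existing Open MPI code.

```c
#include <assert.h>

/* What configure could emit into opal/include/opal_config.h after the
 * rename (assumed). The value 1 is a placeholder; configure decides it. */
#define OPAL_ENABLE_HETEROGENEOUS_SUPPORT 1

/* Hypothetical transitional alias so OMPI code that still uses the old
 * name keeps compiling while callers are converted. */
#ifndef OMPI_ENABLE_HETEROGENEOUS_SUPPORT
#define OMPI_ENABLE_HETEROGENEOUS_SUPPORT OPAL_ENABLE_HETEROGENEOUS_SUPPORT
#endif

/* Compile-time check that the old and new names cannot drift apart. */
static_assert(OMPI_ENABLE_HETEROGENEOUS_SUPPORT == OPAL_ENABLE_HETEROGENEOUS_SUPPORT,
              "OMPI_ alias must track the OPAL_ value");

/* Code extracted from OMPI (e.g., a BTL) would test only the OPAL_ name. */
static int btl_heterogeneous_enabled(void)
{
#if OPAL_ENABLE_HETEROGENEOUS_SUPPORT
    return 1;
#else
    return 0;
#endif
}
```

Once nothing references the OMPI_ spelling any more, the alias block could simply be deleted.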
>> >
>> >
>> >
>> > On Dec 3, 2008, at 9:43 PM, Richard Graham wrote:
>> >
>>> >> BTW,
>>> >> I was guessing FTB is Fault Tolerant Backbone, but if not, can
>>> >> someone tell me what it is?  If it is not the latter, what I just
>>> >> wrote about it makes no sense.
>>> >>
>>> >> Rich
>>> >>
>>> >>
>>> >> On 12/3/08 9:34 PM, "Richard Graham" <rlgra...@ornl.gov> wrote:
>>> >>
>>>> >>> The goal is to use the btl's outside of the context of MPI, which
>>>> >>> was what was in mind from the day the ompi work started over five
>>>> >>> years ago, but with no other use at the time, things grew up
>>>> >>> intermingled - no surprise at all.  What we are attempting to do
>>>> >>> is to untangle the existing dependencies, and make a much cleaner
>>>> >>> distinction between how/what data is passed between layers.
>>>> >>>
>>>> >>> I expect this will involve some sort of well-defined interface
>>>> >>> between the btl's and orte, and I don't know if this will also
>>>> >>> require something like this between the btl's and the pml - I
>>>> >>> think that interface is rigidly enforced, but am not sure.
>>>> >>>
>>>> >>> I expect that explicit calls to FTB in the btl layer would have to
>>>> >>> be componentized, especially in the context of what is developing
>>>> >>> in the FT working group of the MPI Forum.  Not that FTB is bad in
>>>> >>> any way, just that it is one of many monitors.
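One possible shape for componentizing the notifier calls, as a minimal sketch: the BTL error path calls one generic reporting function, and FTB would be just one selectable component behind it. Every name here (notifier_component_t, notifier_select(), btl_report_error(), the "stderr" and "count" components) is hypothetical; this is not the actual Open MPI notifier framework or the FTB API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical notifier component: each monitor (FTB, a local TCP-based
 * monitor, stderr, ...) would provide one of these. */
typedef struct {
    const char *name;
    int (*report)(int severity, const char *msg);
} notifier_component_t;

/* Component that writes reports to stderr. */
static int stderr_report(int severity, const char *msg)
{
    return fprintf(stderr, "[notifier sev=%d] %s\n", severity, msg) < 0 ? -1 : 0;
}

/* Component that only counts reports (handy for testing). */
static int report_count = 0;
static int count_report(int severity, const char *msg)
{
    (void)severity;
    (void)msg;
    report_count++;
    return 0;
}

/* Components known to this sketch; an FTB component would be one more entry. */
static const notifier_component_t available[] = {
    { "stderr", stderr_report },
    { "count", count_report },
};

static const notifier_component_t *selected = NULL;

/* Pick a component by name at init time; returns -1 if none matches. */
static int notifier_select(const char *name)
{
    for (size_t i = 0; i < sizeof(available) / sizeof(available[0]); i++) {
        if (strcmp(available[i].name, name) == 0) {
            selected = &available[i];
            return 0;
        }
    }
    selected = NULL;
    return -1;
}

/* What a BTL error path would call instead of calling FTB directly.
 * Severity 1 is an arbitrary placeholder. */
static int btl_report_error(const char *msg)
{
    assert(msg != NULL);
    if (selected == NULL) {
        return 0; /* no monitor configured: drop the report */
    }
    return selected->report(1, msg);
}
```

In this sketch, supporting another monitor means adding one table entry; the BTL code that calls btl_report_error() does not change.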
>>>> >>>
>>>> >>> We will need to talk about this on a case by case basis, and
>>>> >>> decide how to proceed.  If anyone wants to help, please do.
>>>> >>>
>>>> >>> Rich
>>>> >>>
>>>> >>>
>>>> >>> On 12/3/08 3:02 PM, "Ralph Castain" <r...@lanl.gov> wrote:
>>>> >>>
>>>>> >>>> I managed to execute the modex-less changes pretty much without
>>>>> >>>> introducing additional ORTE dependencies into the BTL's, though there
>>>>> >>>> may be some additions as we look at the other BTLs that I didn't
>>>>> >>>> address. So hopefully that won't contribute too much to the issue
>>>>> >>>> here.
>>>>> >>>>
>>>>> >>>> At the moment, I don't think it matters where notifier sits - it might
>>>>> >>>> be able to move to OPAL. The only catch will be if some notifier
>>>>> >>>> component requires communications. I'm thinking of FTB, for example,
>>>>> >>>> and our own local monitoring program that may require TCP messaging.
>>>>> >>>> We don't currently have anything in OPAL that would support an OPAL
>>>>> >>>> level messaging system, though perhaps that could be resolved.
>>>>> >>>>
>>>>> >>>> We also have dependencies where the BTL's will call orte_ess to find
>>>>> >>>> out what node another proc is on, the node local rank of that proc,
>>>>> >>>> etc. Those dependencies are likely to grow after the Dec meeting (see
>>>>> >>>> wiki for that agenda item), and definitely cannot be moved to OPAL.
>>>>> >>>>
>>>>> >>>> However, note that Rich stated the BTL's were -not- moving to OPAL.
>>>>> >>>> This raises the question: where -are- they going? Into their own
>>>>> >>>> layer? Will that layer be somewhere in-between OMPI and ORTE (in
>>>>> >>>> which case, the ORTE dependencies are moot)?
>>>>> >>>>
>>>>> >>>> I note that the wiki page doesn't address any of these questions,
>>>>> >>>> which is understandable if things are just getting underway. But it
>>>>> >>>> does sound like this is going to take some thought to ensure we don't
>>>>> >>>> paint ourselves into a corner.
>>>>> >>>>
>>>>> >>>> Ralph
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> On Dec 3, 2008, at 12:10 PM, Jeff Squyres wrote:
>>>>> >>>>
>>>>>> >>>> > FWIW, I see lots of notifier calls being added to the BTLs (and
>>>>>> >>>> > elsewhere throughout the OMPI code base) over time...
>>>>>> >>>> >
>>>>>> >>>> > On Dec 3, 2008, at 2:07 PM, Tim Mattox wrote:
>>>>>> >>>> >
>>>>>>> >>>> >> The BTLs might have added calls to the notifier framework in their
>>>>>>> >>>> >> error paths.
>>>>>>> >>>> >> The notifier framework is currently in the ORTE layer... not sure if
>>>>>>> >>>> >> we could move it down to OPAL.  Ralph, any thoughts on that?
>>>>>>> >>>> >>
>>>>>>> >>>> >> On Wed, Dec 3, 2008 at 11:56 AM, Richard Graham <rlgra...@ornl.gov>
>>>>>>> >>>> >> wrote:
>>>>>>>> >>>> >>> George told me about what he is doing, so no changes would be
>>>>>>>> >>>> >>> committed until George has his changes in.
>>>>>>>> >>>> >>>
>>>>>>>> >>>> >>> Are there other changes to the btl's that we should be aware of?
>>>>>>>> >>>> >>>
>>>>>>>> >>>> >>> Rich
>>>>>>>> >>>> >>>
>>>>>>>> >>>> >>>
>>>>>>>> >>>> >>> On 12/3/08 11:47 AM, "George Bosilca" <bosi...@eecs.utk.edu> wrote:
>>>>>>>> >>>> >>>
>>>>>>>>> >>>> >>>> Terry,
>>>>>>>>> >>>> >>>>
>>>>>>>>> >>>> >>>> I'm involved [to some degree] in both efforts and I can confirm
>>>>>>>>> >>>> >>>> these two efforts will not affect each other in any bad way.
>>>>>>>>> >>>> >>>>
>>>>>>>>> >>>> >>>>  george.
>>>>>>>>> >>>> >>>>
>>>>>>>>> >>>> >>>> On Dec 3, 2008, at 11:42 , Terry Dontje wrote:
>>>>>>>>> >>>> >>>>
>>>>>>>>>> >>>> >>>>> I don't have any *strong* objections. However, I know that Eugene
>>>>>>>>>> >>>> >>>>> and George B have been working on some Fastpath code changes, and
>>>>>>>>>> >>>> >>>>> we should make sure neither project obliterates the other.
>>>>>>>>>> >>>> >>>>>
>>>>>>>>>> >>>> >>>>> --td
>>>>>>>>>> >>>> >>>>>
>>>>>>>>>> >>>> >>>>> Richard Graham wrote:
>>>>>>>>>>> >>>> >>>>>> Now that 1.3 will be released, we would like to go ahead with the
>>>>>>>>>>> >>>> >>>>>> plan to move the btl's out of the MPI layer. Greg Koenig, who is
>>>>>>>>>>> >>>> >>>>>> doing most of the work, has started a wiki page with details on
>>>>>>>>>>> >>>> >>>>>> the plans. Right now details are sketchy, as Greg is digging
>>>>>>>>>>> >>>> >>>>>> through the code and has only handwritten notes on data structures
>>>>>>>>>>> >>>> >>>>>> that need to be moved, include files that are not needed, etc.
>>>>>>>>>>> >>>> >>>>>> The page is at:
>>>>>>>>>>> >>>> >>>>>> https://svn.open-mpi.org/trac/ompi/wiki/BTLExtraction
>>>>>>>>>>> >>>> >>>>>>
>>>>>>>>>>> >>>> >>>>>> The first three steps basically only involve code motion: moving
>>>>>>>>>>> >>>> >>>>>> items such as ompi_list and renaming them, moving where the code
>>>>>>>>>>> >>>> >>>>>> is actually located in the repository, and the like. For these we
>>>>>>>>>>> >>>> >>>>>> do not plan to put out a formal RFC, but comments are very
>>>>>>>>>>> >>>> >>>>>> welcome, and any hands that are willing to help with this are even
>>>>>>>>>>> >>>> >>>>>> more welcome.
>>>>>>>>>>> >>>> >>>>>>
>>>>>>>>>>> >>>> >>>>>> The last phase, where the btl's are made dependent on OPAL and
>>>>>>>>>>> >>>> >>>>>> supporting libraries such as mpools, I expect will be disruptive,
>>>>>>>>>>> >>>> >>>>>> will definitely require an RFC, and will also be a longer process.
>>>>>>>>>>> >>>> >>>>>>
>>>>>>>>>>> >>>> >>>>>> Please send comments,
>>>>>>>>>>> >>>> >>>>>> Rich
>>>>>>>>>>> >>>> >>>>>>
>>>>>>>>>>> >>>> >>>>>>
>>>>>>>>>>> >>>> >>>>>> _______________________________________________
>>>>>>>>>>> >>>> >>>>>> devel mailing list
>>>>>>>>>>> >>>> >>>>>> de...@open-mpi.org
>>>>>>>>>>> >>>> >>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>>>>> >>>> >>>>>>
>>>>>>>>>> >>>> >>>>>
>>>>>>>>> >>>> >>>>
>>>>>>>>> >>>> >>>>
>>>>>>>> >>>> >>>
>>>>>>>> >>>> >>>
>>>>>>>> >>>> >>>
>>>>>>> >>>> >>
>>>>>>> >>>> >>
>>>>>>> >>>> >>
>>>>>>> >>>> >> --
>>>>>>> >>>> >> Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
>>>>>>> >>>> >> tmat...@gmail.com || timat...@open-mpi.org
>>>>>>> >>>> >>   I'm a bright... http://www.the-brights.net/
>>>>>>> >>>> >>
>>>>>> >>>> >
>>>>>> >>>> >
>>>>>> >>>> > --
>>>>>> >>>> > Jeff Squyres
>>>>>> >>>> > Cisco Systems
>>>>>> >>>> >
>>>>>> >>>> >
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>> >>>
>> >
>> >
>> >
>> >
> 
> 
> 
> 
> 
