I should have expounded further...

We actually looked at using the ODLS, and at creating a new opal_fork
capability that could perhaps be shared between the ODLS and this point in
the code. Unfortunately, neither option worked very well.

In the ODLS case, we had to pepper it with if statements to reflect the
differences in available services and information.

In the case of opal_fork, we encountered a similar problem - the info
available, what had to be passed to the fork'd child, etc. were all so
different that we wound up just writing a big "if" at the top and having
totally separate code paths.
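
To illustrate the point, here is a minimal sketch. It is not the actual Open MPI code, and it is written in Python rather than C for brevity. The names fork_exec, launch_app_proc, launch_daemon and the DEMO_* environment variables are all hypothetical. It assumes a POSIX system. The only shared piece is the bare fork/exec; what each caller has to hand to the child is different.

```python
import os
import sys


def fork_exec(argv, extra_env):
    """Fork, then exec argv in the child with extra_env added.

    Returns the child's pid to the parent. The child never returns.
    """
    pid = os.fork()
    if pid == 0:
        # Child: build its environment, then replace the process image.
        env = dict(os.environ, **extra_env)
        try:
            os.execvpe(argv[0], argv, env)
        except OSError:
            # exec failed; exit at once rather than fall back into parent code.
            os._exit(127)
    return pid


def wait_exit_code(pid):
    """Block until the child exits; return its exit code, or -1 if signalled."""
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


def launch_app_proc(argv, rank, num_procs):
    # Hypothetical: an application proc needs job-level info such as its rank.
    return fork_exec(argv, {"DEMO_RANK": str(rank),
                            "DEMO_NUM_PROCS": str(num_procs)})


def launch_daemon(argv, parent_contact):
    # Hypothetical: a daemon only needs to know how to call back to its parent.
    return fork_exec(argv, {"DEMO_PARENT_CONTACT": parent_contact})
```

Even in this toy version the two launch functions share nothing beyond fork_exec itself, and merging them would mean branching on the kind of child at the top.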

That's why we wound up where we are. Remember, the ODLS fork/exec's
application procs, so it includes all kinds of stuff for that purpose. In
this case, we are fork/exec'ing an orted - totally different informational

On 7/12/07 2:17 PM, "Ralph H Castain" <r...@lanl.gov> wrote:

> I don't think so - the decision to fork must come earlier, before that
> framework can be selected. At the time of the fork, we don't have access to
> very much in terms of services.
> You are welcome to look and see if you can find a way to do it. The
> fork/exec occurs in orte/mca/sds/base/sds_base_universe.c, which is called
> just before we define our name during orte_init_stage1.
> Ralph
> On 7/12/07 2:12 PM, "George Bosilca" <bosi...@cs.utk.edu> wrote:
>> We have the ODLS framework which is supposed to launch local
>> processes. Can we use it in order to spawn the local daemons? This
>> will solve the Windows problem, and will give us a more consistent
>> environment.
>>    george.
>> On Jul 12, 2007, at 4:02 PM, Ralph H Castain wrote:
>>> The commit has been made - it is r15390.
>>> This commit restored the ability to execute singletons and singleton
>>> comm_spawn, both in single node and multi-node environments. It also
>>> includes a first step in our plan to reduce the ORTE system to the minimum
>>> functionality required to support Open MPI (more on that separately).
>>> Short description of major changes:
>>> 1. singletons now fork/exec a local daemon to manage their operations. This
>>> was required not only to resolve the current problem, but also to deal with
>>> threading issues in the progress engine down the road.
>>> 2. the orte daemon code now resides in libopen-rte. This was needed so that
>>> mpirun could fully provide all daemon services since we no longer allow
>>> multiple daemons to share a node (so an orted could not co-reside with
>>> mpirun).
>>> 3. daemons no longer use the orte triggering system during startup. Instead,
>>> they directly call back to their parent pls component to report ready to
>>> operate.
>>> I have modified all the pls components except xcpu and poe (don't understand
>>> either well enough to do it). Full functionality has been verified for rsh,
>>> SLURM, and TM systems. Compile has been verified for xgrid and gridengine,
>>> and hopefully those environments will work - though I could not verify that
>>> was true.
>>> Note that singletons will *not* operate in Windows environments at this
>>> time. The ability to fork/exec the local daemon would need to be added
>>> first, assuming Windows can support singletons (I honestly don't know).
>>> Please let me know of any problems.
>>> Ralph
>>> On 7/12/07 1:45 PM, "Ralph H Castain" <r...@lanl.gov> wrote:
>>>> Yo folks
>>>> Several of us are stuck waiting for this commit to hit. Rather than wasting
>>>> the next several hours, I'm going to make the commit now.
>>>> So please be advised: if you do an update after this commit hits, you will
>>>> need to autogen. You may want to wait until a convenient time before doing
>>>> the update.
>>>> Thanks
>>>> Ralph
>>>> On 7/12/07 7:53 AM, "Ralph H Castain" <r...@lanl.gov> wrote:
>>>>> Yo all
>>>>> I have a fairly significant change coming to the orte part of the code base
>>>>> that will require an autogen (sorry). I'll check it in late this afternoon
>>>>> (can't do it at night as it is on my office desktop).
>>>>> The commit will fix the singleton operations, including singleton
>>>>> comm_spawn. It also takes the first step towards removing event-driven
>>>>> operations, replacing them with more serial code (to be explained
>>>>> separately). As part of all this, I had to modify the various pls
>>>>> components. For those I could not compile, I made a first cut at them that
>>>>> should (hopefully) allow them to continue to operate.
>>>>> Any of you using TM: we discovered that the trunk is not working currently
>>>>> on that environment. We are investigating - it has nothing to do with this
>>>>> commit, but predates it.
>>>>> Just wanted to give you a heads-up. Please refrain from making changes to
>>>>> the orte codebase today, if you could - it would simplify the commit and
>>>>> ensure we don't lose your changes.
>>>>> Thanks
>>>>> Ralph
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> _______________________________________________
>>> devel-core mailing list
>>> devel-c...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel-core
