We have the ODLS framework, which is supposed to launch local
processes. Can we use it to spawn the local daemons? That would
solve the Windows problem and give us a more consistent environment.
george.
On Jul 12, 2007, at 4:02 PM, Ralph H Castain wrote:
The commit has been made - it is r15390.
This commit restored the ability to execute singletons and singleton
comm_spawn, in both single-node and multi-node environments. It also
includes a first step in our plan to reduce the ORTE system to the
minimum functionality required to support Open MPI (more on that
separately).
Short description of major changes:
1. Singletons now fork/exec a local daemon to manage their
operations. This was required not only to resolve the current
problem, but also to deal with threading issues in the progress
engine down the road.
2. The orte daemon code now resides in libopen-rte. This was needed
so that mpirun could fully provide all daemon services, since we no
longer allow multiple daemons to share a node (so an orted could not
co-reside with mpirun).
3. Daemons no longer use the orte triggering system during startup.
Instead, they call back directly to their parent pls component to
report that they are ready to operate.
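A minimal POSIX sketch of the startup handshake described in items 1
and 3, under loudly stated assumptions: this is not ORTE code,
`launch_local_daemon` is a hypothetical name, and a pipe stands in for
whatever channel the real daemons use to call back to their parent pls
component. The parent forks a child standing in for the daemon and
blocks until the child reports that it is ready, rather than waiting on
a trigger.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not ORTE code: fork a stand-in "daemon" and
 * block until it reports ready over a pipe. On success, store the
 * child's pid in *out_pid and return 0; the caller must reap it. */
int launch_local_daemon(pid_t *out_pid)
{
    int fds[2];
    if (pipe(fds) != 0) {
        perror("pipe");
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: a real launcher would exec the daemon binary here and
         * hand it the write end of the pipe. This stand-in just sends a
         * one-byte "ready" message and exits. */
        close(fds[0]);
        const char ready = 'R';
        ssize_t w = write(fds[1], &ready, 1);
        close(fds[1]);
        _exit(w == 1 ? 0 : 1);
    }

    /* Parent: close our copy of the write end so that read() sees EOF
     * if the child dies before reporting, then wait for the callback. */
    close(fds[1]);
    char msg = 0;
    ssize_t n;
    do {
        n = read(fds[0], &msg, 1);
    } while (n < 0 && errno == EINTR);
    close(fds[0]);

    if (n != 1 || msg != 'R') {
        waitpid(pid, NULL, 0); /* reap the failed child */
        return -1;
    }
    *out_pid = pid;
    return 0;
}
```

The caller is responsible for reaping the child with waitpid(). A real
daemon would keep running after reporting ready; this stand-in exits
immediately so the sketch stays self-contained.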
I have modified all the pls components except xcpu and poe (I don't
understand either well enough to do it). Full functionality has been
verified for rsh, SLURM, and TM systems. Compilation has been verified
for xgrid and gridengine, and hopefully those environments will work,
though I could not verify that.
Note that singletons will *not* operate in Windows environments at
this time. The ability to fork/exec the local daemon would need to be
added first, assuming Windows can support singletons (I honestly
don't know).
Please let me know of any problems.
Ralph
On 7/12/07 1:45 PM, "Ralph H Castain" <r...@lanl.gov> wrote:
Yo folks
Several of us are stuck waiting for this commit to hit. Rather than
waste the next several hours, I'm going to make the commit now.
So please be advised: if you update after this commit hits, you will
need to rerun autogen. You may want to wait until a convenient time
before doing the update.
Thanks
Ralph
On 7/12/07 7:53 AM, "Ralph H Castain" <r...@lanl.gov> wrote:
Yo all
I have a fairly significant change coming to the orte part of the code
base that will require an autogen (sorry). I'll check it in late this
afternoon (I can't do it at night, as it is on my office desktop).
The commit will fix singleton operations, including singleton
comm_spawn. It also takes the first step toward removing event-driven
operations, replacing them with more serial code (to be explained
separately). As part of all this, I had to modify the various pls
components. For those I could not compile, I made a first cut that
should (hopefully) allow them to continue to operate.
Any of you using TM: we discovered that the trunk is not currently
working in that environment. We are investigating; the problem has
nothing to do with this commit and predates it.
Just wanted to give you a heads-up. If you could, please refrain from
making changes to the orte codebase today; it would simplify the
commit and ensure we don't lose your changes.
Thanks
Ralph
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
_______________________________________________
devel-core mailing list
devel-c...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel-core