Hi Jeff,
I think that sums up the situation nicely. For item #2, I wonder if it
would be better to still use "ssh <host> mpirun ...", but have mpirun
fork itself "under the covers"? Not having an extra executable in your
distribution would probably make long term maintenance easier.
If Ralph can do anything in the 1.3/1.4 timeframe to sort out the few
remaining issues, it would be appreciated.
Regards,
Greg
On Sep 10, 2009, at 3:19 PM, Jeff Squyres wrote:
Greg and I chatted on the phone about this. I now understand much
better about what he is trying to do (short version: Eclipse is
running on one machine, it is opening an ssh session to a remote
machine and launching mpirun on that remote machine).
Results of the phone conversation (for the web archives):
- In the short term, there are a few remaining issues to be figured
out. Ralph (who is now full-time at Cisco) may or may not have time
to fix these in the near term. We (Open MPI) would happily review
patches from others in this area if a solution is required before
Ralph can get to it.
- In the long term, we came up with a "thinking outside the box"
solution that seems to be *much* better (think 1.5 and beyond).
I'll describe the scheme, but at the same time, I'll indicate that
Cisco likely does not have time in the foreseeable future to
implement it. Again, we would be happy to provide guidance to
anyone who would want to implement it (e.g., IBM) and/or review
patches.
-----
1. Currently, the Eclipse plugin is effectively executing "ssh
<otherhost> mpirun ...". This has several advantages:
- Use whatever the native OMPI is on <otherhost>
- No need for binary compatibility (i.e., version match of Eclipse
plugin and remote OMPI installation)
2. The proposal is to change this to "ssh <otherhost>
mpirun-proxy ..." where mpirun-proxy is a new executable that does
the following:
- fork/exec the real mpirun, making pipes to mpirun's stdin/stdout/
stderr
- tell mpirun to not display any IOF output from MPI processes
- tell mpirun to not display any show_help messages
- register to receive ORTE "events" (more below) via the ORTE comm
library
- register to receive IOF from all the MPI processes via the ORTE
comm library
- register to receive show_help messages from MPI processes via
the ORTE comm library
- upon receipt of specific events (e.g., determination of host/
node/process maps), output this data encased in a specific XML
schema (e.g., a specific set of XML tags to encase each data item in
the nodemap) to ssh's stdout
- read output from mpirun's stdout/stderr, output it on ssh's
stdout, encased in <stdout> / <stderr> (etc.)
- read IOF from MPI processes and output them to ssh's stdout,
encased in appropriate XML tagging
- read show_help messages from MPI processes and output them to
ssh's stdout, encased in appropriate XML tagging
--> Note that some of the above functionality already exists; it
would just need to be marshaled together and used in some new
logic. Other parts of the functionality do not exist and would need
to be written (e.g., redirecting show_help messages to something
other than the HNP).
3. Once #2 is done, remove all the XML processing from mpirun,
libopen-rte, libmpi, and all OMPI plugins (since it's now all in
mpirun-proxy).
-----
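A minimal sketch of the stdout/stderr part of item #2, in Python for
brevity. It only covers the "fork/exec the real mpirun with pipes and
re-emit its output encased in <stdout> / <stderr>" steps. The function
names (wrap, pump, run_proxy) are made up for illustration, stdin is
simply inherited rather than piped, and none of the ORTE comm library
pieces (events, IOF, show_help) are modeled here.

```python
import subprocess
import sys
import threading
from xml.sax.saxutils import escape


def wrap(tag, text):
    """Encase one line in <tag>...</tag>, escaping &, < and >."""
    return f"<{tag}>{escape(text)}</{tag}>"


def pump(stream, tag, sink, lock):
    """Forward each line read from a child pipe to sink, tagged."""
    for line in stream:
        with lock:  # keep lines from the two pipes from interleaving
            sink.write(wrap(tag, line.rstrip("\n")) + "\n")
            sink.flush()
    stream.close()


def run_proxy(argv, sink=sys.stdout):
    """Run argv (hypothetically the real mpirun) and tag its output."""
    child = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    lock = threading.Lock()
    threads = [
        threading.Thread(target=pump, args=(child.stdout, "stdout", sink, lock)),
        threading.Thread(target=pump, args=(child.stderr, "stderr", sink, lock)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Hand the child's exit status back to the caller (and so to ssh).
    return child.wait()
```

Something like run_proxy(["mpirun", "-np", "4", "./a.out"]) would then
be what runs at the far end of the ssh session. Escaping matters:
application output containing "<" or "&" would otherwise corrupt the
XML stream that the Eclipse plugin parses.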
This functionality would accomplish the following:
- The code is distributed in Open MPI -- not Eclipse or an Eclipse
plugin -- there's no additional compilation or linking step for the
Eclipse plugin to talk to OMPI.
- The Eclipse plugin, which already checks the output from
ompi_info, can know when to use this new functionality (ssh
mpirun-proxy instead of mpirun).
- All the OMPI XML parsing can be centralized to the mpirun-proxy
executable. This is a *huge* improvement over having XML sprinkled
all over the OMPI code base, as it is now. Additionally, with this
method, *all* OMPI output will be encased in XML before it is sent
to the Eclipse plugin (via ssh's stdout). Today, we have "XML-lite"
functionality in that "most" of OMPI's output is XML-ified, but
there's oodles and oodles of corner cases where output is *not*
XML-ified. The above proposal seems to be the best idea so far on how
to address this issue in a holistic way (rather than adding a bunch
more band-aids every time we find another output that isn't
XML-ified).
On Sep 10, 2009, at 9:23 AM, Greg Watson wrote:
The most appealing thing about the XML option is that it just works
"out of the box." Using a library API invariably requires compiling
an agent or distributing pre-compiled binaries with all the associated
complications. We tried that in the dim past and it was pretty
unworkable. The other problem was that the API headers were not
installed by default, so users were forced to install local copies of
OMPI with development headers enabled. It was not a great end-user
experience.
Greg
On Sep 10, 2009, at 8:45 AM, Jeff Squyres wrote:
> Thinking about this a little more ...
>
> This all seems like Open MPI-specific functionality for Eclipse.
> If that's the case, don't we have an ORTE tools communication library
> that could be used? IIRC, it pretty much does exactly what you
> want and would be far less clumsy than trying to jury-rig sending XML
> down files/fd's/whatever. I have dim recollections of the ORTE
> tools communication library API returning the data that you have
> asked for in data structures -- no parsing of XML at all (and, more
> importantly to us, no need to add all kinds of special code paths
> for wrapping our output in XML).
>
> If I'm right (and that's a big "if"!), is there a reason that this
> library is not attractive to you?
>
>
>
>
> On Sep 10, 2009, at 8:04 AM, Jeff Squyres wrote:
>
>> On Sep 9, 2009, at 12:17 PM, Ralph Castain wrote:
>>
>>> Hmmm....I never considered the possibility of output-filename
>>> being used that way. Interesting idea!
>>>
>>
>> That feels way weird to me -- for example, how do you know that
>> you're actually outputting to a tty?
>>
>> FWIW: +1 on the idea of writing to numbered fd's passed on the
>> command line. It just "feels" like a more POSIX-ish way of doing
>> things...? I guess I'm surprised that that would be difficult to
>> do from Java.
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>>
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
--
Jeff Squyres
jsquy...@cisco.com