I filed ticket #2019 pointing to this email thread in case someone ever wants to implement it.

FWIW: I don't think it matters much whether it's implemented as part of mpirun or a new executable; I suspect that whatever implementation is easiest will be fine.


On Sep 10, 2009, at 9:28 PM, Greg Watson wrote:

Hi Jeff,

I think that sums up the situation nicely. For item #2, I wonder if it
would be better to still use "ssh <host> mpirun ...", but have mpirun
fork itself "under the covers"? Not having an extra executable in your
distribution would probably make long-term maintenance easier.
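
Something along these lines is what I'm picturing (a sketch only --
the --xml-proxy flag and both helper functions are made up for
illustration):

    /* Hypothetical: mpirun forks itself "under the covers" when it
     * sees a (made-up) --xml-proxy flag, so no separate executable
     * has to ship in the distribution.  The two stubs stand in for
     * the front-end and normal-mpirun logic discussed below. */
    #include <string.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static int proxy_front_end(pid_t child)  /* stub: XML front end */
    {
        int status;
        waitpid(child, &status, 0);
        return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
    }

    static int real_mpirun_main(int argc, char *argv[])  /* stub */
    {
        (void)argc; (void)argv;
        return 0;
    }

    int main(int argc, char *argv[])
    {
        int proxy_mode = 0;
        for (int i = 1; i < argc; ++i)
            if (strcmp(argv[i], "--xml-proxy") == 0)
                proxy_mode = 1;

        if (proxy_mode) {
            pid_t pid = fork();
            if (pid > 0)
                return proxy_front_end(pid);  /* parent: front end */
            /* child falls through as the normal mpirun */
        }
        return real_mpirun_main(argc, argv);
    }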

If Ralph can do anything in the 1.3/1.4 timeframe to sort out the few
remaining issues, it would be appreciated.

Regards,
Greg

On Sep 10, 2009, at 3:19 PM, Jeff Squyres wrote:

> Greg and I chatted on the phone about this.  I now understand much
> better about what he is trying to do (short version: Eclipse runs
> on one machine, opens an ssh session to a remote machine, and
> launches mpirun on that remote machine).
>
> Results of the phone conversation (for the web archives):
>
> - In the short term, there are a few remaining issues to be
> figured out.  Ralph (who is now full-time at Cisco) may or may not
> have time to fix these in the near term.  We (Open MPI) would
> happily review
> patches from others in this area if a solution is required before
> Ralph can get to it.
>
> - In the long term, we came up with a "thinking outside the box"
> solution that seems to be *much* better (think 1.5 and beyond).
> I'll describe the scheme, but at the same time, I'll indicate that
> Cisco likely does not have time in the foreseeable future to
> implement it.  Again, we would be happy to provide guidance to
> anyone who would want to implement it (e.g., IBM) and/or review
> patches.
>
> -----
>
> 1. Currently, the Eclipse plugin is effectively executing "ssh
> <otherhost> mpirun ...".  This has several advantages:
>   - Uses whatever OMPI installation is native on <otherhost>
>   - No need for binary compatibility (i.e., no version match
> between the Eclipse plugin and the remote OMPI installation)
>
> 2. The proposal is to change this to "ssh <otherhost>
> mpirun-proxy ..." where mpirun-proxy is a new executable that does
> the following (a sketch follows this list):
>   - fork/exec the real mpirun, making pipes to mpirun's
> stdin/stdout/stderr
>   - tell mpirun to not display any IOF output from MPI processes
>   - tell mpirun to not display any show_help messages
>   - register to receive ORTE "events" (more below) via the ORTE comm
> library
>   - register to receive IOF from all the MPI processes via the ORTE
> comm library
>   - register to receive show_help messages from MPI processes via
> the ORTE comm library
>   - upon receipt of specific events (e.g., determination of the
> host/node/process maps), output this data encased in a specific
> XML schema (e.g., a specific set of XML tags to encase each data
> item in the nodemap) to ssh's stdout
>   - read output from mpirun's stdout/stderr, output it on ssh's
> stdout, encased in <stdout> / <stderr> (etc.)
>   - read IOF from MPI processes and output them to ssh's stdout,
> encased in appropriate XML tagging
>   - read show_help messages from MPI processes and output them to
> ssh's stdout, encased in appropriate XML tagging
>
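> To make the shape of this concrete, here is a minimal sketch of the
> proxy's core (everything here is hypothetical -- the ORTE
> event/IOF/show_help registrations are elided, and real code would
> escape XML special characters and multiplex with poll()):
>
>     #include <stdio.h>
>     #include <stdlib.h>
>     #include <string.h>
>     #include <sys/types.h>
>     #include <sys/wait.h>
>     #include <unistd.h>
>
>     /* wrap each line from src in <tag>...</tag> on our stdout,
>      * which is ssh's stdout back to the Eclipse plugin */
>     static void relay(FILE *src, const char *tag)
>     {
>         char line[4096];
>         while (fgets(line, sizeof(line), src)) {
>             line[strcspn(line, "\n")] = '\0';
>             printf("<%s>%s</%s>\n", tag, line, tag);
>         }
>     }
>
>     int main(int argc, char *argv[])
>     {
>         int out[2], err[2];
>         if (pipe(out) < 0 || pipe(err) < 0) {
>             perror("pipe");
>             return 1;
>         }
>
>         pid_t pid = fork();
>         if (pid == 0) {
>             /* child: become the real mpirun with stdout/stderr
>              * redirected into the pipes (a real proxy would filter
>              * out its own options first) */
>             dup2(out[1], STDOUT_FILENO);
>             dup2(err[1], STDERR_FILENO);
>             close(out[0]); close(out[1]);
>             close(err[0]); close(err[1]);
>             argv[0] = "mpirun";
>             execvp("mpirun", argv);
>             perror("execvp");
>             _exit(1);
>         }
>         close(out[1]);
>         close(err[1]);
>
>         /* a real proxy would poll() both pipes together with the
>          * ORTE comm channels; draining them sequentially keeps
>          * this sketch short */
>         FILE *o = fdopen(out[0], "r");
>         FILE *e = fdopen(err[0], "r");
>         relay(o, "stdout");
>         relay(e, "stderr");
>
>         int status;
>         waitpid(pid, &status, 0);
>         return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
>     }
>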
> --> Note that some of the above functionality already exists; it
> would just need to be marshaled together and used in some new
> logic.  Other parts of the functionality do not exist and would need
> to be written (e.g., redirecting show_help messages to something
> other than the HNP).
>
> 3. Once #2 is done, remove all the XML processing from mpirun,
> libopen-rte, libmpi, and all OMPI plugins (since it's now all in
> mpirun-proxy).
>
> -----
>
> This functionality would accomplish the following:
>
> - The code is distributed in Open MPI -- not Eclipse or an Eclipse
> plugin -- so there's no additional compilation or linking step for
> the Eclipse plugin to talk to OMPI.
>
> - The Eclipse plugin, which already checks the output from
> ompi_info, can know when to use this new functionality (ssh
> mpirun-proxy instead of mpirun).
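>
> (As a sketch of the sort of check meant here -- the "1.5" version
> gate and the exact parsable key are illustrative only:)
>
>     #include <stdio.h>
>     #include <string.h>
>
>     /* probe the remote OMPI: scan `ompi_info --parsable` output
>      * and only use mpirun-proxy when the version looks new enough;
>      * the lexicographic comparison is deliberately crude */
>     int remote_has_mpirun_proxy(void)
>     {
>         char line[256];
>         int new_enough = 0;
>         FILE *p = popen("ompi_info --parsable", "r");
>         if (p == NULL)
>             return 0;
>         while (fgets(line, sizeof(line), p)) {
>             /* lines look like "ompi:version:full:1.4.2" */
>             if (strncmp(line, "ompi:version:full:", 18) == 0)
>                 new_enough = (strncmp(line + 18, "1.5", 3) >= 0);
>         }
>         pclose(p);
>         return new_enough;
>     }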
>
> - All the OMPI XML parsing can be centralized to the mpirun-proxy
> executable.  This is a *huge* improvement over having XML sprinkled
> all over the OMPI code base, as it is now.  Additionally, with this
> method, *all* OMPI output will be encased in XML before it is sent
> to the Eclipse plugin (via ssh's stdout).  Today, we have "XML-lite"
> functionality in that "most" of OMPI's output is XML-ified, but
> there are oodles and oodles of corner cases where output is *not*
> XML-ified.  The above proposal seems to be the best idea so far on
> how to address this issue in a holistic way (rather than adding a
> bunch more band-aids every time we find another output that isn't
> XML-ified).
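>
> For example, everything arriving on ssh's stdout would then look
> something like this (tag names illustrative only, not a finalized
> schema):
>
>     <map><host name="node0" slots="4"/><host name="node1" slots="4"/></map>
>     <stdout rank="0">Hello, world</stdout>
>     <stderr rank="3">some warning text</stderr>
>     <show_help>full text of a help message</show_help>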
>
>
>
>
>
> On Sep 10, 2009, at 9:23 AM, Greg Watson wrote:
>
>> The most appealing thing about the XML option is that it just works
>> "out of the box." Using a library API invariably requires compiling
>> an
>> agent or distributing pre-compiled binaries with all the associated
>> complications. We tried that in the dim past and it was pretty
>> unworkable. The other problem was that the API headers were not
>> installed by default, so users were forced to install local copies of
>> OMPI with development headers enabled. It was not a great end-user
>> experience.
>>
>> Greg
>>
>> On Sep 10, 2009, at 8:45 AM, Jeff Squyres wrote:
>>
>> > Thinking about this a little more ...
>> >
>> > This all seems like Open MPI-specific functionality for Eclipse.
>> > If that's the case, don't we have an ORTE tools communication
>> > library that could be used?  IIRC, it pretty much does exactly
>> > what you want and would be far less clumsy than trying to
>> > jury-rig sending XML
>> > down files/fd's/whatever.  I have dim recollections of the ORTE
>> > tools communication library API returning the data that you have
>> > asked for in data structures -- no parsing of XML at all (and, more
>> > importantly to us, no need to add all kinds of special code paths
>> > for wrapping our output in XML).
>> >
>> > If I'm right (and that's a big "if"!), is there a reason that this
>> > library is not attractive to you?
>> >
>> >
>> >
>> >
>> > On Sep 10, 2009, at 8:04 AM, Jeff Squyres wrote:
>> >
>> >> On Sep 9, 2009, at 12:17 PM, Ralph Castain wrote:
>> >>
>> >>> Hmmm....I never considered the possibility of output-filename
>> >>> being used that way.  Interesting idea!
>> >>>
>> >>
>> >> That feels way weird to me -- for example, how do you know that
>> >> you're actually outputting to a tty?
>> >>
>> >> FWIW: +1 on the idea of writing to numbered fd's passed on the
>> >> command line.  It just "feels" like a more POSIX-ish way of doing
>> >> things...?  I guess I'm surprised that that would be difficult to
>> >> do from Java.
>> >>
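>> >> (A tiny illustration of the numbered-fd idea -- the convention
>> >> of handing the fd number over as a command-line argument is made
>> >> up here.  The parent opens the fd before exec; the child just
>> >> uses it:)
>> >>
>> >>     /* child side: write structured output to an inherited fd
>> >>      * whose number came in on the command line, e.g. "7" */
>> >>     #include <stdio.h>
>> >>     #include <stdlib.h>
>> >>
>> >>     int main(int argc, char *argv[])
>> >>     {
>> >>         int fd = (argc > 1) ? atoi(argv[1]) : -1;
>> >>         FILE *out = (fd >= 0) ? fdopen(fd, "w") : NULL;
>> >>         if (out == NULL) {
>> >>             perror("fdopen");
>> >>             return 1;
>> >>         }
>> >>         fprintf(out, "<stdout rank=\"0\">hello</stdout>\n");
>> >>         fclose(out);
>> >>         return 0;
>> >>     }
>> >>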
>> >> --
>> >> Jeff Squyres
>> >> jsquy...@cisco.com
>> >>
>> >
>> >
>> > --
>> > Jeff Squyres
>> > jsquy...@cisco.com
>> >
>>
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>

--
Jeff Squyres
jsquy...@cisco.com
