Per the teleconf this week, Ralph and I worked up two new features
that we're nearly ready to put back in the trunk:
1. IBM+LANL needed a way to XML-ize all output that comes out of OMPI
so that 3rd party tools can parse and use it intelligently (e.g., the
PTP debugger can now distinguish between OMPI error messages and
stderr from the MPI app).
2. In order to do #1, we created separate logical channels (vs. just
throwing everything in stderr and letting IOF relay it back to the
HNP) for the following:
- stdout/stderr from the MPI app
- opal_show_help() messages (***)
- opal_output*() messages (***)
As a side effect, we now filter show_help() messages and only print
them *once* at the HNP (this has been a very long-standing goal of
mine). So if your MPI app barfs, you will no longer see the same
show_help() error message N times -- you'll see it only once, possibly
accompanied by a "...and we got the same error message from N other
processes" notice.
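For the curious, the filtering boils down to something like the
following sketch. This is not the code we wrote; it assumes that
duplicates are identified by the (help file, topic) pair, and the
help_filter_* names, the fixed-size table, and its limits are all made
up for illustration.

```c
#include <stdio.h>
#include <string.h>

#define MAX_TOPICS 64    /* arbitrary limit for this sketch */
#define KEY_LEN    128   /* arbitrary limit for this sketch */

/* One entry per distinct message seen so far at the HNP. */
typedef struct {
    char key[KEY_LEN];   /* "file:topic" */
    int  count;          /* how many processes reported it */
} seen_t;

typedef struct {
    seen_t entries[MAX_TOPICS];
    int    n;
} help_filter_t;

/* Find the entry for a key, or return -1 if it has not been seen. */
static int help_filter_find(const help_filter_t *f, const char *key)
{
    for (int i = 0; i < f->n; ++i) {
        if (strcmp(f->entries[i].key, key) == 0) return i;
    }
    return -1;
}

/*
 * Record one report of (file, topic).
 * Returns 1 the first time (caller should print the message),
 * 0 for a duplicate (only counted), -1 if the key or table overflows.
 */
static int help_filter_record(help_filter_t *f, const char *file,
                              const char *topic)
{
    char key[KEY_LEN];
    int len = snprintf(key, sizeof key, "%s:%s", file, topic);
    int idx;

    if (len < 0 || (size_t)len >= sizeof key) return -1;

    idx = help_filter_find(f, key);
    if (idx >= 0) {
        f->entries[idx].count++;
        return 0;
    }
    if (f->n == MAX_TOPICS) return -1;

    strcpy(f->entries[f->n].key, key);   /* fits: len < KEY_LEN */
    f->entries[f->n].count = 1;
    f->n++;
    return 1;
}

/*
 * Number of *other* processes that sent the same message, i.e. the N
 * in the "from N other processes" notice. Returns -1 if never seen.
 */
static int help_filter_duplicates(const help_filter_t *f, const char *file,
                                  const char *topic)
{
    char key[KEY_LEN];
    int len = snprintf(key, sizeof key, "%s:%s", file, topic);
    int idx;

    if (len < 0 || (size_t)len >= sizeof key) return -1;
    idx = help_filter_find(f, key);
    return idx < 0 ? -1 : f->entries[idx].count - 1;
}
```

The HNP would print a message when help_filter_record() returns 1 and
stay quiet otherwise; help_filter_duplicates() gives the count for the
follow-up notice.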
(***) To make both #1 and #2 work, we had to raise the abstraction
level. That is, there had to be job-level intelligence about the
different kinds of output. So we have created orte_output() (and
friends) and orte_show_help(). The OPAL variants still exist, but
they *SHOULD NOT BE USED* by the MPI layer. Specifically, the OPAL
variants are for what OPAL does best: single process stuff. The ORTE
variants provide the job-level intelligence, such as duplicate
show_help filtering, relaying to the HNP in a different channel than
IOF, etc.
So when this stuff hits the trunk, you'll see a ton of
s/opal_output/orte_output/g and s/opal_show_help/orte_show_help/g
changes throughout the code base. Do not be alarmed. :-)
--
Jeff Squyres
Cisco Systems