Per the teleconf this week, Ralph and I worked up two new features that we're nearly ready to put back in the trunk:

1. IBM+LANL needed a way to XML-ize all output that comes out of OMPI so that 3rd party tools can parse and use it intelligently (e.g., the PTP debugger can now distinguish between OMPI error messages and stderr from the MPI app).

2. In order to do #1, we created separate logical channels (vs. just throwing everything into stderr and letting IOF relay it back to the HNP) for the following:
   - stdout/stderr from the MPI app
   - opal_show_help() messages (***)
   - opal_output*() messages (***)
As a side effect, we now filter show_help() messages and print each one only *once* at the HNP (this has been a very long-standing goal of mine). So if your MPI app barfs, you will no longer see the same show_help() error message N times -- you'll see it only once, possibly accompanied by a "...and we got the same error message from N other processes" notice.
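
To make the filtering idea concrete, here is a rough sketch in C of how duplicate help messages could be collapsed at a single collection point. This is not the actual ORTE code: the help_filter_* names, the fixed-size table, and the "file:topic" key are all assumptions made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 64   /* hypothetical cap on distinct messages */
#define MAX_KEY     128  /* hypothetical cap on "file:topic" key length */

/* One entry per distinct help message seen so far (hypothetical layout). */
typedef struct {
    char key[MAX_KEY];  /* "file:topic" identifying the message */
    int  count;         /* how many processes reported it */
} help_entry_t;

typedef struct {
    help_entry_t entries[MAX_ENTRIES];
    int          n;
} help_filter_t;

void help_filter_init(help_filter_t *f)
{
    assert(f != NULL);
    f->n = 0;
}

/* Look up a message by key; returns its entry or NULL if never seen. */
static help_entry_t *help_filter_find(help_filter_t *f, const char *key)
{
    for (int i = 0; i < f->n; i++) {
        if (strcmp(f->entries[i].key, key) == 0) {
            return &f->entries[i];
        }
    }
    return NULL;
}

/* Record one report of a message.
 * Returns 1 if this is the first report (the caller should print it),
 * 0 if it is a duplicate (the caller should stay quiet),
 * -1 if the table is full. */
int help_filter_report(help_filter_t *f, const char *file, const char *topic)
{
    char key[MAX_KEY];
    help_entry_t *e;

    assert(f != NULL && file != NULL && topic != NULL);
    snprintf(key, sizeof key, "%s:%s", file, topic);

    e = help_filter_find(f, key);
    if (e != NULL) {
        e->count++;
        return 0;
    }
    if (f->n == MAX_ENTRIES) {
        return -1;
    }
    strcpy(f->entries[f->n].key, key);  /* key always fits in MAX_KEY bytes */
    f->entries[f->n].count = 1;
    f->n++;
    return 1;
}

/* Number of *other* processes that sent the same message (0 if none). */
int help_filter_others(help_filter_t *f, const char *file, const char *topic)
{
    char key[MAX_KEY];
    help_entry_t *e;

    assert(f != NULL && file != NULL && topic != NULL);
    snprintf(key, sizeof key, "%s:%s", file, topic);
    e = help_filter_find(f, key);
    return (e != NULL && e->count > 1) ? e->count - 1 : 0;
}
```

In this sketch the collector would print the message when help_filter_report() returns 1, and later use help_filter_others() to fill in the N of the "same error message from N other processes" notice.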

(***) To make both #1 and #2 work, we had to raise the abstraction level. That is, there had to be job-level intelligence about the different kinds of output. So we have created orte_output() (and friends) and orte_show_help(). The OPAL variants still exist, but they *SHOULD NOT BE USED* by the MPI layer. Specifically, the OPAL variants are for what OPAL does best: single-process stuff. The ORTE variants provide the job-level intelligence, such as duplicate show_help filtering, relaying to the HNP in a different channel than IOF, etc.
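
As an illustration of why separate channels help with the XML goal, here is a minimal sketch in C of wrapping each line in a tag chosen by the channel it arrived on, so a tool can tell app stderr from a show_help message. The channel enum, the tag names, and xml_wrap() are all hypothetical; the real tag vocabulary and API may look quite different.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical identifiers for the logical channels described above. */
typedef enum {
    CHAN_APP_STDOUT,
    CHAN_APP_STDERR,
    CHAN_SHOW_HELP,
    CHAN_OUTPUT
} channel_t;

/* Hypothetical tag name for each channel. */
static const char *channel_tag(channel_t chan)
{
    switch (chan) {
    case CHAN_APP_STDOUT: return "stdout";
    case CHAN_APP_STDERR: return "stderr";
    case CHAN_SHOW_HELP:  return "show_help";
    case CHAN_OUTPUT:     return "output";
    }
    return "unknown";
}

/* Append src to out at *pos, keeping out NUL-terminated.
 * Returns 0 on success, -1 if src does not fit. */
static int append(char *out, size_t outlen, size_t *pos, const char *src)
{
    size_t len = strlen(src);

    if (*pos + len + 1 > outlen) {
        return -1;
    }
    memcpy(out + *pos, src, len);
    *pos += len;
    out[*pos] = '\0';
    return 0;
}

/* Write "<tag>escaped msg</tag>" into out.
 * Escapes &, < and > so the message cannot break the markup.
 * Returns 0 on success, -1 if out is too small. */
int xml_wrap(channel_t chan, const char *msg, char *out, size_t outlen)
{
    size_t pos = 0;
    const char *tag = channel_tag(chan);

    assert(msg != NULL && out != NULL && outlen > 0);
    out[0] = '\0';

    if (append(out, outlen, &pos, "<") || append(out, outlen, &pos, tag) ||
        append(out, outlen, &pos, ">")) {
        return -1;
    }
    for (const char *p = msg; *p != '\0'; p++) {
        char one[2] = { *p, '\0' };
        const char *rep = one;

        if (*p == '&') rep = "&amp;";
        else if (*p == '<') rep = "&lt;";
        else if (*p == '>') rep = "&gt;";

        if (append(out, outlen, &pos, rep)) {
            return -1;
        }
    }
    if (append(out, outlen, &pos, "</") || append(out, outlen, &pos, tag) ||
        append(out, outlen, &pos, ">")) {
        return -1;
    }
    return 0;
}
```

The point of the sketch is only that the tag can be chosen per channel; if everything were mixed together in stderr, there would be nothing to key the tag on.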

So when this stuff hits the trunk, you'll see a ton of s/opal_output/orte_output/g and s/opal_show_help/orte_show_help/g changes throughout the code base. Do not be alarmed. :-)

--
Jeff Squyres
Cisco Systems
