Hi Ralph,
I didn't realize it would be such a problem. Unfortunately there is
simply no way to reliably parse this kind of output, because it is
impossible to know what the error messages are going to be, and
presumably they could include XML-like formatting as well. The whole
point of the XML was to simplify parsing of the mpirun output, but it
now looks like it has actually made things more difficult.
I seem to remember that you said that the XML between <map> and </map>
is always correctly formatted. I think the only feasible approach for
XML mode now is:
1. Drop the <mpirun> and </mpirun> tags.
2. Keep everything between <map> and </map> as is.
3. Drop the <stdout>, <stderr>, and <stddiag> tags and just use free
format for program output and errors.
4. Go back to using stdout for program output, and stderr for errors.
I will just ignore everything before <map> and after </map>, and send
stdout and stderr (minus the text between <map> and </map>) to a
console so the user can see what happened when the job ran.
I think this was the situation in an earlier version (1.3.0?).
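As a rough illustration, the splitting step might look like this (a
Python sketch; the helper name is mine, not from any existing tool, and
it assumes the <map> block really is always well formed):

import re

def split_mpirun_output(text):
    # Pull out the well-formed <map>...</map> block; everything else
    # is treated as free-form console text (hypothetical helper).
    m = re.search(r"<map>.*?</map>", text, re.DOTALL)
    if m is None:
        return None, text  # no map found: it is all console text
    map_xml = m.group(0)   # hand this piece to a real XML parser
    console = text[:m.start()] + text[m.end():]
    return map_xml, console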
Thanks for your patience,
Greg
On Aug 27, 2009, at 10:44 PM, Ralph Castain wrote:
Hi Greg
I fixed these so they will get properly formatted. However, it is
symptomatic of a much broader problem - namely, that developers have
inserted print statements throughout the code for reporting errors.
There simply isn't any easy way for me to catch them all.
Jeff and I have talked about ways of approaching that problem.
However, nothing is entirely perfect. For example, an error detected
by slurm will generate a message that lies completely outside OMPI's
scope, and will be asynchronous with anything we try to report.
Thus, you are going to have to always be prepared to deal with
improperly formatted messages. For example, you could easily get the
following garbled output:
<stdSLURM-GENERATED-ERROR-MESSAGE
err>mpirun was unable to stSLURM-GENERATED-ERROR-MESSAGE
art the job&010;<SHELL-ERROR-MESSAGE
/stdANOTHER-ERROR
err>
You get the picture, I'm sure. There is nothing I can do about this,
so your system is simply going to have to figure out how to handle it.
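One best-effort way to cope on the consuming side (a Python sketch; the
names are illustrative, not part of any OMPI tool) is to recover only
the spans whose tags survived intact and pass everything else through
as raw console text:

import re

# Recover only spans whose open/close tags survived intact; anything
# else -- torn tags, injected SLURM or shell messages -- falls through
# as raw text instead of breaking the parse.
INTACT = re.compile(r"<(stdout|stderr|stddiag)(\s[^<>]*)?>(.*?)</\1>",
                    re.DOTALL)

def recover(text):
    pos, pieces = 0, []
    for m in INTACT.finditer(text):
        if m.start() > pos:
            pieces.append(("raw", text[pos:m.start()]))
        pieces.append((m.group(1), m.group(3)))  # (stream, payload)
        pos = m.end()
    if pos < len(text):
        pieces.append(("raw", text[pos:]))
    return pieces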
The only other solution I can propose is going back to building against
the tool library I created, but that has its own issues too...
Ralph
From: Greg Watson <g.wat...@computer.org>
Date: August 25, 2009 9:23:00 AM MDT
To: Open MPI Developers <de...@open-mpi.org>
Subject: Re: [OMPI devel] XML request
Reply-To: Open MPI Developers <de...@open-mpi.org>
Ralph,
Looks like some messages are taking a different path:
$ mpirun -mca orte_show_resolved_nodenames 1 -xml -display-map -np 3 xxx
<mpirun>
<map>
<host name="Jarrah.local" slots="1" max_slots="0">
<process rank="0"/>
<process rank="1"/>
<process rank="2"/>
</host>
</map>
<stderr>--------------------------------------------------------------------------&#010;</stderr>
<stderr>mpirun was unable to launch the specified application as it
could not find an executable:
</stderr>
<stderr>
</stderr>
<stderr>Executable: xxx
</stderr>
<stderr>Node: Jarrah.local
</stderr>
<stderr>
</stderr>
<stderr>while attempting to start process rank 0.
</stderr>
<stderr>--------------------------------------------------------------------------&#010;</stderr>
3 total processes failed to start
</mpirun>
Cheers,
Greg
On Aug 20, 2009, at 3:24 PM, Ralph Castain wrote:
Okay - try r21858.
Ralph
On Aug 20, 2009, at 12:36 PM, Greg Watson wrote:
Hi Ralph,
Cool!
Regarding the scope of the tags, I never really thought about
output from the command itself. I propose that any output that
can't otherwise be classified be sent using the appropriate
<stdout> or <stderr> tags with no "rank" attribute.
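For illustration, output under that convention might look like this
(hypothetical lines):

<stdout rank="0">output from rank 0</stdout>
<stderr>a message from mpirun itself, so no rank attribute</stderr>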
Cheers,
Greg
On Aug 20, 2009, at 1:52 PM, Ralph Castain wrote:
Hi Greg
I can catch most of these and will do so as they flow through a
single code path. However, there are places sprinkled throughout
the code where people directly output warning and error info -
these will be more problematic and represent a degree of change
that is probably outside the comfort zone for the 1.3 series.
After talking with Jeff about it, we propose that I make the
simple change that will catch messages like those below. For
the broader problem, we believe that some discussion with you
about the degree of granularity exposed through the XML output
might help define the overall solution. For example, can we just
label all stderr messages with <stderr></stderr> tags, or do you
need more detailed tagging (e.g., rank, file, line, etc.)?
That discussion can occur later - for now, I'll catch these.
Will let you know when it is ready to test!
Ralph
On Aug 20, 2009, at 11:16 AM, Greg Watson wrote:
Ralph,
One more thing. Even with XML enabled, I notice that some error
messages are still sent to stderr without XML tags (see below).
Any chance these could be sent to stdout wrapped in
<stderr></stderr> tags?
Thanks,
Greg
$ mpirun -mca orte_show_resolved_nodenames 1 -xml -display-map -np 1 ./pop pop_in
<mpirun>
<map>
<host name="4pcnuggets" slots="1" max_slots="0">
<process rank="0"/>
</host>
</map>
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 0.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI
processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
<stdout rank="0">------------------------------------------------------------------------&#010;</stdout>
<stdout rank="0"> 
</stdout>
<stdout rank="0"> Parallel Ocean Program (POP) 
</stdout>
<stdout rank="0"> Version 2.0.1 Released 21 Jan 2004
</stdout>
<stdout rank="0"> 
</stdout>
<stdout rank="0">------------------------------------------------------------------------&#010;</stdout>
<stdout rank="0">------------------------------------------------------------------------&#010;</stdout>
<stdout rank="0"> 
</stdout>
<stdout rank="0">POP aborting...
</stdout>
<stdout rank="0"> Input nprocs not same as system
request
</stdout>
<stdout rank="0"> 
</stdout>
<stdout rank="0">------------------------------------------------------------------------&#010;</stdout>
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 15201 on
node 4pcnuggets exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
On Aug 19, 2009, at 10:48 AM, Greg Watson wrote:
Ralph,
Looks like it's working now.
Thanks,
Greg
On Aug 18, 2009, at 5:21 PM, Ralph Castain wrote:
Give r21836 a try and see if it still gets out of order.
Ralph
On Aug 18, 2009, at 2:18 PM, Greg Watson wrote:
Ralph,
Not sure that's it because all XML output should be via
stdout.
Greg
On Aug 18, 2009, at 3:53 PM, Ralph Castain wrote:
Hmmm... let me try adding an fflush after the <mpirun>
output to force it out. Best guess is that you are seeing a
little race condition - the map output is coming over
stderr, while the <mpirun> tag is coming over stdout.
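From the consuming side the effect is easy to demonstrate; a minimal
Python sketch (the command line is only an example): merging the two
streams serializes them, but the interleaving still depends on when
each side gets flushed.

import subprocess

# Merge stderr into stdout so the consumer sees a single stream; the
# relative ordering still depends on when mpirun flushes each side.
proc = subprocess.Popen(
    ["mpirun", "-mca", "orte_show_resolved_nodenames", "1",
     "-xml", "-display-map", "-np", "4", "./a.out"],
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
for line in proc.stdout:
    print(line, end="")
proc.wait()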
On Tue, Aug 18, 2009 at 12:53 PM, Greg Watson <g.wat...@computer.org> wrote:
Hi Ralph,
I'm seeing something strange. When I run "mpirun -mca orte_show_resolved_nodenames 1 -xml -display-map...", I see:
<mpirun>
<map>
<host name="Jarrah.local" slots="1" max_slots="0">
<process rank="0"/>
<process rank="1"/>
<process rank="2"/>
<process rank="3"/>
</host>
</map>
...
</mpirun>
but when I run "ssh localhost mpirun -mca orte_show_resolved_nodenames 1 -xml -display-map...", I see:
<map>
<host name="Jarrah.local" slots="1" max_slots="0">
<process rank="0"/>
<process rank="1"/>
<process rank="2"/>
<process rank="3"/>
</host>
</map>
<mpirun>
...
</mpirun>
Any ideas?
Thanks,
Greg
On Aug 17, 2009, at 11:16 PM, Ralph Castain wrote:
Should be done on trunk with r21826 - would you please give
it a try and let me know if that meets requirements? If so,
I'll move it to 1.3.4.
Thanks
Ralph
On Aug 17, 2009, at 6:42 AM, Greg Watson wrote:
Hi Ralph,
Yes, you'd just need to issue the start tag prior to any other
XML output, then the end tag when it's guaranteed all other XML
output has been sent.
Greg
On Aug 17, 2009, at 7:44 AM, Ralph Castain wrote:
All things are possible - some just a tad more painful than
others.
It looks like you want the mpirun tags to flow around all
output during the run - i.e., there is only one pair of
mpirun tags that surround anything that might come out of
the job. True?
If so, that would be trivial.
On Aug 14, 2009, at 9:25 AM, Greg Watson wrote:
Ralph,
Would it be possible to get mpirun to issue start and end
tags if the -xml option is used? Currently there is no way
to determine when the output starts and finishes, which
makes parsing the XML tricky, particularly if something
else generates output (e.g. the shell). Something like this
would be ideal:
<mpirun>
<map>
...
</map>
<stdout>...</stdout>
<stderr>...</stderr>
</mpirun>
If we could get it in 1.3.4, even better. :-)
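The consuming side then reduces to a simple state machine; a Python
sketch under the proposed scheme (the function name is hypothetical):

def mpirun_session(lines):
    # Yield only the lines between <mpirun> and </mpirun>, so any
    # shell noise before and after the run is ignored by construction.
    inside = False
    for line in lines:
        s = line.strip()
        if s == "<mpirun>":
            inside = True
        elif s == "</mpirun>":
            break
        elif inside:
            yield line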
Thanks,
Greg
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel