On the various stdout/err options: Other than the complexity of
telling the orteds which procs to do what with, there is nothing
standing in the way of implementing either of those capabilities. I
currently support only sending the info on a per-job basis, though, so
it would mean changing several ORTE-level global object definitions to
store the data, plus changing the launch msg to tell the orteds what
to do. So it wouldn't be a "5-min" kind of job, but it -could- be
done... if someone really, really wanted to.
I have a prototype "orte-iof" tool that does what you describe...will
come into trunk with these changes.
On Aug 28, 2008, at 7:44 AM, Jeff Squyres wrote:
On Aug 28, 2008, at 6:37 AM, Ralph Castain wrote:
1. specify which procs are to receive stdin. The options that
were to be supported are: all procs, a specific proc, or no
procs. The default will be rank=0 only. All procs not included
will have their stdin tied to /dev/null - which means a debugger
could not attach to the stdin at a later time.
How about: --stdin <list>, where <list> is a comma-delimited list
of integer ranges? Such as:
--stdin 0 (same as default)
--stdin 0,1 (procs 0 and 1 get stdin)
--stdin 0-9 (proc 0 through 9 get stdin)
--stdin 0-9,23-25 (procs 0 through 9 and 23 through 25 get stdin)
--stdin all (all procs get stdin)
--stdin none (no procs get stdin)
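For illustration only, the <list> syntax proposed above could be parsed roughly as follows. This is a minimal Python sketch, not ORTE code: the function name parse_rank_list and the num_procs argument are hypothetical, and rejecting out-of-range or reversed ranges is an assumption rather than anything decided in this thread.

```python
def parse_rank_list(spec, num_procs):
    """Return the sorted ranks selected by a --stdin style <list>.

    Hypothetical helper: accepts "all", "none", or a comma-delimited
    list of integers and inclusive X-Y ranges.
    """
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(num_procs))
    if spec == "none":
        return []

    ranks = set()
    for item in spec.split(","):
        item = item.strip()
        if "-" in item:
            # An X-Y entry selects every rank from X through Y inclusive.
            first_text, last_text = item.split("-", 1)
            first, last = int(first_text), int(last_text)
        else:
            first = last = int(item)
        # Assumption: reversed or out-of-range entries are errors.
        if first > last or last >= num_procs:
            raise ValueError(f"invalid rank entry {item!r} for {num_procs} procs")
        ranks.update(range(first, last + 1))
    return sorted(ranks)
```

With this sketch, "0-9,23-25" on a 32-proc job selects ranks 0 through 9 and 23 through 25, and a malformed entry raises ValueError.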
Just to be clear: is this something that is necessary, or are we
providing flexibility that nobody will ever use? Frankly, I'm told
that reading stdin at all is pretty rare, at least on jobs around
here, though I don't dispute having at least the all, one, or none
capability. But is anyone really going to pick-and-choose multiple
random procs to receive stdin?
I'm asking mostly because of the complexity it adds. Certainly, it
is doable - just wondering if it is worth the effort, or something
that will never be used.
Ah -- I actually mis-read your original comment. I'm happy with
all, none, and X (where X is a single integer).
"Go for the gold" would be the <list> syntax, but I agree that
that's not really necessary. I think it's definitely in the "would
be nice" category.
It occurs to me that we're using this <list> kind of notation in a
few places now (aren't we?). Perhaps we should have this string-
parsing code down in opal somewhere...?
Processing it is so trivial it probably doesn't merit dedicated
code - all you do is use opal_argv_split and run down the list.
I was thinking of the ranges -- there's additional processing for
the X-Y strings. But it's a moot point.
2. specify which stdxxx file descriptors you want left open on
your procs. Our defaults are to leave stdout/stderr/stddiag open
on all procs. This option would allow the user to specify that we
tie any or all of these to /dev/null.
How about --stdout and --stderr, indicating which procs' stdout/
stderr you want to see? FWIW, I don't think we should provide a
way to turn off stddiag. The syntax could be just like --stdin,
except the default values would be "all".
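As a rough sketch of the defaults being discussed (stdin to rank 0 only, stdout/stderr from all procs, stddiag never turned off), the per-proc decision could look like the Python below. The names DEFAULTS and plan_streams, and the "forward" marker, are hypothetical and only model the policy; this is not how the orteds actually wire up file descriptors.

```python
import os

# Hypothetical defaults taken from the discussion: stdin goes to rank 0
# only, stdout and stderr are kept for every proc. None means "all procs".
DEFAULTS = {"stdin": {0}, "stdout": None, "stderr": None}


def plan_streams(rank, selections=None):
    """Decide, for one proc, which streams are forwarded and which are
    tied to /dev/null. `selections` maps a stream name to a set of
    ranks (or None for all) and overrides the defaults."""
    merged = dict(DEFAULTS)
    merged.update(selections or {})

    plan = {}
    for stream, ranks in merged.items():
        keep_open = ranks is None or rank in ranks
        plan[stream] = "forward" if keep_open else os.devnull
    # stddiag is always forwarded, since orte_show_help flows over it.
    plan["stddiag"] = "forward"
    return plan
```

For the debugging case of "I only want to see output from procs 1 and 2", a caller would pass something like {"stdout": {1, 2}, "stderr": {1, 2}}; every other rank then has those two streams mapped to /dev/null.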
Again, will anyone ever really use this? I agree about stddiag as
orte_show_help flows over it. I haven't found any interest around
here in shutting off stdout and/or stderr - nobody can think of a
reason to do so. Doing it is trivial - my concern here is solely
with the complexity of providing such fine-grained specifications.
It may actually be useful to turn off stdout/stderr in debugging
scenarios, meaning "I only want to see output from proc X, Y, Z."
How about leaving these options off for now, but leaving the design
open to implementing them someday if someone ever cares enough?
Are these options per app context, or global? It would be
awesome to be per-app-context, but I wouldn't cry too hard if
they were global (especially if it meant making the code overly
complex, etc.).
My first reaction is that making this per app_context would create
a ton of complexity...but I'll take a gander before committing one
way or the other. Again, though, I would wonder if anyone really is
going to use this on a per app_context basis - or are we just
creating capability "because we can"?
I think it's solidly in the "because we can" department. If it's
anything more than trivial to implement, my $0.02 is to leave it
off. If someone wants to implement it someday, they can.
FWIW: as long as there's the possibility of writing an orte-iof
command line tool to suck down an individual proc's stdin/stdout/
stderr[/stddiag], I'm happy (because that leaves the door open for
"mpirun --screen ...").
--
Jeff Squyres
Cisco Systems