On Nov 19, 2007, at 10:01 AM, Ralph H Castain wrote:

Unfortunately, it -is- a significant problem when passing the params on to the orteds, as Tim has eloquently pointed out in his original posting. I
guess I don't see why it would be a significant problem to just have
opal_cmd_line_parse do the "right thing" - if a string param is quoted, then
just remove the quotes.

How would opal_cmd_line_parse know when to remove quotes and when not remove quotes? I.e., what if an MCA param (or some other value, such as a user application argv) required quotes? This is not uncommon for the wrapper compilers, for example (passing -D's to the C preprocessor that take string arguments).

AFAIK, opal_cmd_line_parse() does exactly what I intended it to. :-) It performs no shell-like quoting because it expects to be given a list of "final" string tokens.

Problem solved, and I wouldn't have to add a bunch of code to figure out
when to assemble argv arrays in different ways.

Er... I don't understand: I joined this thread because I thought there was still something to fix...?

I would be against changing opal_cmd_line_parse() in the manner that you have described because it will break other things (e.g., the wrapper compilers).





On 11/19/07 7:52 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote:

I guess I don't see why this is an opal_cmd_line_parse() problem.

If you invoke executables through system(), then a shell is used and
quoting is necessary to preserve the individual string tokens (i.e.,
"my beloved string" would be passed to the application as one argv
token, without the quotes).

But if you're building up an array of argv and calling some form of
exec(), then no shell is involved and quoting should not be
necessary.  Specifically:

    opal_append_argv(&argc, &argv, "my beloved string");

will be passed as one string token to the application.

opal_cmd_line_parse() is passed an array of argv, meaning that the
command line have already been split into individual string tokens. I
guess the question is whether these come in directly from the
arguments to main() or whether we are getting a single string and
breaking it up into tokens. If the latter is true, then we need to re-
evaluate our break-into-tokens algorithm.  I have a dim recollection
that these come in from the arguments to main(), though.

I guess I can see where this would get complicated for rsh/ssh
invocations, because *both* models are used. (i.e., you exec locally
but it turns into a system-like invocation on the remote side).  In
this case, I think you'll need to quote extended strings (e.g., those
containing spaces) for the non-local invocations not not quote it for
local invocations.




On Nov 19, 2007, at 9:19 AM, Ralph H Castain wrote:

My, you are joining late! ;-)

The problem is with MCA params that take string arguments. If we
pass the following:

-mca foo "my beloved string"

on the command line of an orted, we get a value for foo that
includes the quote marks. I verified this rather painfully when
attempting to pass a command line mca param for a uri. I eventually
had to add specific code to clean the paramater up.

This appears to be somehow related to the precise method used to
register the param. For example, the following deprecated method
works fine:

On the setup end:

   opal_argv_append(argc, argv, "--gprreplica");
   if (NULL != orte_process_info.gpr_replica_uri) {
       contact_info = strdup(orte_process_info.gpr_replica_uri);
   } else {
       contact_info = orte_rml.get_contact_info();
   }
   asprintf(&param, "\"%s\"", contact_info);
   opal_argv_append(argc, argv, param);

And on the receiving end:
   id = mca_base_param_register_string("gpr", "replica", "uri",
NULL, orte_process_info.gpr_replica_uri);
   mca_base_param_lookup_string(id,
&(orte_process_info.gpr_replica_uri));
   mca_base_param_set_internal(id, true);


However, the following does NOT work cleanly:

On the setup end:
  rml_uri = orte_rml.get_contact_info();
   asprintf(&param, "\"%s\"", rml_uri);
   opal_argv_append(argc, argv, "--hnp-uri");
   opal_argv_append(argc, argv, param);

On the receiving end:
   mca_base_param_reg_string_name("orte", "hnp_uri",
                                  "HNP contact info",
                                  true, false, NULL,  &uri);

Thereby necessitating the addition of the following code to clean it
up:
   if (NULL != uri) {
       /* the uri value passed to us will have quote marks around
it to protect
       * the value if passed on the command line. We must remove
those
       * to have a correct uri string
       */
      if ('"' == uri[0]) {
           /* if the first char is a quote, then so will the last
one be */
          ptr = &uri[1];
           len = strlen(ptr) - 1;
       } else {
           ptr = &uri[0];
           len = strlen(uri);
       }

       /* we have to copy the string by hand as strndup is a GNU
extension
        * and may not be generally available
        */
      orte_process_info.my_hnp_uri = (char*)malloc(len+1);
       for (i=0; i < len; i++) {
           orte_process_info.my_hnp_uri[i] = ptr[i];
       }
       orte_process_info.my_hnp_uri[len] = '\0';  /* NULL terminate
*/
      free(uri);

   }

It was my understanding that you wanted us to move away from the
deprecated interface – hence my comment that we cannot just quote
all the strings as we would have to add this code all over the
place, or (better) fix opal_cmd_line_parse.

Hope that helps
Ralph


On 11/19/07 7:01 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote:

Sorry -- I'm just joining this conversation late: what's the problem
with opal_cmd_line_parse?

It should obey all quoting from shells, etc.  I.e., it shouldn't
care
about tokens with special characters (to include spaces) because the shell divides all of that stuff up -- it just gets a char*[] that it
treats as discrete tokens.

Is it doing something wrong?



On Nov 19, 2007, at 8:39 AM, Ralph H Castain wrote:

I'm not sure it is really necessary - the problem is solely within
opal_cmd_line_parse and (if someone can parse that code ;-)) is
truly simple
to fix. The overly long cmd line issue is due to a bug that Josh
was
going
to look at (may already have done so while I was out of touch).

Ralph



On 11/9/07 5:10 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote:

Should there be another option for passing MCA parameters between
processes, such as via stdin (or any file descriptor)?  I.e.,
during
the command line parsing to check for command line MCA params,
perhaps
a new argument could be introduced: -mcauri <uri>, where <uri>
could
be a few different forms:

- file://stdin: (note the 2 //, not 3, so "stdin" would never
conflict
with a real file named /stdin)  Read the parameters in off stdin.

- rml://...rml contact info...: read in the MCA params via the RML
(although I assume that reading via the RML would be *waaaay* to
late
during the MCA setup process -- I mentioned this option for
completeness, even though I don't think it'll work)

- ip://ipaddress:port: open a socket back and read the MCA
params in
over a socket.  This could have some scalability issues...?  But
who
knows; it could be tied into the hierarchical startup such that we
wouldn't have to have an all-to-one connection scheme.
Certainly it
would cause scalability problems when paired with today's all-to-
one
RML connection scheme for the OOB.

I'm not sure that the rml: and ip: schemes are worthwhile.
Maybe a
file://stdin kind of approach could work?  Or perhaps some other
kind
of URI/IPC...?  (I really haven't thought through the issues --
this
is off the top of my head)



On Nov 8, 2007, at 2:36 PM, Ralph H Castain wrote:

Might I suggest:

https://svn.open-mpi.org/trac/ompi/ticket/1073

It deals with some of these issues and explains the boundaries of
the
problem. As for what a string param can contain, I have no
opinion.
I only
note that it must handle special characters such as ';', '/',
etc.
that are
typically found in uri's. I cannot think of any reason it should
have a
quote in it.

Ralph



On 11/8/07 12:25 PM, "Tim Prins" <tpr...@cs.indiana.edu> wrote:

The alias option you presented does not work. I think we do some
weird
things to find the absolute path for ssh, instead of just
issuing
the
command.

I would spend some time fixing this, but I don't want to do it
wrong. We
could quote all the param values, and change the parser to
remove
the
quotes, but this is assuming that a mca param does not contain
quotes.

So I guess there are 2 questions that need to be answered
before a
fix
is made:

1. What exactly can a string mca param contain? Can it have
quotes or
spaces or?

2. Which mca parameters should be forwarded? Should it be just
the
ones
from the command line? From the environment? From config files?

Tim

Ralph Castain wrote:
What changed is that we never passed mca params to the orted
before - they
always went to the app, but it's the orted that has the issue.
There is a
bug ticket thread on this subject - I forget the number
immediately.

Basically, the problem was that we cannot generally pass the
local
environment to the orteds when we launch them. However, people
needed
various mca params to get to the orteds to control their
behavior.
The only
way to resolve that problem was to pass the params via the
command
line,
which is what was done.

Except for a very few cases, all of our mca params are single
values that do
not include spaces, so this is not a problem that is causing
widespread
issues. As I said, I already had to deal with one special case
that didn't
involve spaces, but did have special characters that required
quoting, which
identified the larger problem of dealing with quoted strings.

I have no objection to a more general fix. Like I said in my
note,
though,
the general fix will take a larger effort. If someone is
willing
to do so,
that is fine with me - I was only offering solutions that would
fill the
interim time as I haven't heard anyone step up to say they
would
fix it
anytime soon.

Please feel free to jump in and volunteer! ;-) I'm willing to
put
the quotes
around things if you will fix the mca cmd line parser to
cleanly
remove them
on the other end.

Ralph



On 11/7/07 5:50 PM, "Tim Prins" <tpr...@cs.indiana.edu> wrote:

I'm curious what changed to make this a problem. How were we
passing mca
param
from the base to the app before, and why did it change?

I think that options 1 & 2 below are no good, since we, in
general, allow
string mca params to have spaces (as far as I understand
it). So
a more
general approach is needed.

Tim

On Wednesday 07 November 2007 10:40:45 am Ralph H Castain
wrote:
Sorry for delay - wasn't ignoring the issue.

There are several fixes to this problem - ranging in order
from
least to
most work:

1. just alias "ssh" to be "ssh -Y" and run without setting
the
mca param.
It won't affect anything on the backend because the daemon/
procs
don't use
ssh.

2. include "pls_rsh_agent" in the array of mca params not
to be
passed to
the orted in orte/mca/pls/base/
pls_base_general_support_fns.c,
the
orte_pls_base_orted_append_basic_args function. This would
fix
the specific
problem cited here, but I admit that listing every such
param by
name would
get tedious.

3. we could easily detect that a "problem" character was in
the
mca param
value when we add it to the orted's argv, and then put ""
around
it. The
problem, however, is that the mca param parser on the far end
doesn't
remove those "" from the resulting string. At least, I spent
over a day
fighting with a problem only to discover that was happening.
Could be an
error in the way I was doing things, or could be a real
characteristic of
the parser. Anyway, we would have to ensure that the parser
removes any
surrounding "" before passing along the param value or this
won't work.

Ralph

On 11/5/07 12:10 PM, "Tim Prins" <tpr...@cs.indiana.edu>
wrote:
Hi,

Commit 16364 broke things when using multiword mca param
values. For
instance:

mpirun --debug-daemons -mca orte_debug 1 -mca pls rsh -mca
pls_rsh_agent
"ssh -Y" xterm

Will crash and burn, because the value "ssh -Y" is being
stored
into the
argv orted_cmd_line in orterun.c:1506. This is then added to
the launch
command for the orted:

/usr/bin/ssh -Y odin004  PATH=/san/homedirs/tprins/usr/rsl/
bin:
$PATH ;
export PATH ;
LD_LIBRARY_PATH=/san/homedirs/tprins/usr/rsl/lib:
$LD_LIBRARY_PATH ;
export LD_LIBRARY_PATH ; /san/homedirs/tprins/usr/rsl/bin/
orted
--debug
--debug-daemons --name 0.1 --num_procs 2 --vpid_start 0 --
nodename
odin004 --universe tpr...@odin.cs.indiana.edu:default-
universe-27872
--nsreplica
"0.0;tcp://
129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:
21a0
:4090 8"
--gprreplica
"0.0;tcp://
129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:
21a0
:4090 8"
-mca orte_debug 1 -mca pls_rsh_agent ssh -Y -mca
mca_base_param_file_path
/u/tprins/usr/rsl/share/openmpi/amca-param-sets:/san/
homedirs/
tprins/rsl/
examp les
-mca mca_base_param_file_path_force /san/homedirs/tprins/
rsl/
examples

Notice that in this command we now have "-mca
pls_rsh_agent ssh
-Y". So
the quotes have been lost, as we die a horrible death.

So we need to add the quotes back in somehow, or pass these
options
differently. I'm not sure what the best way to fix this.

Thanks,

Tim





_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel





--
Jeff Squyres
Cisco Systems


Reply via email to