Hmmm...okay, I get the message: if it is going to be fixed, then I have to
fix it on my end. ;-) Which means it ain't happening anytime soon, so Tim P
is the one left in the lurch.

Sorry Tim! Perhaps you can come up with a compromise that re-enables what
you want to do.

Ralph



On 11/19/07 8:11 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote:

> On Nov 19, 2007, at 10:01 AM, Ralph H Castain wrote:
> 
>> Unfortunately, it -is- a significant problem when passing the params
>> on to
>> the orteds, as Tim has eloquently pointed out in his original
>> posting. I
>> guess I don't see why it would be a significant problem to just have
>> opal_cmd_line_parse do the "right thing" - if a string param is
>> quoted, then
>> just remove the quotes.
> 
> How would opal_cmd_line_parse know when to remove quotes and when not
> remove quotes?  I.e., what if an MCA param (or some other value, such
> as a user application argv) required quotes?  This is not uncommon for
> the wrapper compilers, for example (passing -D's to the C preprocessor
> that take string arguments).
> 
> AFAIK, opal_cmd_line_parse() does exactly what I intended it to.  :-)
> It performs no shell-like quoting because it expects to be given a
> list of "final" string tokens.
> 
>> Problem solved, and I wouldn't have to add a bunch of code to figure
>> out
>> when to assemble argv arrays in different ways.
> 
> Er... I don't understand: I joined this thread because I thought there
> was still something to fix...?
> 
> I would be against changing opal_cmd_line_parse() in the manner that
> you have described because it will break other things (e.g., the
> wrapper compilers).
> 
> 
> 
>> 
>> 
>> On 11/19/07 7:52 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote:
>> 
>>> I guess I don't see why this is an opal_cmd_line_parse() problem.
>>> 
>>> If you invoke executables through system(), then a shell is used and
>>> quoting is necessary to preserve the individual string tokens (i.e.,
>>> "my beloved string" would be passed to the application as one argv
>>> token, without the quotes).
>>> 
>>> But if you're building up an array of argv and calling some form of
>>> exec(), then no shell is involved and quoting should not be
>>> necessary.  Specifically:
>>> 
>>>     opal_append_argv(&argc, &argv, "my beloved string");
>>> 
>>> will be passed as one string token to the application.
>>> 
>>> opal_cmd_line_parse() is passed an array of argv, meaning that the
>>> command line have already been split into individual string
>>> tokens.  I
>>> guess the question is whether these come in directly from the
>>> arguments to main() or whether we are getting a single string and
>>> breaking it up into tokens.  If the latter is true, then we need to
>>> re-
>>> evaluate our break-into-tokens algorithm.  I have a dim recollection
>>> that these come in from the arguments to main(), though.
>>> 
>>> I guess I can see where this would get complicated for rsh/ssh
>>> invocations, because *both* models are used. (i.e., you exec locally
>>> but it turns into a system-like invocation on the remote side).  In
>>> this case, I think you'll need to quote extended strings (e.g., those
>>> containing spaces) for the non-local invocations not not quote it for
>>> local invocations.
>>> 
>>> 
>>> 
>>> 
>>> On Nov 19, 2007, at 9:19 AM, Ralph H Castain wrote:
>>> 
>>>> My, you are joining late! ;-)
>>>> 
>>>> The problem is with MCA params that take string arguments. If we
>>>> pass the following:
>>>> 
>>>> -mca foo "my beloved string"
>>>> 
>>>> on the command line of an orted, we get a value for foo that
>>>> includes the quote marks. I verified this rather painfully when
>>>> attempting to pass a command line mca param for a uri. I eventually
>>>> had to add specific code to clean the paramater up.
>>>> 
>>>> This appears to be somehow related to the precise method used to
>>>> register the param. For example, the following deprecated method
>>>> works fine:
>>>> 
>>>> On the setup end:
>>>> 
>>>>    opal_argv_append(argc, argv, "--gprreplica");
>>>>    if (NULL != orte_process_info.gpr_replica_uri) {
>>>>        contact_info = strdup(orte_process_info.gpr_replica_uri);
>>>>    } else {
>>>>        contact_info = orte_rml.get_contact_info();
>>>>    }
>>>>    asprintf(&param, "\"%s\"", contact_info);
>>>>    opal_argv_append(argc, argv, param);
>>>> 
>>>> And on the receiving end:
>>>>    id = mca_base_param_register_string("gpr", "replica", "uri",
>>>> NULL, orte_process_info.gpr_replica_uri);
>>>>    mca_base_param_lookup_string(id,
>>>> &(orte_process_info.gpr_replica_uri));
>>>>    mca_base_param_set_internal(id, true);
>>>> 
>>>> 
>>>> However, the following does NOT work cleanly:
>>>> 
>>>> On the setup end:
>>>>   rml_uri = orte_rml.get_contact_info();
>>>>    asprintf(&param, "\"%s\"", rml_uri);
>>>>    opal_argv_append(argc, argv, "--hnp-uri");
>>>>    opal_argv_append(argc, argv, param);
>>>> 
>>>> On the receiving end:
>>>>    mca_base_param_reg_string_name("orte", "hnp_uri",
>>>>                                   "HNP contact info",
>>>>                                   true, false, NULL,  &uri);
>>>> 
>>>> Thereby necessitating the addition of the following code to clean it
>>>> up:
>>>>    if (NULL != uri) {
>>>>        /* the uri value passed to us will have quote marks around
>>>> it to protect
>>>>        * the value if passed on the command line. We must remove
>>>> those
>>>>        * to have a correct uri string
>>>>        */
>>>>       if ('"' == uri[0]) {
>>>>            /* if the first char is a quote, then so will the last
>>>> one be */
>>>>           ptr = &uri[1];
>>>>            len = strlen(ptr) - 1;
>>>>        } else {
>>>>            ptr = &uri[0];
>>>>            len = strlen(uri);
>>>>        }
>>>> 
>>>>        /* we have to copy the string by hand as strndup is a GNU
>>>> extension
>>>>         * and may not be generally available
>>>>         */
>>>>       orte_process_info.my_hnp_uri = (char*)malloc(len+1);
>>>>        for (i=0; i < len; i++) {
>>>>            orte_process_info.my_hnp_uri[i] = ptr[i];
>>>>        }
>>>>        orte_process_info.my_hnp_uri[len] = '\0';  /* NULL terminate
>>>> */
>>>>       free(uri);
>>>> 
>>>>    }
>>>> 
>>>> It was my understanding that you wanted us to move away from the
>>>> deprecated interface ­ hence my comment that we cannot just quote
>>>> all the strings as we would have to add this code all over the
>>>> place, or (better) fix opal_cmd_line_parse.
>>>> 
>>>> Hope that helps
>>>> Ralph
>>>> 
>>>> 
>>>> On 11/19/07 7:01 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote:
>>>> 
>>>>> Sorry -- I'm just joining this conversation late: what's the
>>>>> problem
>>>>> with opal_cmd_line_parse?
>>>>> 
>>>>> It should obey all quoting from shells, etc.  I.e., it shouldn't
>>>> care
>>>>> about tokens with special characters (to include spaces) because
>>>>> the
>>>>> shell divides all of that stuff up -- it just gets a char*[] that
>>>>> it
>>>>> treats as discrete tokens.
>>>>> 
>>>>> Is it doing something wrong?
>>>>> 
>>>>> 
>>>>> 
>>>>> On Nov 19, 2007, at 8:39 AM, Ralph H Castain wrote:
>>>>> 
>>>>>> I'm not sure it is really necessary - the problem is solely within
>>>>>> opal_cmd_line_parse and (if someone can parse that code ;-)) is
>>>>>> truly simple
>>>>>> to fix. The overly long cmd line issue is due to a bug that Josh
>>>> was
>>>>>> going
>>>>>> to look at (may already have done so while I was out of touch).
>>>>>> 
>>>>>> Ralph
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 11/9/07 5:10 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote:
>>>>>> 
>>>>>>> Should there be another option for passing MCA parameters between
>>>>>>> processes, such as via stdin (or any file descriptor)?  I.e.,
>>>> during
>>>>>>> the command line parsing to check for command line MCA params,
>>>>>>> perhaps
>>>>>>> a new argument could be introduced: -mcauri <uri>, where <uri>
>>>> could
>>>>>>> be a few different forms:
>>>>>>> 
>>>>>>> - file://stdin: (note the 2 //, not 3, so "stdin" would never
>>>>>>> conflict
>>>>>>> with a real file named /stdin)  Read the parameters in off stdin.
>>>>>>> 
>>>>>>> - rml://...rml contact info...: read in the MCA params via the
>>>>>>> RML
>>>>>>> (although I assume that reading via the RML would be *waaaay* to
>>>> late
>>>>>>> during the MCA setup process -- I mentioned this option for
>>>>>>> completeness, even though I don't think it'll work)
>>>>>>> 
>>>>>>> - ip://ipaddress:port: open a socket back and read the MCA
>>>> params in
>>>>>>> over a socket.  This could have some scalability issues...?  But
>>>> who
>>>>>>> knows; it could be tied into the hierarchical startup such that
>>>>>>> we
>>>>>>> wouldn't have to have an all-to-one connection scheme.
>>>> Certainly it
>>>>>>> would cause scalability problems when paired with today's all-to-
>>>> one
>>>>>>> RML connection scheme for the OOB.
>>>>>>> 
>>>>>>> I'm not sure that the rml: and ip: schemes are worthwhile.
>>>> Maybe a
>>>>>>> file://stdin kind of approach could work?  Or perhaps some other
>>>> kind
>>>>>>> of URI/IPC...?  (I really haven't thought through the issues --
>>>> this
>>>>>>> is off the top of my head)
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Nov 8, 2007, at 2:36 PM, Ralph H Castain wrote:
>>>>>>> 
>>>>>>>> Might I suggest:
>>>>>>>> 
>>>>>>>> https://svn.open-mpi.org/trac/ompi/ticket/1073
>>>>>>>> 
>>>>>>>> It deals with some of these issues and explains the boundaries
>>>>>>>> of
>>>>>>>> the
>>>>>>>> problem. As for what a string param can contain, I have no
>>>> opinion.
>>>>>>>> I only
>>>>>>>> note that it must handle special characters such as ';', '/',
>>>> etc.
>>>>>>>> that are
>>>>>>>> typically found in uri's. I cannot think of any reason it should
>>>>>>>> have a
>>>>>>>> quote in it.
>>>>>>>> 
>>>>>>>> Ralph
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 11/8/07 12:25 PM, "Tim Prins" <tpr...@cs.indiana.edu> wrote:
>>>>>>>> 
>>>>>>>>> The alias option you presented does not work. I think we do
>>>>>>>>> some
>>>>>>>>> weird
>>>>>>>>> things to find the absolute path for ssh, instead of just
>>>> issuing
>>>>>>>>> the
>>>>>>>>> command.
>>>>>>>>> 
>>>>>>>>> I would spend some time fixing this, but I don't want to do it
>>>>>>>>> wrong. We
>>>>>>>>> could quote all the param values, and change the parser to
>>>> remove
>>>>>>>>> the
>>>>>>>>> quotes, but this is assuming that a mca param does not contain
>>>>>>>>> quotes.
>>>>>>>>> 
>>>>>>>>> So I guess there are 2 questions that need to be answered
>>>> before a
>>>>>>>>> fix
>>>>>>>>> is made:
>>>>>>>>> 
>>>>>>>>> 1. What exactly can a string mca param contain? Can it have
>>>>>>>>> quotes or
>>>>>>>>> spaces or?
>>>>>>>>> 
>>>>>>>>> 2. Which mca parameters should be forwarded? Should it be just
>>>> the
>>>>>>>>> ones
>>>>>>>>> from the command line? From the environment? From config files?
>>>>>>>>> 
>>>>>>>>> Tim
>>>>>>>>> 
>>>>>>>>> Ralph Castain wrote:
>>>>>>>>>> What changed is that we never passed mca params to the orted
>>>>>>>>>> before - they
>>>>>>>>>> always went to the app, but it's the orted that has the issue.
>>>>>>>>>> There is a
>>>>>>>>>> bug ticket thread on this subject - I forget the number
>>>>>>>>>> immediately.
>>>>>>>>>> 
>>>>>>>>>> Basically, the problem was that we cannot generally pass the
>>>> local
>>>>>>>>>> environment to the orteds when we launch them. However, people
>>>>>>>>>> needed
>>>>>>>>>> various mca params to get to the orteds to control their
>>>> behavior.
>>>>>>>>>> The only
>>>>>>>>>> way to resolve that problem was to pass the params via the
>>>> command
>>>>>>>>>> line,
>>>>>>>>>> which is what was done.
>>>>>>>>>> 
>>>>>>>>>> Except for a very few cases, all of our mca params are single
>>>>>>>>>> values that do
>>>>>>>>>> not include spaces, so this is not a problem that is causing
>>>>>>>>>> widespread
>>>>>>>>>> issues. As I said, I already had to deal with one special case
>>>>>>>>>> that didn't
>>>>>>>>>> involve spaces, but did have special characters that required
>>>>>>>>>> quoting, which
>>>>>>>>>> identified the larger problem of dealing with quoted strings.
>>>>>>>>>> 
>>>>>>>>>> I have no objection to a more general fix. Like I said in my
>>>> note,
>>>>>>>>>> though,
>>>>>>>>>> the general fix will take a larger effort. If someone is
>>>> willing
>>>>>>>>>> to do so,
>>>>>>>>>> that is fine with me - I was only offering solutions that
>>>>>>>>>> would
>>>>>>>>>> fill the
>>>>>>>>>> interim time as I haven't heard anyone step up to say they
>>>> would
>>>>>>>>>> fix it
>>>>>>>>>> anytime soon.
>>>>>>>>>> 
>>>>>>>>>> Please feel free to jump in and volunteer! ;-) I'm willing to
>>>> put
>>>>>>>>>> the quotes
>>>>>>>>>> around things if you will fix the mca cmd line parser to
>>>> cleanly
>>>>>>>>>> remove them
>>>>>>>>>> on the other end.
>>>>>>>>>> 
>>>>>>>>>> Ralph
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On 11/7/07 5:50 PM, "Tim Prins" <tpr...@cs.indiana.edu> wrote:
>>>>>>>>>> 
>>>>>>>>>>> I'm curious what changed to make this a problem. How were we
>>>>>>>>>>> passing mca
>>>>>>>>>>> param
>>>>>>>>>>> from the base to the app before, and why did it change?
>>>>>>>>>>> 
>>>>>>>>>>> I think that options 1 & 2 below are no good, since we, in
>>>>>>>>>>> general, allow
>>>>>>>>>>> string mca params to have spaces (as far as I understand
>>>> it). So
>>>>>>>>>>> a more
>>>>>>>>>>> general approach is needed.
>>>>>>>>>>> 
>>>>>>>>>>> Tim
>>>>>>>>>>> 
>>>>>>>>>>> On Wednesday 07 November 2007 10:40:45 am Ralph H Castain
>>>> wrote:
>>>>>>>>>>>> Sorry for delay - wasn't ignoring the issue.
>>>>>>>>>>>> 
>>>>>>>>>>>> There are several fixes to this problem - ranging in order
>>>> from
>>>>>>>>>>>> least to
>>>>>>>>>>>> most work:
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. just alias "ssh" to be "ssh -Y" and run without setting
>>>> the
>>>>>>>>>>>> mca param.
>>>>>>>>>>>> It won't affect anything on the backend because the daemon/
>>>> procs
>>>>>>>>>>>> don't use
>>>>>>>>>>>> ssh.
>>>>>>>>>>>> 
>>>>>>>>>>>> 2. include "pls_rsh_agent" in the array of mca params not
>>>> to be
>>>>>>>>>>>> passed to
>>>>>>>>>>>> the orted in orte/mca/pls/base/
>>>> pls_base_general_support_fns.c,
>>>>>>>>>>>> the
>>>>>>>>>>>> orte_pls_base_orted_append_basic_args function. This would
>>>> fix
>>>>>>>>>>>> the specific
>>>>>>>>>>>> problem cited here, but I admit that listing every such
>>>> param by
>>>>>>>>>>>> name would
>>>>>>>>>>>> get tedious.
>>>>>>>>>>>> 
>>>>>>>>>>>> 3. we could easily detect that a "problem" character was in
>>>> the
>>>>>>>>>>>> mca param
>>>>>>>>>>>> value when we add it to the orted's argv, and then put ""
>>>> around
>>>>>>>>>>>> it. The
>>>>>>>>>>>> problem, however, is that the mca param parser on the far
>>>>>>>>>>>> end
>>>>>>>>>>>> doesn't
>>>>>>>>>>>> remove those "" from the resulting string. At least, I spent
>>>>>>>>>>>> over a day
>>>>>>>>>>>> fighting with a problem only to discover that was happening.
>>>>>>>>>>>> Could be an
>>>>>>>>>>>> error in the way I was doing things, or could be a real
>>>>>>>>>>>> characteristic of
>>>>>>>>>>>> the parser. Anyway, we would have to ensure that the parser
>>>>>>>>>>>> removes any
>>>>>>>>>>>> surrounding "" before passing along the param value or this
>>>>>>>>>>>> won't work.
>>>>>>>>>>>> 
>>>>>>>>>>>> Ralph
>>>>>>>>>>>> 
>>>>>>>>>>>> On 11/5/07 12:10 PM, "Tim Prins" <tpr...@cs.indiana.edu>
>>>> wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> 
>>>>>>>>>>>> Commit 16364 broke things when using multiword mca param
>>>>>>>>>>>> values. For
>>>>>>>>>>>> instance:
>>>>>>>>>>>> 
>>>>>>>>>>>> mpirun --debug-daemons -mca orte_debug 1 -mca pls rsh -mca
>>>>>>>>>>>> pls_rsh_agent
>>>>>>>>>>>> "ssh -Y" xterm
>>>>>>>>>>>> 
>>>>>>>>>>>> Will crash and burn, because the value "ssh -Y" is being
>>>> stored
>>>>>>>>>>>> into the
>>>>>>>>>>>> argv orted_cmd_line in orterun.c:1506. This is then added
>>>>>>>>>>>> to
>>>>>>>>>>>> the launch
>>>>>>>>>>>> command for the orted:
>>>>>>>>>>>> 
>>>>>>>>>>>> /usr/bin/ssh -Y odin004  PATH=/san/homedirs/tprins/usr/rsl/
>>>> bin:
>>>>>>>>>>>> $PATH ;
>>>>>>>>>>>> export PATH ;
>>>>>>>>>>>> LD_LIBRARY_PATH=/san/homedirs/tprins/usr/rsl/lib:
>>>>>>>>>>>> $LD_LIBRARY_PATH ;
>>>>>>>>>>>> export LD_LIBRARY_PATH ; /san/homedirs/tprins/usr/rsl/bin/
>>>> orted
>>>>>>>>>>>> --debug
>>>>>>>>>>>> --debug-daemons --name 0.1 --num_procs 2 --vpid_start 0 --
>>>>>>>>>>>> nodename
>>>>>>>>>>>> odin004 --universe tpr...@odin.cs.indiana.edu:default-
>>>>>>>>>>>> universe-27872
>>>>>>>>>>>> --nsreplica
>>>>>>>>>>>> "0.0;tcp://
>>>>>>>>>>>> 129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:
>>>> 21a0
>>>>>>>>>>>> :4090 8"
>>>>>>>>>>>> --gprreplica
>>>>>>>>>>>> "0.0;tcp://
>>>>>>>>>>>> 129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:
>>>> 21a0
>>>>>>>>>>>> :4090 8"
>>>>>>>>>>>> -mca orte_debug 1 -mca pls_rsh_agent ssh -Y -mca
>>>>>>>>>>>> mca_base_param_file_path
>>>>>>>>>>>> /u/tprins/usr/rsl/share/openmpi/amca-param-sets:/san/
>>>> homedirs/
>>>>>>>>>>>> tprins/rsl/
>>>>>>>>>>>> examp les
>>>>>>>>>>>> -mca mca_base_param_file_path_force /san/homedirs/tprins/
>>>> rsl/
>>>>>>>>>>>> examples
>>>>>>>>>>>> 
>>>>>>>>>>>> Notice that in this command we now have "-mca
>>>> pls_rsh_agent ssh
>>>>>>>>>>>> -Y". So
>>>>>>>>>>>> the quotes have been lost, as we die a horrible death.
>>>>>>>>>>>> 
>>>>>>>>>>>> So we need to add the quotes back in somehow, or pass these
>>>>>>>>>>>> options
>>>>>>>>>>>> differently. I'm not sure what the best way to fix this.
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> 
>>>>>>>>>>>> Tim
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> devel mailing list
>>>>>>>> de...@open-mpi.org
>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>> 
>>>>> 
>>> 
> 



Reply via email to