Hmmm...okay, I get the message: if it is going to be fixed, then I have to fix it on my end. ;-) Which means it ain't happening anytime soon, so Tim P is the one left in the lurch.
Sorry Tim! Perhaps you can come up with a compromise that re-enables what you want to do. Ralph On 11/19/07 8:11 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote: > On Nov 19, 2007, at 10:01 AM, Ralph H Castain wrote: > >> Unfortunately, it -is- a significant problem when passing the params >> on to >> the orteds, as Tim has eloquently pointed out in his original >> posting. I >> guess I don't see why it would be a significant problem to just have >> opal_cmd_line_parse do the "right thing" - if a string param is >> quoted, then >> just remove the quotes. > > How would opal_cmd_line_parse know when to remove quotes and when not > remove quotes? I.e., what if an MCA param (or some other value, such > as a user application argv) required quotes? This is not uncommon for > the wrapper compilers, for example (passing -D's to the C preprocessor > that take string arguments). > > AFAIK, opal_cmd_line_parse() does exactly what I intended it to. :-) > It performs no shell-like quoting because it expects to be given a > list of "final" string tokens. > >> Problem solved, and I wouldn't have to add a bunch of code to figure >> out >> when to assemble argv arrays in different ways. > > Er... I don't understand: I joined this thread because I thought there > was still something to fix...? > > I would be against changing opal_cmd_line_parse() in the manner that > you have described because it will break other things (e.g., the > wrapper compilers). > > > >> >> >> On 11/19/07 7:52 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote: >> >>> I guess I don't see why this is an opal_cmd_line_parse() problem. >>> >>> If you invoke executables through system(), then a shell is used and >>> quoting is necessary to preserve the individual string tokens (i.e., >>> "my beloved string" would be passed to the application as one argv >>> token, without the quotes). >>> >>> But if you're building up an array of argv and calling some form of >>> exec(), then no shell is involved and quoting should not be >>> necessary. Specifically: >>> >>> opal_append_argv(&argc, &argv, "my beloved string"); >>> >>> will be passed as one string token to the application. >>> >>> opal_cmd_line_parse() is passed an array of argv, meaning that the >>> command line have already been split into individual string >>> tokens. I >>> guess the question is whether these come in directly from the >>> arguments to main() or whether we are getting a single string and >>> breaking it up into tokens. If the latter is true, then we need to >>> re- >>> evaluate our break-into-tokens algorithm. I have a dim recollection >>> that these come in from the arguments to main(), though. >>> >>> I guess I can see where this would get complicated for rsh/ssh >>> invocations, because *both* models are used. (i.e., you exec locally >>> but it turns into a system-like invocation on the remote side). In >>> this case, I think you'll need to quote extended strings (e.g., those >>> containing spaces) for the non-local invocations not not quote it for >>> local invocations. >>> >>> >>> >>> >>> On Nov 19, 2007, at 9:19 AM, Ralph H Castain wrote: >>> >>>> My, you are joining late! ;-) >>>> >>>> The problem is with MCA params that take string arguments. If we >>>> pass the following: >>>> >>>> -mca foo "my beloved string" >>>> >>>> on the command line of an orted, we get a value for foo that >>>> includes the quote marks. I verified this rather painfully when >>>> attempting to pass a command line mca param for a uri. I eventually >>>> had to add specific code to clean the paramater up. >>>> >>>> This appears to be somehow related to the precise method used to >>>> register the param. For example, the following deprecated method >>>> works fine: >>>> >>>> On the setup end: >>>> >>>> opal_argv_append(argc, argv, "--gprreplica"); >>>> if (NULL != orte_process_info.gpr_replica_uri) { >>>> contact_info = strdup(orte_process_info.gpr_replica_uri); >>>> } else { >>>> contact_info = orte_rml.get_contact_info(); >>>> } >>>> asprintf(¶m, "\"%s\"", contact_info); >>>> opal_argv_append(argc, argv, param); >>>> >>>> And on the receiving end: >>>> id = mca_base_param_register_string("gpr", "replica", "uri", >>>> NULL, orte_process_info.gpr_replica_uri); >>>> mca_base_param_lookup_string(id, >>>> &(orte_process_info.gpr_replica_uri)); >>>> mca_base_param_set_internal(id, true); >>>> >>>> >>>> However, the following does NOT work cleanly: >>>> >>>> On the setup end: >>>> rml_uri = orte_rml.get_contact_info(); >>>> asprintf(¶m, "\"%s\"", rml_uri); >>>> opal_argv_append(argc, argv, "--hnp-uri"); >>>> opal_argv_append(argc, argv, param); >>>> >>>> On the receiving end: >>>> mca_base_param_reg_string_name("orte", "hnp_uri", >>>> "HNP contact info", >>>> true, false, NULL, &uri); >>>> >>>> Thereby necessitating the addition of the following code to clean it >>>> up: >>>> if (NULL != uri) { >>>> /* the uri value passed to us will have quote marks around >>>> it to protect >>>> * the value if passed on the command line. We must remove >>>> those >>>> * to have a correct uri string >>>> */ >>>> if ('"' == uri[0]) { >>>> /* if the first char is a quote, then so will the last >>>> one be */ >>>> ptr = &uri[1]; >>>> len = strlen(ptr) - 1; >>>> } else { >>>> ptr = &uri[0]; >>>> len = strlen(uri); >>>> } >>>> >>>> /* we have to copy the string by hand as strndup is a GNU >>>> extension >>>> * and may not be generally available >>>> */ >>>> orte_process_info.my_hnp_uri = (char*)malloc(len+1); >>>> for (i=0; i < len; i++) { >>>> orte_process_info.my_hnp_uri[i] = ptr[i]; >>>> } >>>> orte_process_info.my_hnp_uri[len] = '\0'; /* NULL terminate >>>> */ >>>> free(uri); >>>> >>>> } >>>> >>>> It was my understanding that you wanted us to move away from the >>>> deprecated interface hence my comment that we cannot just quote >>>> all the strings as we would have to add this code all over the >>>> place, or (better) fix opal_cmd_line_parse. >>>> >>>> Hope that helps >>>> Ralph >>>> >>>> >>>> On 11/19/07 7:01 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote: >>>> >>>>> Sorry -- I'm just joining this conversation late: what's the >>>>> problem >>>>> with opal_cmd_line_parse? >>>>> >>>>> It should obey all quoting from shells, etc. I.e., it shouldn't >>>> care >>>>> about tokens with special characters (to include spaces) because >>>>> the >>>>> shell divides all of that stuff up -- it just gets a char*[] that >>>>> it >>>>> treats as discrete tokens. >>>>> >>>>> Is it doing something wrong? >>>>> >>>>> >>>>> >>>>> On Nov 19, 2007, at 8:39 AM, Ralph H Castain wrote: >>>>> >>>>>> I'm not sure it is really necessary - the problem is solely within >>>>>> opal_cmd_line_parse and (if someone can parse that code ;-)) is >>>>>> truly simple >>>>>> to fix. The overly long cmd line issue is due to a bug that Josh >>>> was >>>>>> going >>>>>> to look at (may already have done so while I was out of touch). >>>>>> >>>>>> Ralph >>>>>> >>>>>> >>>>>> >>>>>> On 11/9/07 5:10 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote: >>>>>> >>>>>>> Should there be another option for passing MCA parameters between >>>>>>> processes, such as via stdin (or any file descriptor)? I.e., >>>> during >>>>>>> the command line parsing to check for command line MCA params, >>>>>>> perhaps >>>>>>> a new argument could be introduced: -mcauri <uri>, where <uri> >>>> could >>>>>>> be a few different forms: >>>>>>> >>>>>>> - file://stdin: (note the 2 //, not 3, so "stdin" would never >>>>>>> conflict >>>>>>> with a real file named /stdin) Read the parameters in off stdin. >>>>>>> >>>>>>> - rml://...rml contact info...: read in the MCA params via the >>>>>>> RML >>>>>>> (although I assume that reading via the RML would be *waaaay* to >>>> late >>>>>>> during the MCA setup process -- I mentioned this option for >>>>>>> completeness, even though I don't think it'll work) >>>>>>> >>>>>>> - ip://ipaddress:port: open a socket back and read the MCA >>>> params in >>>>>>> over a socket. This could have some scalability issues...? But >>>> who >>>>>>> knows; it could be tied into the hierarchical startup such that >>>>>>> we >>>>>>> wouldn't have to have an all-to-one connection scheme. >>>> Certainly it >>>>>>> would cause scalability problems when paired with today's all-to- >>>> one >>>>>>> RML connection scheme for the OOB. >>>>>>> >>>>>>> I'm not sure that the rml: and ip: schemes are worthwhile. >>>> Maybe a >>>>>>> file://stdin kind of approach could work? Or perhaps some other >>>> kind >>>>>>> of URI/IPC...? (I really haven't thought through the issues -- >>>> this >>>>>>> is off the top of my head) >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Nov 8, 2007, at 2:36 PM, Ralph H Castain wrote: >>>>>>> >>>>>>>> Might I suggest: >>>>>>>> >>>>>>>> https://svn.open-mpi.org/trac/ompi/ticket/1073 >>>>>>>> >>>>>>>> It deals with some of these issues and explains the boundaries >>>>>>>> of >>>>>>>> the >>>>>>>> problem. As for what a string param can contain, I have no >>>> opinion. >>>>>>>> I only >>>>>>>> note that it must handle special characters such as ';', '/', >>>> etc. >>>>>>>> that are >>>>>>>> typically found in uri's. I cannot think of any reason it should >>>>>>>> have a >>>>>>>> quote in it. >>>>>>>> >>>>>>>> Ralph >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 11/8/07 12:25 PM, "Tim Prins" <tpr...@cs.indiana.edu> wrote: >>>>>>>> >>>>>>>>> The alias option you presented does not work. I think we do >>>>>>>>> some >>>>>>>>> weird >>>>>>>>> things to find the absolute path for ssh, instead of just >>>> issuing >>>>>>>>> the >>>>>>>>> command. >>>>>>>>> >>>>>>>>> I would spend some time fixing this, but I don't want to do it >>>>>>>>> wrong. We >>>>>>>>> could quote all the param values, and change the parser to >>>> remove >>>>>>>>> the >>>>>>>>> quotes, but this is assuming that a mca param does not contain >>>>>>>>> quotes. >>>>>>>>> >>>>>>>>> So I guess there are 2 questions that need to be answered >>>> before a >>>>>>>>> fix >>>>>>>>> is made: >>>>>>>>> >>>>>>>>> 1. What exactly can a string mca param contain? Can it have >>>>>>>>> quotes or >>>>>>>>> spaces or? >>>>>>>>> >>>>>>>>> 2. Which mca parameters should be forwarded? Should it be just >>>> the >>>>>>>>> ones >>>>>>>>> from the command line? From the environment? From config files? >>>>>>>>> >>>>>>>>> Tim >>>>>>>>> >>>>>>>>> Ralph Castain wrote: >>>>>>>>>> What changed is that we never passed mca params to the orted >>>>>>>>>> before - they >>>>>>>>>> always went to the app, but it's the orted that has the issue. >>>>>>>>>> There is a >>>>>>>>>> bug ticket thread on this subject - I forget the number >>>>>>>>>> immediately. >>>>>>>>>> >>>>>>>>>> Basically, the problem was that we cannot generally pass the >>>> local >>>>>>>>>> environment to the orteds when we launch them. However, people >>>>>>>>>> needed >>>>>>>>>> various mca params to get to the orteds to control their >>>> behavior. >>>>>>>>>> The only >>>>>>>>>> way to resolve that problem was to pass the params via the >>>> command >>>>>>>>>> line, >>>>>>>>>> which is what was done. >>>>>>>>>> >>>>>>>>>> Except for a very few cases, all of our mca params are single >>>>>>>>>> values that do >>>>>>>>>> not include spaces, so this is not a problem that is causing >>>>>>>>>> widespread >>>>>>>>>> issues. As I said, I already had to deal with one special case >>>>>>>>>> that didn't >>>>>>>>>> involve spaces, but did have special characters that required >>>>>>>>>> quoting, which >>>>>>>>>> identified the larger problem of dealing with quoted strings. >>>>>>>>>> >>>>>>>>>> I have no objection to a more general fix. Like I said in my >>>> note, >>>>>>>>>> though, >>>>>>>>>> the general fix will take a larger effort. If someone is >>>> willing >>>>>>>>>> to do so, >>>>>>>>>> that is fine with me - I was only offering solutions that >>>>>>>>>> would >>>>>>>>>> fill the >>>>>>>>>> interim time as I haven't heard anyone step up to say they >>>> would >>>>>>>>>> fix it >>>>>>>>>> anytime soon. >>>>>>>>>> >>>>>>>>>> Please feel free to jump in and volunteer! ;-) I'm willing to >>>> put >>>>>>>>>> the quotes >>>>>>>>>> around things if you will fix the mca cmd line parser to >>>> cleanly >>>>>>>>>> remove them >>>>>>>>>> on the other end. >>>>>>>>>> >>>>>>>>>> Ralph >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 11/7/07 5:50 PM, "Tim Prins" <tpr...@cs.indiana.edu> wrote: >>>>>>>>>> >>>>>>>>>>> I'm curious what changed to make this a problem. How were we >>>>>>>>>>> passing mca >>>>>>>>>>> param >>>>>>>>>>> from the base to the app before, and why did it change? >>>>>>>>>>> >>>>>>>>>>> I think that options 1 & 2 below are no good, since we, in >>>>>>>>>>> general, allow >>>>>>>>>>> string mca params to have spaces (as far as I understand >>>> it). So >>>>>>>>>>> a more >>>>>>>>>>> general approach is needed. >>>>>>>>>>> >>>>>>>>>>> Tim >>>>>>>>>>> >>>>>>>>>>> On Wednesday 07 November 2007 10:40:45 am Ralph H Castain >>>> wrote: >>>>>>>>>>>> Sorry for delay - wasn't ignoring the issue. >>>>>>>>>>>> >>>>>>>>>>>> There are several fixes to this problem - ranging in order >>>> from >>>>>>>>>>>> least to >>>>>>>>>>>> most work: >>>>>>>>>>>> >>>>>>>>>>>> 1. just alias "ssh" to be "ssh -Y" and run without setting >>>> the >>>>>>>>>>>> mca param. >>>>>>>>>>>> It won't affect anything on the backend because the daemon/ >>>> procs >>>>>>>>>>>> don't use >>>>>>>>>>>> ssh. >>>>>>>>>>>> >>>>>>>>>>>> 2. include "pls_rsh_agent" in the array of mca params not >>>> to be >>>>>>>>>>>> passed to >>>>>>>>>>>> the orted in orte/mca/pls/base/ >>>> pls_base_general_support_fns.c, >>>>>>>>>>>> the >>>>>>>>>>>> orte_pls_base_orted_append_basic_args function. This would >>>> fix >>>>>>>>>>>> the specific >>>>>>>>>>>> problem cited here, but I admit that listing every such >>>> param by >>>>>>>>>>>> name would >>>>>>>>>>>> get tedious. >>>>>>>>>>>> >>>>>>>>>>>> 3. we could easily detect that a "problem" character was in >>>> the >>>>>>>>>>>> mca param >>>>>>>>>>>> value when we add it to the orted's argv, and then put "" >>>> around >>>>>>>>>>>> it. The >>>>>>>>>>>> problem, however, is that the mca param parser on the far >>>>>>>>>>>> end >>>>>>>>>>>> doesn't >>>>>>>>>>>> remove those "" from the resulting string. At least, I spent >>>>>>>>>>>> over a day >>>>>>>>>>>> fighting with a problem only to discover that was happening. >>>>>>>>>>>> Could be an >>>>>>>>>>>> error in the way I was doing things, or could be a real >>>>>>>>>>>> characteristic of >>>>>>>>>>>> the parser. Anyway, we would have to ensure that the parser >>>>>>>>>>>> removes any >>>>>>>>>>>> surrounding "" before passing along the param value or this >>>>>>>>>>>> won't work. >>>>>>>>>>>> >>>>>>>>>>>> Ralph >>>>>>>>>>>> >>>>>>>>>>>> On 11/5/07 12:10 PM, "Tim Prins" <tpr...@cs.indiana.edu> >>>> wrote: >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> Commit 16364 broke things when using multiword mca param >>>>>>>>>>>> values. For >>>>>>>>>>>> instance: >>>>>>>>>>>> >>>>>>>>>>>> mpirun --debug-daemons -mca orte_debug 1 -mca pls rsh -mca >>>>>>>>>>>> pls_rsh_agent >>>>>>>>>>>> "ssh -Y" xterm >>>>>>>>>>>> >>>>>>>>>>>> Will crash and burn, because the value "ssh -Y" is being >>>> stored >>>>>>>>>>>> into the >>>>>>>>>>>> argv orted_cmd_line in orterun.c:1506. This is then added >>>>>>>>>>>> to >>>>>>>>>>>> the launch >>>>>>>>>>>> command for the orted: >>>>>>>>>>>> >>>>>>>>>>>> /usr/bin/ssh -Y odin004 PATH=/san/homedirs/tprins/usr/rsl/ >>>> bin: >>>>>>>>>>>> $PATH ; >>>>>>>>>>>> export PATH ; >>>>>>>>>>>> LD_LIBRARY_PATH=/san/homedirs/tprins/usr/rsl/lib: >>>>>>>>>>>> $LD_LIBRARY_PATH ; >>>>>>>>>>>> export LD_LIBRARY_PATH ; /san/homedirs/tprins/usr/rsl/bin/ >>>> orted >>>>>>>>>>>> --debug >>>>>>>>>>>> --debug-daemons --name 0.1 --num_procs 2 --vpid_start 0 -- >>>>>>>>>>>> nodename >>>>>>>>>>>> odin004 --universe tpr...@odin.cs.indiana.edu:default- >>>>>>>>>>>> universe-27872 >>>>>>>>>>>> --nsreplica >>>>>>>>>>>> "0.0;tcp:// >>>>>>>>>>>> 129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d: >>>> 21a0 >>>>>>>>>>>> :4090 8" >>>>>>>>>>>> --gprreplica >>>>>>>>>>>> "0.0;tcp:// >>>>>>>>>>>> 129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d: >>>> 21a0 >>>>>>>>>>>> :4090 8" >>>>>>>>>>>> -mca orte_debug 1 -mca pls_rsh_agent ssh -Y -mca >>>>>>>>>>>> mca_base_param_file_path >>>>>>>>>>>> /u/tprins/usr/rsl/share/openmpi/amca-param-sets:/san/ >>>> homedirs/ >>>>>>>>>>>> tprins/rsl/ >>>>>>>>>>>> examp les >>>>>>>>>>>> -mca mca_base_param_file_path_force /san/homedirs/tprins/ >>>> rsl/ >>>>>>>>>>>> examples >>>>>>>>>>>> >>>>>>>>>>>> Notice that in this command we now have "-mca >>>> pls_rsh_agent ssh >>>>>>>>>>>> -Y". So >>>>>>>>>>>> the quotes have been lost, as we die a horrible death. >>>>>>>>>>>> >>>>>>>>>>>> So we need to add the quotes back in somehow, or pass these >>>>>>>>>>>> options >>>>>>>>>>>> differently. I'm not sure what the best way to fix this. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> >>>>>>>>>>>> Tim >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> devel mailing list >>>>>>>> de...@open-mpi.org >>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel >>>>>>> >>>>> >>> >