I'm curious what changed to make this a problem. How were we passing mca params from the base to the app before, and why did it change?
I think that options 1 & 2 below are no good, since we, in general, allow string mca params to have spaces (as far as I understand it). So a more general approach is needed.

Tim

On Wednesday 07 November 2007 10:40:45 am Ralph H Castain wrote:
> Sorry for delay - wasn't ignoring the issue.
>
> There are several fixes to this problem - ranging in order from least to
> most work:
>
> 1. just alias "ssh" to be "ssh -Y" and run without setting the mca param.
> It won't affect anything on the backend because the daemon/procs don't use
> ssh.
>
> 2. include "pls_rsh_agent" in the array of mca params not to be passed to
> the orted in orte/mca/pls/base/pls_base_general_support_fns.c, the
> orte_pls_base_orted_append_basic_args function. This would fix the specific
> problem cited here, but I admit that listing every such param by name would
> get tedious.
>
> 3. we could easily detect that a "problem" character was in the mca param
> value when we add it to the orted's argv, and then put "" around it. The
> problem, however, is that the mca param parser on the far end doesn't
> remove those "" from the resulting string. At least, I spent over a day
> fighting with a problem only to discover that was happening. Could be an
> error in the way I was doing things, or could be a real characteristic of
> the parser. Anyway, we would have to ensure that the parser removes any
> surrounding "" before passing along the param value or this won't work.
>
> Ralph
>
> On 11/5/07 12:10 PM, "Tim Prins" <tpr...@cs.indiana.edu> wrote:
> > Hi,
> >
> > Commit 16364 broke things when using multiword mca param values. For
> > instance:
> >
> > mpirun --debug-daemons -mca orte_debug 1 -mca pls rsh -mca pls_rsh_agent
> > "ssh -Y" xterm
> >
> > Will crash and burn, because the value "ssh -Y" is being stored into the
> > argv orted_cmd_line in orterun.c:1506. This is then added to the launch
> > command for the orted:
> >
> > /usr/bin/ssh -Y odin004 PATH=/san/homedirs/tprins/usr/rsl/bin:$PATH ;
> > export PATH ;
> > LD_LIBRARY_PATH=/san/homedirs/tprins/usr/rsl/lib:$LD_LIBRARY_PATH ;
> > export LD_LIBRARY_PATH ; /san/homedirs/tprins/usr/rsl/bin/orted --debug
> > --debug-daemons --name 0.1 --num_procs 2 --vpid_start 0 --nodename
> > odin004 --universe tpr...@odin.cs.indiana.edu:default-universe-27872
> > --nsreplica
> > "0.0;tcp://129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:21a0:40908"
> > --gprreplica
> > "0.0;tcp://129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:21a0:40908"
> > -mca orte_debug 1 -mca pls_rsh_agent ssh -Y -mca
> > mca_base_param_file_path
> > /u/tprins/usr/rsl/share/openmpi/amca-param-sets:/san/homedirs/tprins/rsl/examples
> > -mca mca_base_param_file_path_force /san/homedirs/tprins/rsl/examples
> >
> > Notice that in this command we now have "-mca pls_rsh_agent ssh -Y". So
> > the quotes have been lost, as we die a horrible death.
> >
> > So we need to add the quotes back in somehow, or pass these options
> > differently. I'm not sure what the best way to fix this.
> >
> > Thanks,
> >
> > Tim
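
For reference, here is a rough sketch of what Ralph's option 3 might look like: quote a param value when it is added to the orted's argv, and strip the surrounding quotes again on the receiving side. Both function names are hypothetical and nothing here is taken from the ORTE tree or the mca param parser. It assumes the only "problem" characters are spaces and tabs and does not try to handle values that already contain double quotes.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an Open MPI function): return a newly allocated
 * copy of value, wrapped in double quotes if it contains a space or tab.
 * The caller frees the result. Returns NULL on allocation failure. */
char *quote_if_needed(const char *value)
{
    size_t len = strlen(value);
    char *out;

    if (strpbrk(value, " \t") == NULL) {
        /* No problem character: plain copy. */
        out = malloc(len + 1);
        if (out != NULL) {
            memcpy(out, value, len + 1);
        }
        return out;
    }

    out = malloc(len + 3);  /* two quotes plus the terminating NUL */
    if (out == NULL) {
        return NULL;
    }
    out[0] = '"';
    memcpy(out + 1, value, len);
    out[len + 1] = '"';
    out[len + 2] = '\0';
    return out;
}

/* Hypothetical helper for the receiving side: strip one pair of surrounding
 * double quotes in place, if present, and return the same pointer. */
char *strip_surrounding_quotes(char *value)
{
    size_t len = strlen(value);

    if (len >= 2 && value[0] == '"' && value[len - 1] == '"') {
        memmove(value, value + 1, len - 2);
        value[len - 2] = '\0';
    }
    return value;
}
```

With these, "ssh -Y" would go onto the orted command line as a single quoted token and come back out as ssh -Y after stripping, while a single-word value such as rsh passes through unchanged. Whether the remote shell preserves or consumes the quotes before the parser sees them is a separate question that this sketch does not answer.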