On Thu, Oct/08/2009 03:18:07PM, Ashley Pittman wrote:
> On Thu, 2009-10-08 at 09:51 -0400, Ethan Mallove wrote:
>
> > $ padb --verbose --debug=all --config-option rmgr=mpirun --full-report=6336
> > ...
> > full job report for job 6336
> >
> > Attaching to job 6336
> > mpirun resource manager requires pdsh to be installed
> > Use of uninitialized value in printf at padb line 729.
> > Use of uninitialized value in printf at padb line 729.
> > DEBUG (verbose): 0: There are 0 processes over 0 hosts
> > Fatal problem setting up the resource manager: mpirun
> >
> > I assume it's referring to the below "pdsh"?
> >
> > http://sourceforge.net/projects/pdsh
>
> Yes, you'll need to be able to ssh freely around from the node where
> padb/pdsh is running to all compute nodes as well. For Debian I had to
> add "export PDSH_RCMD_TYPE=ssh" to my .bashrc to tell it to use ssh
> rather than rsh.
>
> Could you update to r283 as well? The "mpirun" resource manager is new
> and I discovered this morning that it didn't like digits in hostnames.
> As an added benefit it won't use pdsh or ssh if all processes are local.

It looks like it's using a bad option to pdsh?

  $ padb --debug=all --verbose --config-option rmgr=mpirun --full-report=24303
  ...
  padb version 3.n (Revision 283)
  full job report for job 24303

  Attaching to job 24303
  Use of uninitialized value in string ne at padb line 2720.
  Job has 1 process(es)
  Job spans 0 host(s)
  DEBUG (verbose): 0: There are 1 processes over 0 hosts
  DEBUG (verbose): 0: Remote process data available on frontend
  DEBUG (show_cmd): 0: pdsh -w padb --inner --outer="burl-ct-v20z-0:52314"
  einner: pdsh: illegal option -- -
  einner: Usage: pdsh [-options] command ...
  einner: -S                return largest of remote command return values
  einner: -h                output usage menu and quit
  einner: -V                output version information and quit
  einner: -q                list the option settings and quit
  einner: -b                disable ^C status feature (batch mode)
  einner: -d                enable extra debug information from ^C status
  einner: -l user           execute remote commands as user
  einner: -t seconds        set connect timeout (default is 10 sec)
  einner: -u seconds        set command timeout (no default)
  einner: -f n              use fanout of n nodes
  einner: -w host,host,...  set target node list on command line
  einner: -x host,host,...  set node exclusion list on command line
  einner: -R name           set rcmd module to name
  einner: -N                disable hostname: labels on output lines
  einner: -L                list info on all loaded modules and exit
  einner: available rcmd modules: rsh,exec (default: rsh)
  Unexpected EOF from Inner stdout (connecting)
  Unexpected EOF from Inner stderr (connecting)
  Unexpected exit from parallel command (state=connecting)
  result from parallel command is 256 (state=connecting)
  Bad exit code from parallel command (exit_code=1)
  DEBUG (verbose): 5: Completed command

-Ethan

>
> Ashley,
>
> --
> Ashley Pittman, Bath, UK.
>
> Padb - A parallel job inspection tool for cluster computing
> http://padb.pittman.org.uk
>
>
> _______________________________________________
> mtt-devel mailing list
> mtt-de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel
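
One possible reading of the debug output above, offered as a guess and not a confirmed diagnosis: the job spans 0 hosts, so the host list passed to "pdsh -w" may have been empty, which would let -w consume "padb" as the node list and leave "--inner" to be parsed as a pdsh option. The sketch below shows the kind of guard that would avoid that. The function name build_pdsh_cmd and the host names are hypothetical and are not part of padb or pdsh.

```shell
#!/bin/sh
# Hypothetical helper (not from padb): print a pdsh command line for a
# comma-separated host list, or fail when the host list is empty.
build_pdsh_cmd() {
    hosts=$1
    shift
    if [ -z "$hosts" ]; then
        # With no hosts, "-w" would swallow the next word as its argument.
        echo "no hosts found for job; refusing to build pdsh command" >&2
        return 1
    fi
    # -w takes the target node list, as in the pdsh usage text quoted above.
    printf '%s\n' "pdsh -w $hosts $*"
}

# Example with made-up host names:
build_pdsh_cmd "node1,node2" padb --inner
```

With a non-empty list the command is printed unchanged; with an empty list the function returns non-zero, so a caller can stop before invoking pdsh.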