George Bosilca wrote:
Until then you should be using the latest command "tv8 mpirun -a -np 2 -bynode `pwd`/NPmpi". The `pwd` is really important for some reason, otherwise TotalView is unable to find the executable. The problem is that the name of the process will be "./NPmpi" and TotalView does not have access to the path where the executable was launched (at least that's the reason I think).


Thanks, George. That works except for one catch: when I'm asked on startup whether I want to stop the parallel job (and hit yes), TotalView waits forever trying to connect to a remote server. I see this on the xterm (shortened in a few places):

Launching TotalView Debugger Servers with command:
srun --jobid=0 -N1 -n1 -w`awk -F. 'BEGIN {ORS=","} {if (NR==1) ORS=""; print $1}' $PWD/TVT1Pa4Fjm` -l --input=none /usr/global/tools/totalview.8.1.0-1/linux-x86-64/bin/tvdsvr -callback_host atlas34 -callback_ports atlas31:16382 -set_pws 47319a24:4688a7a2 -verbosity info -working_directory $PWD/NetPIPE_3.6.2
srun: error: Invalid numeric value "0" for jobid.

I got around this by hitting Cancel in the 'waiting to connect' dialog, then entering my SLURM jobid by hand in place of the %J filler under File -> Preferences -> Bulk Launch -> Command, and restarting. Is there a better workaround for this?
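
For reference, the manual substitution described above can be sketched as a small shell helper. This is only an illustration, not something TotalView provides: the function name fill_jobid is hypothetical, and it assumes the shell is running inside a SLURM allocation where SLURM_JOB_ID (or the older SLURM_JOBID) is set in the environment. It refuses to produce a command when the jobid is missing or 0, which is the case srun rejected above.

```shell
#!/bin/sh
# Hypothetical helper (not part of TotalView or SLURM): replace the %J
# placeholder in a launch-command template with the current SLURM job id.
fill_jobid() {
    template=$1
    # Assumption: SLURM exports SLURM_JOB_ID (or SLURM_JOBID) inside an allocation.
    jobid=${SLURM_JOB_ID:-${SLURM_JOBID:-}}
    case $jobid in
        ''|0|*[!0-9]*)
            # Empty, zero, or non-numeric: srun would reject it anyway.
            echo "fill_jobid: no valid SLURM job id in the environment" >&2
            return 1
            ;;
    esac
    # jobid is digits only at this point, so it is safe inside the sed expression.
    printf '%s\n' "$template" | sed "s/%J/$jobid/g"
}

# Example (shortened, illustrative template):
#   fill_jobid 'srun --jobid=%J -N1 -n1 tvdsvr'
```

The printed command is what I would paste into the bulk launch command field in place of the %J version.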

Andrew
