George Bosilca wrote:
> Until then you should be using the latest command "tv8 mpirun -a -np
> 2 -bynode `pwd`/NPmpi". The `pwd` is really important for some
> reason, otherwise TotalView is unable to find the executable. The
> problem is that the name of the process will be "./NPmpi" and
> TotalView does not have access to the path where the executable was
> launched (at least that's the reason I think).

Thanks George. That works, with one catch: when I'm asked on
startup whether I want to stop the parallel job (and hit yes), TotalView
waits forever trying to connect to a remote server. I see this on the
xterm (shortened in a few places):
Launching TotalView Debugger Servers with command:
srun --jobid=0 -N1 -n1 -w`awk -F. 'BEGIN {ORS=","} {if (NR==1) ORS="";
print $1}' $PWD/TVT1Pa4Fjm` -l --input=none
/usr/global/tools/totalview.8.1.0-1/linux-x86-64/bin/tvdsvr
-callback_host atlas34 -callback_ports atlas31:16382 -set_pws
47319a24:4688a7a2 -verbosity info -working_directory $PWD/NetPIPE_3.6.2
srun: error: Invalid numeric value "0" for jobid.
I got around this by hitting cancel in the 'waiting to connect' dialog,
then setting my SLURM jobid manually in file -> preferences -> bulk
launch -> command instead of the %J filler, and restarting. Is there a
better workaround for this?
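
For the manual workaround above, the job ID that replaces %J has to be
looked up first. A minimal shell sketch, assuming the debugging session
runs inside a SLURM allocation (where SLURM exports SLURM_JOB_ID, or
SLURM_JOBID on older releases). The function name current_jobid is made
up for illustration and is not part of SLURM or TotalView:

```shell
# Hypothetical helper: print the SLURM job ID to paste in place of %J.
current_jobid() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        # Set by SLURM inside an allocation.
        printf '%s\n' "$SLURM_JOB_ID"
    elif [ -n "${SLURM_JOBID:-}" ]; then
        # Older spelling of the same variable.
        printf '%s\n' "$SLURM_JOBID"
    else
        # Outside an allocation: list this user's job IDs instead
        # (-h drops the header, -o %i prints only the job ID).
        squeue -h -u "$USER" -o %i
    fi
}
```

Running current_jobid in the shell that holds the allocation prints the
number to enter in the bulk launch command. If the squeue fallback
prints more than one ID, the right one still has to be picked by hand.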
Andrew