Victor:

It's possible it's still a quoting issue. You could try quoting only the option and its value, like: "-n 12"
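
To make the difference concrete, here is a small sketch of how the shell splits the two quoting styles into words before globus-job-run ever sees them. show_args is a made-up helper, and this only illustrates shell word splitting; it says nothing about what globus-job-run does with the words afterwards.

```shell
#!/bin/sh
# Hypothetical helper: report how many words the shell produced
# and print each one in brackets.
show_args() {
    printf 'argc=%d\n' "$#"
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Whole command quoted: a single word, so the executable name
# becomes "/sw/altd/bin/aprun -n 12".
show_args "/sw/altd/bin/aprun -n 12" /lustre/scratch/victor/test1

# Only the option quoted: the executable and "-n 12" are separate words.
show_args /sw/altd/bin/aprun "-n 12" /lustre/scratch/victor/test1
```

The first call reports two words and the second reports three, which is consistent with the "executable does not exist" error when the whole command is quoted.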

Also, you could try putting a -l or -s as appropriate in front of 
/sw/altd/bin/aprun

http://www.globus.org/toolkit/docs/5.0/5.0.2/execution/gram5/pi/#gram5-cmd-globus-job-run

If that doesn't work, you might try putting the arguments in as an RSL clause
with -X.
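
For reference, the clause would need to look like the arguments clause in your working globusrun RSL. The sketch below just builds that string; build_arguments_clause is a hypothetical helper, and whether globus-job-run accepts the clause unchanged through its RSL option is an assumption I have not verified.

```shell
#!/bin/sh
# Hypothetical helper: wrap each word in double quotes and emit an
# RSL (arguments=...) clause in the same form as the globusrun RSL.
# Assumes the words contain no double quotes of their own.
build_arguments_clause() {
    clause="(arguments="
    sep=""
    for word in "$@"; do
        clause="$clause$sep\"$word\""
        sep=" "
    done
    printf '%s)\n' "$clause"
}

build_arguments_clause "-n 12" /lustre/scratch/victor/test1
```

This prints the clause with "-n 12" kept as one quoted value, matching what you passed to globusrun.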

Otherwise, I'm not sure.  I'm cc:ing [email protected] so that the GRAM folks 
will see it too.

Eric

----- Original Message -----
> That doesn't work. There is no such program as “/sw/altd/bin/aprun -n
> 12”. :)
> 
> 
> 
> vic...@krakenpf8(XT5):~/globustests> globus-job-run
> grid.nics.utk.edu:2119/jobmanager-pbs -count 12 -m 1 -p UT-SUPPORT -d
> /lustre/scratch/victor -stdout /lustre/scratch/victor/globusjobrun.out
> -stderr /lustre/scratch/victor/globusjobrun.err "/sw/altd/bin/aprun -n
> 12" /lustre/scratch/victor/test1
> 
> GRAM Job failed because the executable does not exist (error code 5)
> 
> 
> 
> Also, why does submitting the job via globusrun generate a small,
> concise job, while globus-job-run generates a large two-part job that
> wants to ssh to a list of nodes? Where is the code that generates the
> globus-job-run job script(s)? I will need to fix that for sure. I see
> where the argument parsing is done, but what I don’t get is why the
> “-n” gets removed when used with globus-job-run but is parsed
> correctly when I use globusrun. Strange.
> 
> 
> 
> -Victor
> 
> 
> 
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: Tuesday, December 14, 2010 2:33 PM
> To: Hazlewood, Victor Gene
> Cc: JP Navarro; gig-pack
> Subject: Re: globusrun and globus-job-run difference
> 
> 
> 
> Victor,
> 
> 
> 
> My guess is that globus-job-run may be misinterpreting the "-n" as an
> option that it thinks it needs to interpret, as opposed to an argument
> to aprun. If this is the case, it is possible that some creative
> quoting may work. You could try something like:
> 
> 
> 
> globus-job-run grid.nics.utk.edu:2119/jobmanager-pbs -count 12 -m 1 -p
> UT-SUPPORT -d /lustre/scratch/victor -stdout
> /lustre/scratch/victor/globusjobrun.out -stderr
> /lustre/scratch/victor/globusjobrun.err "/sw/altd/bin/aprun -n 12"
> /lustre/scratch/victor/test1
> 
> 
> 
> 
> 
> Eric
> 
> 
> 
> ----- Original Message -----
> 
> > Eric,
> >
> > I am testing the different ways to invoke GRAM5 remote execution to
> > make sure things are consistent and have come across the following
> > problem. What do I need to modify in the GRAM5 software so that
> > globusrun and globus-job-run commands intended to produce the same
> > result generate the same, or at least nearly the same, end-user job?
> > I am not sure why or where the globus-job-run command generates such
> > a strange batch job with embedded “ssh” commands and then strips the
> > arguments differently than when using globusrun. Any pointers would
> > be helpful.
> >
> > When using globusrun I get the following:
> >
> > Kraken$ globusrun -o -r grid.nics.utk.edu:2119/jobmanager-pbs
> > '&(executable=/sw/altd/bin/aprun)(arguments="-n 12"
> > "/lustre/scratch/victor/test1")(jobType=single)(count="12")(maxtime=1)(directory='/lustre/scratch/victor')(save_job_description="yes")(emailonabort=yes)(emailonexecution=yes)(emailontermination=yes)(email_address="[email protected]")(project=UT-SUPPORT)(stdout='/lustre/scratch/victor/globusjob.out')(stderr='/lustre/scratch/victor/globusjob.err')(queue=batch)'
> >
> > This generates and submits the following successful batch job
> > (though I don’t like that it automatically specifies ‘< /dev/null’ as
> > stdin if an stdin RSL is not specified):
> >
> > #! /bin/sh
> > # PBS batch job script built by Globus job manager
> > #
> > #PBS -S /bin/sh
> > #PBS -M [email protected]
> > #PBS -m abe
> > #PBS -A UT-SUPPORT
> > #PBS -l walltime=1:00
> > #PBS -o /lustre/scratch/victor/globusjob.out
> > #PBS -e /lustre/scratch/victor/globusjob.err
> > #PBS -l size=24
> > X509_USER_PROXY="/nics/a/home/victor/.globus/job/grid.nics.utk.edu/16073652731378661006.1367557268601599423/x509_user_proxy";
> > export X509_USER_PROXY;
> > GLOBUS_LOCATION="/nics/e/sw/teragrid/gram5-5.0.2-r1";
> > export GLOBUS_LOCATION;
> > GLOBUS_GRAM_JOB_CONTACT="https://grid.nics.utk.edu:50383/16073652731378661006/1367557268601599423/";
> > export GLOBUS_GRAM_JOB_CONTACT;
> > HOME="/nics/a/home/victor";
> > export HOME;
> > LOGNAME="victor";
> > export LOGNAME;
> > GLOBUS_GASS_CACHE_DEFAULT="/nics/a/home/victor/.globus/.gass_cache";
> > export GLOBUS_GASS_CACHE_DEFAULT;
> >
> > #Change to directory requested by user
> > cd /lustre/scratch/victor
> > /sw/altd/bin/aprun -n 12 /lustre/scratch/victor/test1 < /dev/null
> >
> > However, when I run essentially the same job using globus-job-run I
> > get the following:
> >
> > Kraken$ globus-job-run grid.nics.utk.edu:2119/jobmanager-pbs -count 12
> > -m 1 -p UT-SUPPORT -d /lustre/scratch/victor -stdout
> > /lustre/scratch/victor/globusjobrun.out -stderr
> > /lustre/scratch/victor/globusjobrun.err /sw/altd/bin/aprun -n 12
> > /lustre/scratch/victor/test1
> >
> > This generates the following totally different job, with bad
> > arguments to the aprun command inside the scheduler_pbs_cmd_script:
> > the ‘-n’ part of the argument has been removed.
> >
> > #! /bin/sh
> > # PBS batch job script built by Globus job manager
> > #
> > #PBS -S /bin/sh
> > #PBS -m n
> > #PBS -A UT-SUPPORT
> > #PBS -l walltime=1:00
> > #PBS -o /lustre/scratch/victor/globusjobrun.out
> > #PBS -e /lustre/scratch/victor/globusjobrun.err
> > #PBS -l size=12
> > X509_USER_PROXY="/nics/a/home/victor/.globus/job/grid.nics.utk.edu/16145870889033404456.1367557268601595317/x509_user_proxy";
> > export X509_USER_PROXY;
> > GLOBUS_LOCATION="/nics/e/sw/teragrid/gram5-5.0.2-r1";
> > export GLOBUS_LOCATION;
> > GLOBUS_GRAM_JOB_CONTACT="https://grid.nics.utk.edu:50383/16145870889033404456/1367557268601595317/";
> > export GLOBUS_GRAM_JOB_CONTACT;
> > HOME="/nics/a/home/victor";
> > export HOME;
> > LOGNAME="victor";
> > export LOGNAME;
> > GLOBUS_GASS_CACHE_DEFAULT="/nics/a/home/victor/.globus/.gass_cache";
> > export GLOBUS_GASS_CACHE_DEFAULT;
> >
> > #Change to directory requested by user
> > cd /lustre/scratch/victor
> >
> > hosts=`cat $PBS_NODEFILE`;
> > counter=0
> > while test $counter -lt 12; do
> >     for host in $hosts; do
> >         if test $counter -lt 12; then
> >             /usr/local/openssh/bin/ssh $host "/bin/sh /nics/a/home/victor/.globus/job/grid.nics.utk.edu/16145870889033404456.1367557268601595317/scheduler_pbs_cmd_script; echo \$? > /nics/a/home/victor/.globus/job/grid.nics.utk.edu/16145870889033404456.1367557268601595317/exit.$counter" < /dev/null &
> >             counter=`expr $counter + 1`
> >         else
> >             break
> >         fi
> >     done
> > done
> > wait
> >
> > counter=0
> > exit_code=0
> > while test $counter -lt 12; do
> >     /bin/touch /nics/a/home/victor/.globus/job/grid.nics.utk.edu/16145870889033404456.1367557268601595317/exit.$counter;
> >
> >     read tmp_exit_code < /nics/a/home/victor/.globus/job/grid.nics.utk.edu/16145870889033404456.1367557268601595317/exit.$counter
> >     if [ $exit_code = 0 -a $tmp_exit_code != 0 ]; then
> >         exit_code=$tmp_exit_code
> >     fi
> >     counter=`expr $counter + 1`
> > done
> >
> > exit $exit_code
> >
> > with scheduler_pbs_cmd_script being the following, with INCORRECT
> > arguments after the aprun:
> >
> > #!/bin/sh -l
> > cd /lustre/scratch/victor
> > X509_USER_PROXY="/nics/a/home/victor/.globus/job/grid.nics.utk.edu/16145870889033404456.1367557268601595317/x509_user_proxy";
> > export X509_USER_PROXY;
> > GLOBUS_LOCATION="/nics/e/sw/teragrid/gram5-5.0.2-r1";
> > export GLOBUS_LOCATION;
> > GLOBUS_GRAM_JOB_CONTACT="https://grid.nics.utk.edu:50383/16145870889033404456/1367557268601595317/";
> > export GLOBUS_GRAM_JOB_CONTACT;
> > HOME="/nics/a/home/victor";
> > export HOME;
> > LOGNAME="victor";
> > export LOGNAME;
> > GLOBUS_GASS_CACHE_DEFAULT="/nics/a/home/victor/.globus/.gass_cache";
> > export GLOBUS_GASS_CACHE_DEFAULT;
> >
> > /sw/altd/bin/aprun 12 /lustre/scratch/victor/test1
> >
> > (note the -n 12 has been changed to just 12)
> >
> > Also, when I do a globus-job-run -dumprsl I get essentially the same
> > RSL as in the globusrun job above, except the “-n” gets stripped off,
> > which is undesirable.
> >
> > vic...@krakenpf8(XT5):~/globustests> globus-job-run -dumprsl
> > grid.nics.utk.edu:2119/jobmanager-pbs -count 12 -m 1 -p UT-SUPPORT -d
> > /lustre/scratch/victor -stdout /lustre/scratch/victor/globusjobrun.out
> > -stderr /lustre/scratch/victor/globusjobrun.err /sw/altd/bin/aprun
> > "-n 12" /lustre/scratch/victor/test1
> > &(executable="/sw/altd/bin/aprun")
> > (project="UT-SUPPORT")
> > (maxtime=1)
> > (directory="/lustre/scratch/victor")
> > (count=12)
> > (arguments= "12" "/lustre/scratch/victor/test1")
> > (stdout="/lustre/scratch/victor/globusjobrun.out")
> > (stderr="/lustre/scratch/victor/globusjobrun.err")
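
A quick way to confirm whether an option survives into the RSL is to check the arguments clause of the -dumprsl output. In the sketch below, rsl_keeps_option is a hypothetical helper and the input is just the arguments line from the dump above; in practice the real -dumprsl output would be piped into it.

```shell
#!/bin/sh
# Hypothetical check: succeed only if the (arguments=...) clause of an
# RSL read from stdin still contains the expected option.
rsl_keeps_option() {
    option=$1
    # Keep only the arguments clause, then search for the option as a
    # fixed string ("--" stops grep from treating it as its own flag).
    grep '^(arguments=' | grep -q -F -- "$option"
}

# The arguments clause exactly as printed by -dumprsl in the message above.
dumped='(arguments= "12" "/lustre/scratch/victor/test1")'

if printf '%s\n' "$dumped" | rsl_keeps_option "-n"; then
    echo "-n survived"
else
    echo "-n was dropped"
fi
```

For the dump shown above this reports that -n was dropped.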
