Hi,

I set up GRAM5 with PBS on a system here, and everything seemed to be
working fine.

... until I tried specifying a queue name in the RSL.  When I don't specify
one, or specify the single queue the system was set up with, job
submission works fine:

> globusrun -o -r ng1.canterbury.ac.nz '&(executable=/bin/hostname)(queue=small)'
> ngcompute.canterbury.ac.nz

But when I pass any other queue name, it fails:

> globusrun -o -r ng1 '&(executable=/bin/hostname)(queue=gt5test)'
> GRAM Job submission failed because the provided RSL 'queue' parameter is
> invalid (error code 37)

The queue does exist and I can submit jobs to that queue as the local user.

From what I could trace, the LRM interface script pbs.pm does NOT get
invoked at all; somehow, the job manager decides that the queue name
specified is invalid.

Below is the output I got in ~/gram_<date>.log.

I'm running the GT 5.0.0 release, on Linux CentOS 5.4 x86_64.

Any help would be highly appreciated.

Cheers,
Vladimir


> ts=2010-03-23T03:31:02.009238Z id=23746 
> event=gram.register_proxy_timeout.start level=TRACE 
> ts=2010-03-23T03:31:02.009735Z id=23746 event=gram.register_proxy_timeout.end 
> level=TRACE status=0 lifetime=38955 timeout=600 
> ts=2010-03-23T03:31:02.009778Z id=23746 event=gram.startup_socket_init.start 
> level=DEBUG
> ts=2010-03-23T03:31:02.009790Z id=23746 
> event=gram.startup_socket_init.lock.start level=TRACE 
> path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/pbs.untagged.lock"
> ts=2010-03-23T03:31:02.011370Z id=23746 
> event=gram.startup_socket_init.lock.end level=TRACE 
> path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/pbs.untagged.lock" 
> status=0 
> ts=2010-03-23T03:31:02.011397Z id=23746 
> event=gram.startup_socket_init.write_pid.start level=TRACE 
> path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/pbs.untagged.pid" 
> ts=2010-03-23T03:31:02.011976Z id=23746 
> event=gram.startup_socket_init.write_pid.end level=TRACE 
> path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/pbs.untagged.pid" 
> status=0 
> ts=2010-03-23T03:31:02.011989Z id=23746 
> event=gram.startup_socket_init.create_socket.start level=TRACE 
> path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/pbs.untagged.sock" 
> ts=2010-03-23T03:31:02.012446Z id=23746 
> event=gram.startup.socket.create_socket.end level=TRACE status=0 
> path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/pbs.untagged.sock" 
> ts=2010-03-23T03:31:02.012460Z id=23746 event=gram.startup_socket_init.end 
> level=DEBUG status=0 
> path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/pbs.untagged.sock" 
> ts=2010-03-23T03:31:02.014479Z id=23746 event=gram.send_job.start level=INFO 
> http_body_fd=8 context_fd=11 response_fd=1 
> ts=2010-03-23T03:31:02.060958Z id=23746 event=gram.reload_requests.start 
> level=INFO 
> ts=2010-03-23T03:31:02.061247Z id=23746 event=gram.make_job_dir.start 
> level=TRACE gramid=/16073668359893374021/123149967014514085/ 
> ts=2010-03-23T03:31:02.061658Z id=23746 event=gram.make_job_dir.end 
> level=TRACE gramid=/16073668359893374021/123149967014514085/ status=0 
> path=/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/16073668359893374021.123149967014514085
>  
> ts=2010-03-23T03:31:02.061725Z id=23746 event=gram.state_file_read.start 
> level=TRACE gramid=/16073668359893374021/123149967014514085/ 
> path=/opt/globus/tmp/gram_job_state/job.ng1.canterbury.ac.nz.16073668359893374021.123149967014514085
>  
> ts=2010-03-23T03:31:02.061901Z id=23746 event=gram.state_file_read.info 
> level=DEBUG gramid=/16073668359893374021/123149967014514085/ 
> path=/opt/globus/tmp/gram_job_state/job.ng1.canterbury.ac.nz.16073668359893374021.123149967014514085
>  msg="Unable to check status of job lock file" errno=13 reason="Permission 
> denied" 
> ts=2010-03-23T03:31:02.061929Z id=23746 event=gram.state_file_read.end 
> level=ERROR gramid=/16073668359893374021/123149967014514085/ status=-158 
> path=/opt/globus/tmp/gram_job_state/job.ng1.canterbury.ac.nz.16073668359893374021.123149967014514085
>  msg="Error opening job lock file" errno=13 reason="Permission denied" 
> ts=2010-03-23T03:31:02.061947Z id=23746 event=gram.directory_destroy.start 
> level=TRACE gramid=/16073668359893374021/123149967014514085/ 
> path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/16073668359893374021.123149967014514085"
>  
> ts=2010-03-23T03:31:02.062526Z id=23746 event=gram.directory_destroy.end 
> level=DEBUG gramid=/16073668359893374021/123149967014514085/ 
> path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/16073668359893374021.123149967014514085"
>  failures=0 status=0 
> ts=2010-03-23T03:31:02.062567Z id=23746 event=gram.reload_requests.info 
> level=WARN statedir="/opt/globus/tmp/gram_job_state" msg="Error restarting 
> job" gramid=16073668359893374021/123149967014514085 status=-122 reason="could 
> not read the job state file"
> ts=2010-03-23T03:31:02.062661Z id=23746 event=gram.make_job_dir.start 
> level=TRACE gramid=/16073676060123497651/123149967014514085/ 
> ts=2010-03-23T03:31:02.063030Z id=23746 event=gram.make_job_dir.end 
> level=TRACE gramid=/16073676060123497651/123149967014514085/ status=0 
> path=/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/16073676060123497651.123149967014514085
>  
> ts=2010-03-23T03:31:02.063061Z id=23746 event=gram.state_file_read.start 
> level=TRACE gramid=/16073676060123497651/123149967014514085/ 
> path=/opt/globus/tmp/gram_job_state/job.ng1.canterbury.ac.nz.16073676060123497651.123149967014514085
>  
> ts=2010-03-23T03:31:02.063099Z id=23746 event=gram.state_file_read.info 
> level=DEBUG gramid=/16073676060123497651/123149967014514085/ 
> path=/opt/globus/tmp/gram_job_state/job.ng1.canterbury.ac.nz.16073676060123497651.123149967014514085
>  msg="Unable to check status of job lock file" errno=13 reason="Permission 
> denied" 
> ts=2010-03-23T03:31:02.063121Z id=23746 event=gram.state_file_read.end 
> level=ERROR gramid=/16073676060123497651/123149967014514085/ status=-158 
> path=/opt/globus/tmp/gram_job_state/job.ng1.canterbury.ac.nz.16073676060123497651.123149967014514085
>  msg="Error opening job lock file" errno=13 reason="Permission denied" 
> ts=2010-03-23T03:31:02.063133Z id=23746 event=gram.directory_destroy.start 
> level=TRACE gramid=/16073676060123497651/123149967014514085/ 
> path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/16073676060123497651.123149967014514085"
>  
> ts=2010-03-23T03:31:02.063658Z id=23746 event=gram.directory_destroy.end 
> level=DEBUG gramid=/16073676060123497651/123149967014514085/ 
> path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/16073676060123497651.123149967014514085"
>  failures=0 status=0 
> ts=2010-03-23T03:31:02.063675Z id=23746 event=gram.reload_requests.info 
> level=WARN statedir="/opt/globus/tmp/gram_job_state" msg="Error restarting 
> job" gramid=16073676060123497651/123149967014514085 status=-122 reason="could 
> not read the job state file"
> ts=2010-03-23T03:31:02.063766Z id=23746 event=gram.make_job_dir.start 
> level=TRACE gramid=/16073674959970007381/123149967014514085/ 
> ts=2010-03-23T03:31:02.064118Z id=23746 event=gram.make_job_dir.end 
> level=TRACE gramid=/16073674959970007381/123149967014514085/ status=0 
> path=/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/16073674959970007381.123149967014514085
>  
> ts=2010-03-23T03:31:02.064148Z id=23746 event=gram.state_file_read.start 
> level=TRACE gramid=/16073674959970007381/123149967014514085/ 
> path=/opt/globus/tmp/gram_job_state/job.ng1.canterbury.ac.nz.16073674959970007381.123149967014514085
>  
> ts=2010-03-23T03:31:02.064184Z id=23746 event=gram.state_file_read.info 
> level=DEBUG gramid=/16073674959970007381/123149967014514085/ 
> path=/opt/globus/tmp/gram_job_state/job.ng1.canterbury.ac.nz.16073674959970007381.123149967014514085
>  msg="Unable to check status of job lock file" errno=13 reason="Permission 
> denied" 
> ts=2010-03-23T03:31:02.064205Z id=23746 event=gram.state_file_read.end 
> level=ERROR gramid=/16073674959970007381/123149967014514085/ status=-158 
> path=/opt/globus/tmp/gram_job_state/job.ng1.canterbury.ac.nz.16073674959970007381.123149967014514085
>  msg="Error opening job lock file" errno=13 reason="Permission denied" 
> ts=2010-03-23T03:31:02.064218Z id=23746 event=gram.directory_destroy.start 
> level=TRACE gramid=/16073674959970007381/123149967014514085/ 
> path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/16073674959970007381.123149967014514085"
>  
> ts=2010-03-23T03:31:02.064746Z id=23746 event=gram.directory_destroy.end 
> level=DEBUG gramid=/16073674959970007381/123149967014514085/ 
> path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/16073674959970007381.123149967014514085"
>  failures=0 status=0 
> ts=2010-03-23T03:31:02.064763Z id=23746 event=gram.reload_requests.info 
> level=WARN statedir="/opt/globus/tmp/gram_job_state" msg="Error restarting 
> job" gramid=16073674959970007381/123149967014514085 status=-122 reason="could 
> not read the job state file"
> ts=2010-03-23T03:31:02.064784Z id=23746 event=gram.reload_requests.end 
> level=INFO statedir="/opt/globus/tmp/gram_job_state" status=0 requests=0 
> ts=2010-03-23T03:31:02.064801Z id=23746 event=gram.seg.start level=TRACE 
> module=pbs
> ts=2010-03-23T03:31:02.064815Z id=23746 event=gram.seg.activate.start 
> level=TRACE module=pbs
> ts=2010-03-23T03:31:02.065922Z id=23746 event=gram.new_request.start 
> level=DEBUG fd=14 
> ts=2010-03-23T03:31:02.068525Z id=23746 event=gram.import_sec_context.start 
> level=TRACE fd=16
> ts=2010-03-23T03:31:02.070356Z id=23746 event=gram.import_sec_context.end 
> level=TRACE status=0 globusid="/C=NZ/O=BeSTGRID/OU=University of 
> Canterbury/CN=Vladimir Mencl" 
> ts=2010-03-23T03:31:02.070383Z id=23746 event=gram.read_request.start 
> level=TRACE fd=15
> \nrsl: \"&(\\\"rsl_substitution\\\" = (\\\"GLOBUSRUN_GASS_URL\\\" 
> \\\"https://ng1.canterbury.ac.nz:40383\\\"; ) )(\\\"stderr\\\" = 
> $(\\\"GLOBUSRUN_GASS_URL\\\") # \\\"/dev/stderr\\\" )(\\\"stdout\\\" = 
> $(\\\"GLOBUSRUN_GASS_URL\\\") # \\\"/dev/stdout\\\" )(\\\"executable\\\" = 
> \\\"/bin/hostname\\\" )(\\\"queue\\\" = \\\"gt5test\\\" )\"
> ts=2010-03-23T03:31:02.070488Z id=23746 event=gram.read_request.end 
> level=TRACE status=0 
> ts=2010-03-23T03:31:02.070628Z id=23746 event=gram.make_job_dir.start 
> level=TRACE gramid=/16073677157215589761/123149967014535089/ 
> ts=2010-03-23T03:31:02.070691Z id=23746 event=gram.send_job.end level=INFO 
> http_body_fd=8 context_fd=11 response_fd=1 status=0 
> ts=2010-03-23T03:31:02.070742Z id=23746 event=gram.end level=DEBUG 
> ts=2010-03-23T03:31:02.070992Z id=23746 event=gram.make_job_dir.end 
> level=TRACE gramid=/16073677157215589761/123149967014535089/ status=0 
> path=/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/16073677157215589761.123149967014535089
>  
> ts=2010-03-23T03:31:02.071335Z id=23746 event=gram.init_scratchdir.start 
> level=DEBUG gramid=/16073677157215589761/123149967014535089/ 
> base="/home/grid-bestgrid" 
> ts=2010-03-23T03:31:02.071348Z id=23746 event=gram.init_scratchdir.end 
> level=DEBUG gramid=/16073677157215589761/123149967014535089/ status=0 
> reason="scratch_dir not in RSL" 
> ts=2010-03-23T03:31:02.071359Z id=23746 event=gram.gass_cache_init.start 
> level=TRACE gramid=/16073677157215589761/123149967014535089/ 
> ts=2010-03-23T03:31:02.071370Z id=23746 event=gram.gass_cache_init.info 
> level=TRACE gramid=/16073677157215589761/123149967014535089/ 
> path=/home/grid-bestgrid/.globus/.gass_cache
> ts=2010-03-23T03:31:02.072663Z id=23746 event=gram.new_request.start 
> level=DEBUG fd=-1 
> ts=2010-03-23T03:31:02.072854Z id=23746 event=gram.gass_cache_init.end 
> level=TRACE gramid=/16073677157215589761/123149967014535089/ status=0 
> path=/home/grid-bestgrid/.globus/.gass_cache
> ts=2010-03-23T03:31:02.072901Z id=23746 event=gram.directory_destroy.start 
> level=TRACE gramid=/16073677157215589761/123149967014535089/ 
> path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/16073677157215589761.123149967014535089"
>  
> ts=2010-03-23T03:31:02.072992Z id=23746 event=gram.new_request.end 
> level=TRACE fd=-1 msg="recvmsg failed" status=-10 errno=9 reason="Bad file 
> descriptor" 
> ts=2010-03-23T03:31:02.073461Z id=23746 event=gram.directory_destroy.end 
> level=DEBUG gramid=/16073677157215589761/123149967014535089/ 
> path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/16073677157215589761.123149967014535089"
>  failures=0 status=0 
> ts=2010-03-23T03:31:02.073558Z id=23746 event=gram.new_request.info 
> level=DEBUG gramid= msg="the provided RSL 'queue' parameter is invalid" 
> response=37 
> ts=2010-03-23T03:31:02.073577Z id=23746 event=gram.reply.start level=DEBUG 
> gramid= job_contact="" response_code=37 
> ts=2010-03-23T03:31:02.076731Z id=23746 event=gram.reply.end level=DEBUG 
> gramid= status=0 
> ts=2010-03-23T03:31:02.077133Z id=23746 event=gram.new_request.start 
> level=DEBUG fd=-1 
> ts=2010-03-23T03:31:02.077167Z id=23746 event=gram.new_request.end 
> level=TRACE fd=-1 msg="recvmsg failed" status=-10 errno=9 reason="Bad file 
> descriptor" 

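The log above is long, and most of it is TRACE/DEBUG noise. The following is a minimal Python sketch for narrowing it down, assuming the log keeps the key=value layout shown above (each record starting with ts=, values optionally double-quoted). The helper names parse_records and interesting are hypothetical and not part of the Globus Toolkit; the sample text is three records copied from the log above.

```python
import re

# key=value pairs; a value is either double-quoted (may contain spaces) or a bare token.
PAIR = re.compile(r'(\w+)=("[^"]*"|\S*)')

# Three records copied from the log in this message (mail quoting left in place).
SAMPLE = """\
> ts=2010-03-23T03:31:02.064763Z id=23746 event=gram.reload_requests.info
> level=WARN statedir="/opt/globus/tmp/gram_job_state" msg="Error restarting
> job" gramid=16073674959970007381/123149967014514085 status=-122 reason="could
> not read the job state file"
> ts=2010-03-23T03:31:02.065922Z id=23746 event=gram.new_request.start
> level=DEBUG fd=14
> ts=2010-03-23T03:31:02.073558Z id=23746 event=gram.new_request.info
> level=DEBUG gramid= msg="the provided RSL 'queue' parameter is invalid"
> response=37
"""


def parse_records(text):
    """Return one dict per ts=... record (hypothetical helper)."""
    # Drop the "> " mail-quoting prefix and rejoin lines wrapped by the mailer.
    lines = [re.sub(r'^>\s?', '', ln).strip() for ln in text.splitlines()]
    flat = ' '.join(ln for ln in lines if ln)
    records = []
    # Every record begins with a ts= field, so split just before each one.
    for chunk in re.split(r'(?=\bts=)', flat):
        fields = {key: value.strip('"') for key, value in PAIR.findall(chunk)}
        if 'ts' in fields:
            records.append(fields)
    return records


def interesting(records):
    """Keep WARN/ERROR records and any record carrying a GRAM response code."""
    return [r for r in records
            if r.get('level') in ('WARN', 'ERROR') or 'response' in r]


if __name__ == '__main__':
    for rec in interesting(parse_records(SAMPLE)):
        print(rec['ts'], rec['level'], rec['event'], rec.get('msg', ''))
```

To run it against the real file, replace SAMPLE with the contents of ~/gram_<date>.log; the prefix stripping is harmless on an unquoted log.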

-- 
Vladimir Mencl, Ph.D.
E-Research Services and Systems Consultant
BlueFern Supercomputing Services
University of Canterbury
Private Bag 4800
Christchurch 8140
New Zealand

http://www.bluefern.canterbury.ac.nz
mailto:[email protected]
Phone: +64 3 364 3012
Mobile: +64 21 997 352
Fax: +64 3 364 2332
