Hi,

I have set up GRAM5 with PBS on a system here, and everything seemed to be working fine ... until I tried specifying a queue name in the RSL.

If I do not specify a queue, or if I specify the single queue the system was set up with, job submission works fine:

> globusrun -o -r ng1.canterbury.ac.nz '&(executable=/bin/hostname)(queue=small)'
> ngcompute.canterbury.ac.nz

But when I pass any other queue name, it fails:

> globusrun -o -r ng1 '&(executable=/bin/hostname)(queue=gt5test)'
> GRAM Job submission failed because the provided RSL 'queue' parameter is invalid (error code 37)

The queue does exist, and I can submit jobs to that queue as the local user.

From what I could trace, the LRM interface script pbs.pm does NOT get invoked at all; somehow, the job manager decides that the specified queue name is invalid.

I'm attaching below the output I got in ~/gram_<date>.log.

I'm running the GT 5.0.0 release on Linux CentOS 5.4 x86_64.

Any help would be highly appreciated.

Cheers,
Vladimir

> ts=2010-03-23T03:31:02.009238Z id=23746 event=gram.register_proxy_timeout.start level=TRACE
> ts=2010-03-23T03:31:02.009735Z id=23746 event=gram.register_proxy_timeout.end level=TRACE status=0 lifetime=38955 timeout=600
> ts=2010-03-23T03:31:02.009778Z id=23746 event=gram.startup_socket_init.start level=DEBUG
> ts=2010-03-23T03:31:02.009790Z id=23746 event=gram.startup_socket_init.lock.start level=TRACE path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/pbs.untagged.lock"
> ts=2010-03-23T03:31:02.011370Z id=23746 event=gram.startup_socket_init.lock.end level=TRACE path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/pbs.untagged.lock" status=0
> ts=2010-03-23T03:31:02.011397Z id=23746 event=gram.startup_socket_init.write_pid.start level=TRACE path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/pbs.untagged.pid"
> ts=2010-03-23T03:31:02.011976Z id=23746 event=gram.startup_socket_init.write_pid.end level=TRACE path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/pbs.untagged.pid" status=0
> ts=2010-03-23T03:31:02.011989Z id=23746
> event=gram.startup_socket_init.create_socket.start level=TRACE path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/pbs.untagged.sock"
> ts=2010-03-23T03:31:02.012446Z id=23746 event=gram.startup.socket.create_socket.end level=TRACE status=0 path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/pbs.untagged.sock"
> ts=2010-03-23T03:31:02.012460Z id=23746 event=gram.startup_socket_init.end level=DEBUG status=0 path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/pbs.untagged.sock"
> ts=2010-03-23T03:31:02.014479Z id=23746 event=gram.send_job.start level=INFO http_body_fd=8 context_fd=11 response_fd=1
> ts=2010-03-23T03:31:02.060958Z id=23746 event=gram.reload_requests.start level=INFO
> ts=2010-03-23T03:31:02.061247Z id=23746 event=gram.make_job_dir.start level=TRACE gramid=/16073668359893374021/123149967014514085/
> ts=2010-03-23T03:31:02.061658Z id=23746 event=gram.make_job_dir.end level=TRACE gramid=/16073668359893374021/123149967014514085/ status=0 path=/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/16073668359893374021.123149967014514085
> ts=2010-03-23T03:31:02.061725Z id=23746 event=gram.state_file_read.start level=TRACE gramid=/16073668359893374021/123149967014514085/ path=/opt/globus/tmp/gram_job_state/job.ng1.canterbury.ac.nz.16073668359893374021.123149967014514085
> ts=2010-03-23T03:31:02.061901Z id=23746 event=gram.state_file_read.info level=DEBUG gramid=/16073668359893374021/123149967014514085/ path=/opt/globus/tmp/gram_job_state/job.ng1.canterbury.ac.nz.16073668359893374021.123149967014514085 msg="Unable to check status of job lock file" errno=13 reason="Permission denied"
> ts=2010-03-23T03:31:02.061929Z id=23746 event=gram.state_file_read.end level=ERROR gramid=/16073668359893374021/123149967014514085/ status=-158 path=/opt/globus/tmp/gram_job_state/job.ng1.canterbury.ac.nz.16073668359893374021.123149967014514085 msg="Error opening job lock file" errno=13 reason="Permission denied"
> ts=2010-03-23T03:31:02.061947Z id=23746 event=gram.directory_destroy.start level=TRACE gramid=/16073668359893374021/123149967014514085/ path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/16073668359893374021.123149967014514085"
> ts=2010-03-23T03:31:02.062526Z id=23746 event=gram.directory_destroy.end level=DEBUG gramid=/16073668359893374021/123149967014514085/ path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/16073668359893374021.123149967014514085" failures=0 status=0
> ts=2010-03-23T03:31:02.062567Z id=23746 event=gram.reload_requests.info level=WARN statedir="/opt/globus/tmp/gram_job_state" msg="Error restarting job" gramid=16073668359893374021/123149967014514085 status=-122 reason="could not read the job state file"
> ts=2010-03-23T03:31:02.062661Z id=23746 event=gram.make_job_dir.start level=TRACE gramid=/16073676060123497651/123149967014514085/
> ts=2010-03-23T03:31:02.063030Z id=23746 event=gram.make_job_dir.end level=TRACE gramid=/16073676060123497651/123149967014514085/ status=0 path=/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/16073676060123497651.123149967014514085
> ts=2010-03-23T03:31:02.063061Z id=23746 event=gram.state_file_read.start level=TRACE gramid=/16073676060123497651/123149967014514085/ path=/opt/globus/tmp/gram_job_state/job.ng1.canterbury.ac.nz.16073676060123497651.123149967014514085
> ts=2010-03-23T03:31:02.063099Z id=23746 event=gram.state_file_read.info level=DEBUG gramid=/16073676060123497651/123149967014514085/ path=/opt/globus/tmp/gram_job_state/job.ng1.canterbury.ac.nz.16073676060123497651.123149967014514085 msg="Unable to check status of job lock file" errno=13 reason="Permission denied"
> ts=2010-03-23T03:31:02.063121Z id=23746 event=gram.state_file_read.end level=ERROR gramid=/16073676060123497651/123149967014514085/ status=-158 path=/opt/globus/tmp/gram_job_state/job.ng1.canterbury.ac.nz.16073676060123497651.123149967014514085 msg="Error opening job lock file" errno=13 reason="Permission denied"
> ts=2010-03-23T03:31:02.063133Z id=23746 event=gram.directory_destroy.start level=TRACE gramid=/16073676060123497651/123149967014514085/ path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/16073676060123497651.123149967014514085"
> ts=2010-03-23T03:31:02.063658Z id=23746 event=gram.directory_destroy.end level=DEBUG gramid=/16073676060123497651/123149967014514085/ path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/16073676060123497651.123149967014514085" failures=0 status=0
> ts=2010-03-23T03:31:02.063675Z id=23746 event=gram.reload_requests.info level=WARN statedir="/opt/globus/tmp/gram_job_state" msg="Error restarting job" gramid=16073676060123497651/123149967014514085 status=-122 reason="could not read the job state file"
> ts=2010-03-23T03:31:02.063766Z id=23746 event=gram.make_job_dir.start level=TRACE gramid=/16073674959970007381/123149967014514085/
> ts=2010-03-23T03:31:02.064118Z id=23746 event=gram.make_job_dir.end level=TRACE gramid=/16073674959970007381/123149967014514085/ status=0 path=/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/16073674959970007381.123149967014514085
> ts=2010-03-23T03:31:02.064148Z id=23746 event=gram.state_file_read.start level=TRACE gramid=/16073674959970007381/123149967014514085/ path=/opt/globus/tmp/gram_job_state/job.ng1.canterbury.ac.nz.16073674959970007381.123149967014514085
> ts=2010-03-23T03:31:02.064184Z id=23746 event=gram.state_file_read.info level=DEBUG gramid=/16073674959970007381/123149967014514085/ path=/opt/globus/tmp/gram_job_state/job.ng1.canterbury.ac.nz.16073674959970007381.123149967014514085 msg="Unable to check status of job lock file" errno=13 reason="Permission denied"
> ts=2010-03-23T03:31:02.064205Z id=23746 event=gram.state_file_read.end level=ERROR gramid=/16073674959970007381/123149967014514085/ status=-158
> path=/opt/globus/tmp/gram_job_state/job.ng1.canterbury.ac.nz.16073674959970007381.123149967014514085 msg="Error opening job lock file" errno=13 reason="Permission denied"
> ts=2010-03-23T03:31:02.064218Z id=23746 event=gram.directory_destroy.start level=TRACE gramid=/16073674959970007381/123149967014514085/ path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/16073674959970007381.123149967014514085"
> ts=2010-03-23T03:31:02.064746Z id=23746 event=gram.directory_destroy.end level=DEBUG gramid=/16073674959970007381/123149967014514085/ path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/16073674959970007381.123149967014514085" failures=0 status=0
> ts=2010-03-23T03:31:02.064763Z id=23746 event=gram.reload_requests.info level=WARN statedir="/opt/globus/tmp/gram_job_state" msg="Error restarting job" gramid=16073674959970007381/123149967014514085 status=-122 reason="could not read the job state file"
> ts=2010-03-23T03:31:02.064784Z id=23746 event=gram.reload_requests.end level=INFO statedir="/opt/globus/tmp/gram_job_state" status=0 requests=0
> ts=2010-03-23T03:31:02.064801Z id=23746 event=gram.seg.start level=TRACE module=pbs
> ts=2010-03-23T03:31:02.064815Z id=23746 event=gram.seg.activate.start level=TRACE module=pbs
> ts=2010-03-23T03:31:02.065922Z id=23746 event=gram.new_request.start level=DEBUG fd=14
> ts=2010-03-23T03:31:02.068525Z id=23746 event=gram.import_sec_context.start level=TRACE fd=16
> ts=2010-03-23T03:31:02.070356Z id=23746 event=gram.import_sec_context.end level=TRACE status=0 globusid="/C=NZ/O=BeSTGRID/OU=University of Canterbury/CN=Vladimir Mencl"
> ts=2010-03-23T03:31:02.070383Z id=23746 event=gram.read_request.start level=TRACE fd=15
> \nrsl: \"&(\\\"rsl_substitution\\\" = (\\\"GLOBUSRUN_GASS_URL\\\" \\\"https://ng1.canterbury.ac.nz:40383\\\" ) )(\\\"stderr\\\" = $(\\\"GLOBUSRUN_GASS_URL\\\") # \\\"/dev/\n" rr\\\" )(\\\"stdout\\\" = $(\\\"GLOBUSRUN_GASS_URL\\\") # \\\"/dev/stdout\\\" )(\\\"executable\\\" = \\\"/bin/hostname\\\" )(\\\"queue\\\" = \\\"gt5test\\\" )\"
> ts=2010-03-23T03:31:02.070488Z id=23746 event=gram.read_request.end level=TRACE status=0
> ts=2010-03-23T03:31:02.070628Z id=23746 event=gram.make_job_dir.start level=TRACE gramid=/16073677157215589761/123149967014535089/
> ts=2010-03-23T03:31:02.070691Z id=23746 event=gram.send_job.end level=INFO http_body_fd=8 context_fd=11 response_fd=1 status=0
> ts=2010-03-23T03:31:02.070742Z id=23746 event=gram.end level=DEBUG
> ts=2010-03-23T03:31:02.070992Z id=23746 event=gram.make_job_dir.end level=TRACE gramid=/16073677157215589761/123149967014535089/ status=0 path=/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/16073677157215589761.123149967014535089
> ts=2010-03-23T03:31:02.071335Z id=23746 event=gram.init_scratchdir.start level=DEBUG gramid=/16073677157215589761/123149967014535089/ base="/home/grid-bestgrid"
> ts=2010-03-23T03:31:02.071348Z id=23746 event=gram.init_scratchdir.end level=DEBUG gramid=/16073677157215589761/123149967014535089/ status=0 reason="scratch_dir not in RSL"
> ts=2010-03-23T03:31:02.071359Z id=23746 event=gram.gass_cache_init.start level=TRACE gramid=/16073677157215589761/123149967014535089/
> ts=2010-03-23T03:31:02.071370Z id=23746 event=gram.gass_cache_init.info level=TRACE gramid=/16073677157215589761/123149967014535089/ path=/home/grid-bestgrid/.globus/.gass_cache
> ts=2010-03-23T03:31:02.072663Z id=23746 event=gram.new_request.start level=DEBUG fd=-1
> ts=2010-03-23T03:31:02.072854Z id=23746 event=gram.gass_cache_init.end level=TRACE gramid=/16073677157215589761/123149967014535089/ status=0 path=/home/grid-bestgrid/.globus/.gass_cache
> ts=2010-03-23T03:31:02.072901Z id=23746 event=gram.directory_destroy.start level=TRACE gramid=/16073677157215589761/123149967014535089/ path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/16073677157215589761.123149967014535089"
> ts=2010-03-23T03:31:02.072992Z id=23746
> event=gram.new_request.end level=TRACE fd=-1 msg="recvmsg failed" status=-10 errno=9 reason="Bad file descriptor"
> ts=2010-03-23T03:31:02.073461Z id=23746 event=gram.directory_destroy.end level=DEBUG gramid=/16073677157215589761/123149967014535089/ path="/home/grid-bestgrid/.globus/job/ng1.canterbury.ac.nz/16073677157215589761.123149967014535089" failures=0 status=0
> ts=2010-03-23T03:31:02.073558Z id=23746 event=gram.new_request.info level=DEBUG gramid= msg="the provided RSL 'queue' parameter is invalid" response=37
> ts=2010-03-23T03:31:02.073577Z id=23746 event=gram.reply.start level=DEBUG gramid= job_contact="" response_code=37
> ts=2010-03-23T03:31:02.076731Z id=23746 event=gram.reply.end level=DEBUG gramid= status=0
> ts=2010-03-23T03:31:02.077133Z id=23746 event=gram.new_request.start level=DEBUG fd=-1
> ts=2010-03-23T03:31:02.077167Z id=23746 event=gram.new_request.end level=TRACE fd=-1 msg="recvmsg failed" status=-10 errno=9 reason="Bad file descriptor"

--
Vladimir Mencl, Ph.D.
E-Research Services and Systems Consultant
BlueFern Supercomputing Services
University of Canterbury
Private Bag 4800
Christchurch 8140
New Zealand

http://www.bluefern.canterbury.ac.nz
mailto:[email protected]
Phone: +64 3 364 3012
Mobile: +64 21 997 352
Fax: +64 3 364 2332
